danny0838 / firefox-scrapbook

ScrapBook X – a legacy Firefox add-on that captures web pages to a local device for future retrieval, organization, annotation, and editing.
Mozilla Public License 2.0

Wikipedia (SVG/TeX-math-mode) formulas not rendered correctly in saved scrapbook entry (same problem in the original Scrapbook addon) #96

Closed KIAaze closed 8 years ago

KIAaze commented 8 years ago

Any formulas in Wikipedia using the LaTeX syntax (<math>...</math>) do not get rendered correctly in saved scrapbook entries.

When saving the following page for example, a lot of "formula images" end up as broken links: https://en.wikipedia.org/wiki/Shor%27s_algorithm

It is possible to still view the formulas through "right-click->View image", but this is not very practical of course.

Formulas using the "HTML syntax" ({{math|...}}) are rendered ok.

The original Scrapbook addon has the same issue.

A workaround is to set the Wikipedia math rendering preference to PNG images. See https://en.wikipedia.org/wiki/Help:Displaying_a_formula

So it looks like the problem lies in how the embedded SVG images are handled.

danny0838 commented 8 years ago

Thank you for reporting. It seems that browsers must recognize the .svg extension to load SVG images correctly. Unfortunately, due to some historical problems, ScrapBook cannot recognize the file name or file type from the HTTP header (#90), and thus loses the file extension for dynamic file names (i.e. when the file name is in the Content-Disposition field of the HTTP header). It is fixable but requires a rather large change to the code framework. We'll probably include this in ScrapBook 2.0.
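To illustrate the idea of recovering a file extension from the HTTP headers, here is a minimal Python sketch. It is not ScrapBook's code: the function name `pick_filename`, the small `EXT_BY_MIME` table, and the example path are all assumptions made for illustration. It prefers a file name from Content-Disposition, falls back to the last segment of the URL path, and appends an extension based on Content-Type only when the name has none.

```python
from email.message import Message
import posixpath

# Illustrative table only (an assumption); a real tool would cover many more types.
EXT_BY_MIME = {
    "image/svg+xml": ".svg",
    "image/png": ".png",
    "text/html": ".html",
}


def pick_filename(url_path, headers):
    """Choose a local file name for a downloaded resource (hypothetical helper)."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value

    # 1. Prefer the file name announced in Content-Disposition, if any.
    name = msg.get_filename()
    # 2. Otherwise fall back to the last segment of the URL path.
    if not name:
        name = posixpath.basename(url_path) or "index"

    # 3. If the name has no extension, derive one from Content-Type.
    root, ext = posixpath.splitext(name)
    if not ext:
        ext = EXT_BY_MIME.get(msg.get_content_type(), "")
    return root + ext


# A formula image served from an extensionless path (hypothetical path).
print(pick_filename("/media/math/render/svg/abc123",
                    {"Content-Type": "image/svg+xml"}))  # abc123.svg
```

With the extension restored, a browser opening the saved file from disk can identify it as SVG from the name alone.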

pascallothar commented 8 years ago

I have found a work-around. Use "Save as..." to save the page temporarily on your computer. Then, open it with Firefox. If the rendering is correct *, you can now capture the page with Scrapbook X. Now, right click on the newly captured item and, in "properties", edit the URL so it will indicate the actual URL of the page and not the location of the file you temporarily saved on your computer. Done.

So, Danny, would it be difficult to implement that in your add-on? The whole work-around takes a lot of time for each Wikipedia capture! You could, for example, add an entry (say, "LATEX bad rendering") in one of the menus that would run a "macro?", script or ???, triggering the MAF add-on (if installed). Or, maybe better, use some of the code from the MAF add-on?

On the other hand, when I want to save a .maff for a page whose scripts need to be saved (for example Firefox add-on pages with pop-up screenshot images or expand/collapse arrows), I don't use the MAF add-on (which truncates the scripts, making it impossible to save the "pop-up mechanism"). Instead, thanks to Danny, I "capture as ..." the page with ScrapBook X with the "scripts" and "images" check-boxes checked and then "create MAF".

That said, I have to mention that when you "save as ..." or capture a Wikipedia page, you need to expand or collapse the parts of the page you want to save or not save beforehand, because none of the saving methods will allow you to expand/collapse the text after saving. Could someone explain to me why this does not work on Wikipedia pages the way it does on Firefox add-on pages (where saving the scripts works)?

And, once more, thank you, Danny! ( @danny0838 )

pascallothar commented 8 years ago

Hello @danny0838 ,

I have seen that you put labels on the threads (concerning issues) that I started, but not on my last comment in the present thread. So I was not sure whether I was using GitHub correctly and whether you had seen my comment. I tried to find answers in the help. There is a lot to read there, but if I have understood correctly, I should have used @danny0838 to trigger the system to send you a notification. Is that right?

Have a nice day, Pascal

danny0838 commented 8 years ago

Yes, and no.

Since I am watching my repo, I will receive any new post or comment in the issue, and thus you don't need to explicitly add a name tag in order to notify me (but it would be useful to mark whom exactly you are responding to if there are many participants).

You can see the options about notification in the GitHub settings for more detail.

danny0838 commented 8 years ago

@pascallothar As for the captured-script issue you mentioned, it's because JavaScript is sometimes highly context dependent. Take Wikipedia's JavaScript for example: some scripts load other scripts from "the address of the currently visited URL with some specific fixes". They won't work after capture, since the "currently visited URL" has changed and the scripts residing at "the address of the currently visited URL with some specific fixes" have never been downloaded.

There is no rule a program can apply to detect whether a script behaves like this, and thus captured scripts are not guaranteed to work properly.
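The context dependence described above can be sketched with plain URL resolution. This is only an analogy in Python, not what Wikipedia's scripts actually do: the local capture path and the script reference `/w/load.php?modules=startup` are hypothetical values chosen for illustration. The same reference resolves to a different address depending on which page it is evaluated from.

```python
from urllib.parse import urljoin


def resolve_script(page_url, script_ref):
    """Resolve a script reference against the URL of the page that contains it."""
    return urljoin(page_url, script_ref)


ONLINE_PAGE = "https://en.wikipedia.org/wiki/Shor%27s_algorithm"
# Hypothetical location of a captured copy on disk.
CAPTURED_PAGE = "file:///home/user/scrapbook/data/item/index.html"
# Hypothetical reference that a page script builds from the current address.
SCRIPT_REF = "/w/load.php?modules=startup"

# Online, the reference points at the live site.
print(resolve_script(ONLINE_PAGE, SCRIPT_REF))
# From the captured copy, it points at a local path that was never downloaded.
print(resolve_script(CAPTURED_PAGE, SCRIPT_REF))
```

A capture tool can rewrite references it finds in the markup, but an address computed at run time from the current location is only known once the script executes.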

danny0838 commented 8 years ago

ScrapBook 1.13.0b should have fixed this issue. Feel free to test it.

danny0838 commented 8 years ago

Resolved the issue in ScrapBook 1.13.*

pascallothar commented 8 years ago

@danny0838 , Yes, the issue is fixed.

Actually, for the Wikipedia page mentioned above [https://en.wikipedia.org/wiki/Shor%27s_algorithm], the rendering is the same as with [Mozilla Archive Format, with MHT and Faithful Save https://addons.mozilla.org/fr/firefox/addon/mozilla-archive-format/]. But ScrapBook X captures the page [https://github.com/danny0838/firefox-scrapbook] very well, whereas [Mozilla Archive Format, with MHT and Faithful Save] gives a very bad rendering. Ten more points for you :-)

I was looking at the code shown by "The Inspector", and I saw that in the <body>, everything is there!!! The MathML version of a math formula, the LaTeX version and the .svg version. Since Firefox is able to render MathML, why is it not able to render it from the MathML tags inside the HTML code captured by ScrapBook X? It seems that something in the <head> is not set to make Firefox use the MathML tags of the <body>. I understand maths and physics, but my understanding of computing is something like C/C++ for Dummies and Bash for Dummies. So I cannot understand what the hell prevents Firefox from rendering what is already written in the HTML code of this page??? But I am sure you have an answer.
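One way to check which representations of a formula a captured page contains is to scan its HTML. The Python sketch below is an illustration under stated assumptions: the `FormulaScanner` class and the `SAMPLE` fragment are hypothetical, and the fragment is a simplified stand-in for the real Wikipedia markup, not a copy of it. It counts MathML elements, TeX annotations, and images, and also counts wrappers carrying an inline `display: none` style, since a hidden wrapper would keep MathML out of view even though it is present in the <body>.

```python
from html.parser import HTMLParser


class FormulaScanner(HTMLParser):
    """Count formula representations in an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.mathml = 0           # <math> elements
        self.tex = 0              # TeX source kept in <annotation>
        self.images = 0           # <img> fallbacks
        self.hidden_wrappers = 0  # <span> elements hidden by inline style

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "math":
            self.mathml += 1
        elif tag == "annotation" and a.get("encoding") == "application/x-tex":
            self.tex += 1
        elif tag == "img":
            self.images += 1
        elif tag == "span" and "display: none" in (a.get("style") or ""):
            self.hidden_wrappers += 1


# Simplified, hypothetical fragment; the real markup is assumed to be similar.
SAMPLE = """
<span class="formula">
  <span style="display: none;">
    <math><semantics><mi>N</mi>
      <annotation encoding="application/x-tex">N</annotation>
    </semantics></math>
  </span>
  <img src="formula-without-extension" alt="N">
</span>
"""

scanner = FormulaScanner()
scanner.feed(SAMPLE)
print(scanner.mathml, scanner.tex, scanner.images, scanner.hidden_wrappers)
```

If the scan of a real capture reports hidden wrappers around the MathML, only the image fallback is visible, so the page depends on that image loading correctly.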

[https://github.com/danny0838/firefox-scrapbook]

Original:

[screenshot: fireshot original]

Captured by ScrapBook X:

[screenshot: fireshot sb x]

Captured by Mozilla Archive Format, with MHT and Faithful Save:

[screenshot: fireshot maf]

[https://en.wikipedia.org/wiki/Shor%27s_algorithm]

Original:

[screenshot: fireshot wikipedia original]

Captured by ScrapBook X:

[screenshot: fireshot wikipedia sb x]

Captured by Mozilla Archive Format, with MHT and Faithful Save:

[screenshot: fireshot wikipedia maf]