gildas-lormeau / SingleFileZ

Web Extension to save a faithful copy of an entire web page in a self-extracting ZIP file
GNU Affero General Public License v3.0

Unclear what "Save raw page" does #88

Closed rhn closed 3 years ago

rhn commented 3 years ago

This is something I realized while debugging https://github.com/gildas-lormeau/SingleFileZ/issues/87: I have no idea whether the "save raw page" option does what I want.

When it comes to interpreting JS before saving, I see three possibilities:

  1. "Save raw page": the page as delivered in the HTTP response. AFAIK this is what "view source" shows.
  2. "Interpret JS and save raw page": the page as rendered by Firefox after interpreting the necessary JS. This is the DOM tree shown in the Firefox Inspector (F12).
  3. "Interpret JS again": the extension fetches the raw HTTP contents, builds another DOM by interpreting some JS, and then saves that.
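
The difference between possibilities 1 and 2 can be sketched in a few lines of TypeScript. This is only a toy model under stated assumptions: the names `BaseMode`, `PageSources`, and `selectBaseDocument` are hypothetical and are not taken from SingleFileZ, and possibility 3 is not modeled at all. The two strings stand in for what a browser would provide through `fetch` and `document.documentElement.outerHTML`.

```typescript
// Illustrative sketch only; every name here is hypothetical and none of it
// is taken from the SingleFileZ source code.

// The two candidate starting points for a saved page.
type BaseMode = "raw-response" | "rendered-dom";

interface PageSources {
  // HTML text as the server sent it (roughly what "view source" shows).
  // In a browser this could come from `fetch(url).then(r => r.text())`.
  rawResponse: string;
  // The live DOM serialized back to HTML, after scripts have run.
  // In a browser this could come from `document.documentElement.outerHTML`.
  renderedDom: string;
}

// Return the HTML that would serve as the base for saving.
function selectBaseDocument(mode: BaseMode, sources: PageSources): string {
  return mode === "raw-response" ? sources.rawResponse : sources.renderedDom;
}

// A page whose content is filled in by a script: the two bases differ.
const example: PageSources = {
  rawResponse: '<div id="app"></div><script src="app.js"></script>',
  renderedDom: '<div id="app"><p>Hello</p></div><script src="app.js"></script>',
};

console.log(selectBaseDocument("raw-response", example)); // empty #app container
console.log(selectBaseDocument("rendered-dom", example)); // #app already populated
```

With scripts stripped from the saved copy, only the second base would still contain the text that the script inserted, which is the property that matters for offline search.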

For me, only option 2 makes sense: I want the saved page to be searchable offline (using recoll) and to look the way I saw it in the browser. To that end, I disable scripts using uMatrix and ask SingleFileZ to drop any scripts too. On top of that, I have a custom stylesheet added by Stylish to fix what was broken by the missing JS.

But the explanation for "save raw page" is ambiguous. Turned on, it could mean either of the first two, and turned off, it could mean either of the last two.

How is JS interpreted when "save raw page" is unchecked? It would be great if the explanation could be made more specific. If that's difficult, I can take a longer explanation and try to condense it into something more acceptable.

gildas-lormeau commented 3 years ago

It's actually written in the help page (see below). The page saved is indeed the one displayed when you view the HTML source (option 1).

Check this option to save the page without interpreting JavaScript. Checking this option may alter the document.
rhn commented 3 years ago

Thanks. That is indeed the text I was referring to when I said it's not clear (it was not selectable, or I would have copied it). Could we come up with a better description? Here's my proposal:

This controls which version of the document will be used as the base for saving. When unchecked, it's the version you see, as rendered by your browser. When checked, it's the raw file that your browser received.

Is this about correct? I described it as if the current DOM tree was getting saved (so open UI elements would be saved open), but I'm not actually sure if that's true (rendering again would be more like 3., right?).

gildas-lormeau commented 3 years ago

I updated the description. I had to adapt your suggestion to follow the way options are described in the options page.