gildas-lormeau / SingleFileZ

Web Extension to save a faithful copy of an entire web page in a self-extracting ZIP file
GNU Affero General Public License v3.0

Differences to Chrome's (new?) MHTML page capture #134

Closed rmst closed 2 years ago

rmst commented 2 years ago

I was curious, is SingleFile(Z) doing anything differently than Chrome's https://developer.chrome.com/docs/extensions/reference/pageCapture/?

For Chromium-based browsers, could SingleFile(Z) be simplified by getting the pageCapture MHTML output and converting that into a regular HTML file? MHTML seems to be a very simple format.
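To make the "very simple format" point concrete: an MHTML file is a MIME `multipart/related` message, so Python's standard `email` module can split it into its resources. The sketch below is only an illustration. `SAMPLE_MHTML` is a hand-written stand-in (not real `pageCapture` output) and `list_resources` is a hypothetical helper name; real Chrome output contains more headers and may differ in detail.

```python
import email
from email import policy

# Hand-written sample for illustration only; not actual pageCapture output.
SAMPLE_MHTML = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; type="text/html"; '
    b'boundary="----sketch-boundary"\r\n'
    b"\r\n"
    b"------sketch-boundary\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"Content-Location: https://example.com/\r\n"
    b"\r\n"
    b'<html><body><img src=3D"https://example.com/a.png"></body></html>\r\n'
    b"------sketch-boundary\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Location: https://example.com/a.png\r\n"
    b"\r\n"
    b"iVBORw0KGgo=\r\n"
    b"------sketch-boundary--\r\n"
)


def list_resources(mhtml_bytes):
    """Map each part's Content-Location to (content type, decoded bytes)."""
    message = email.message_from_bytes(mhtml_bytes, policy=policy.default)
    resources = {}
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the multipart container itself
        location = str(part["Content-Location"])
        # decode=True undoes the quoted-printable / base64 transfer encoding
        resources[location] = (part.get_content_type(),
                               part.get_payload(decode=True))
    return resources


resources = list_resources(SAMPLE_MHTML)
for location, (content_type, data) in resources.items():
    print(location, content_type, len(data))
```

Turning this into a regular HTML file would still require rewriting every resource reference in the main document, which this sketch does not attempt.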

gildas-lormeau commented 2 years ago

Today, MHTML is almost a proprietary format from my point of view (see https://docs.google.com/document/d/1FvmYUC0S0BkdkR7wZsg0hLdKc_qjGnGahBwwa0CdnHE). This is a big drawback for an archiving format. The difference is that SingleFile and SingleFileZ rely on standard formats that are more durable, namely HTML and ZIP respectively. Note also that SingleFile existed before Google decided to add the support of MHTML in Chrome.

rmst commented 2 years ago

Thanks for the nice reference! Agreed, that makes sense. On the other hand, it's a pretty simple, human-readable format which, unlike the HTML/ZIP approach, doesn't require modifying the URLs on the page.

It probably also has the unfair advantage that it's implemented natively and the browser doesn't have to re-download the files.
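As a rough illustration of what "modifying the URLs on the page" involves when everything must live in one plain HTML file, the sketch below swaps each resource URL for a `data:` URI. This is not SingleFile's actual implementation. `to_data_uri` and `inline_resources` are hypothetical names, and the plain string replacement is an assumption made for brevity; a real tool would have to parse the HTML and CSS.

```python
import base64


def to_data_uri(content_type, data):
    """Encode raw bytes as a base64 data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{content_type};base64,{encoded}"


def inline_resources(html, resources):
    """Replace every known resource URL in the HTML with a data: URI.

    `resources` maps URL -> (content type, bytes). Naive string replacement,
    for illustration only.
    """
    for url, (content_type, data) in resources.items():
        html = html.replace(url, to_data_uri(content_type, data))
    return html


page = '<img src="https://example.com/a.png">'
resources = {"https://example.com/a.png": ("image/png", b"\x89PNG\r\n\x1a\n")}
standalone = inline_resources(page, resources)
print(standalone)
```

MHTML avoids this step because each part keeps its original URL in a `Content-Location` header, so the markup can stay untouched.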

rmst commented 2 years ago

"Note also that SingleFile existed before Google decided to add the support of MHTML in Chrome"

Who knows, perhaps it was SingleFile that finally pushed them to implement it / turn MHTML on by default :P

gildas-lormeau commented 2 years ago

What I can tell you is that Google is more or less considering the opposite today, see https://crbug.com/1235248. Note that neither Firefox nor Safari currently supports MHTML, and it's very unlikely that this will change.

rmst commented 2 years ago

Ah, interesting!