gildas-lormeau / SingleFile

Web Extension for saving a faithful copy of a complete web page in a single HTML file
GNU Affero General Public License v3.0
14.29k stars, 943 forks

ScienceDirect snapshots are missing styles #1484

Closed: adomasven closed this issue 1 week ago

adomasven commented 1 week ago

Hi Gildas,

We are getting reports from our users that saving ScienceDirect pages (e.g. https://www.sciencedirect.com/science/article/abs/pii/S030193222100152X) produces a broken snapshot. I can reproduce it with the default SingleFile extension. It is reproducible on Firefox and Chrome, with Zotero Connector disabled. Zotero uses the option to not capture frames on the page, but that doesn't seem to have an impact when toggled in SingleFile options.

gildas-lormeau commented 1 week ago

Thank you. I was able to reproduce and fix the issue in SingleFile. The fix will be available in the next version.

You will have to adapt the code on your end. The fetch function passed as a parameter here https://github.com/zotero/zotero-connectors/blob/d3b870f525137c79a567398b668b87ee19d60fb7/src/common/singlefile.js#L60 should try to fetch the resource via a script injected into the page (i.e. hostFetch in SingleFile; window.wrappedJSObject.fetch might work in Firefox). This script has to fetch the resource from the page itself, outside the extension world. This keeps the Referer and Origin HTTP header values unchanged; when the request is made from the extension, these values differ, and that is the cause of the issue.
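A minimal sketch of that idea, not the actual SingleFile code: the content script asks a listener running in the page world to perform the fetch and waits for the reply. The event names and the functions installPageFetchListener and requestPageFetch are hypothetical, and the sketch assumes text resources such as stylesheets; binary resources would need a different encoding of the body.

```javascript
// Hypothetical event names chosen for this sketch; SingleFile uses its own identifiers.
const REQUEST_EVENT = "example-page-fetch-request";
const RESPONSE_EVENT = "example-page-fetch-response";

// Meant to run in the page world (e.g. from an injected <script> element),
// so that fetch is issued by the page and not by the extension.
function installPageFetchListener(target = document, fetchImpl = (...args) => fetch(...args)) {
  target.addEventListener(REQUEST_EVENT, async (event) => {
    // The detail is a JSON string so it can cross the page/extension boundary.
    const { id, url, options } = JSON.parse(event.detail);
    let reply;
    try {
      const response = await fetchImpl(url, options);
      reply = { id, status: response.status, body: await response.text() };
    } catch (error) {
      reply = { id, error: String(error) };
    }
    target.dispatchEvent(new CustomEvent(RESPONSE_EVENT, { detail: JSON.stringify(reply) }));
  });
}

// Meant to run in the content script.
let nextRequestId = 0;
function requestPageFetch(url, options = {}, target = document) {
  const id = ++nextRequestId;
  return new Promise((resolve, reject) => {
    function onResponse(event) {
      const reply = JSON.parse(event.detail);
      if (reply.id !== id) return; // reply belongs to another pending request
      target.removeEventListener(RESPONSE_EVENT, onResponse);
      if (reply.error) reject(new Error(reply.error));
      else resolve({ status: reply.status, body: reply.body });
    }
    target.addEventListener(RESPONSE_EVENT, onResponse);
    target.dispatchEvent(
      new CustomEvent(REQUEST_EVENT, { detail: JSON.stringify({ id, url, options }) })
    );
  });
}
```

In a real extension the two functions would live in different scripts that share the same document; they are shown together here only so the request and response halves can be read side by side.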

gildas-lormeau commented 1 week ago

Actually, simply passing the option referrerPolicy: "strict-origin-when-cross-origin" to fetch might be sufficient. FYI, the latest version of single-file-core (v1.5.2) adds this option automatically when calling fetch.
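As an illustration of passing that option, here is a small wrapper that adds referrerPolicy to whatever options the caller already supplies. The helper name fetchWithReferrerPolicy and its third parameter are assumptions made for this sketch; they are not part of SingleFile or single-file-core.

```javascript
// Hypothetical helper: the name and signature are assumptions for illustration.
function fetchWithReferrerPolicy(url, options = {}, fetchImpl = globalThis.fetch) {
  // referrerPolicy is a standard fetch() option. It is set as a default here,
  // so a value supplied by the caller still takes precedence.
  const merged = { referrerPolicy: "strict-origin-when-cross-origin", ...options };
  return fetchImpl(url, merged);
}
```

A call such as fetchWithReferrerPolicy(resourceUrl, { cache: "force-cache" }) would then forward both options to fetch.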

adomasven commented 1 week ago

I've updated SF in the Connector to the latest version, which uses referrerPolicy: "strict-origin-when-cross-origin", but it did not help in either Firefox or Chrome. Are you sure it works without sending the correct referrer?

gildas-lormeau commented 1 week ago

Yes, I'm sure it works, but for that I have to call fetch outside the content script, in the page itself. You should only have to adapt the code below.

https://github.com/gildas-lormeau/SingleFile/blob/dc9d100fbce9e4870d24aa04f718f15c138ca652/src/lib/single-file/fetch/content/content-fetch.js#L96-L138

This code interacts with the following code in single-file-core.

https://github.com/gildas-lormeau/single-file-core/blob/3921d35301b2904025abbc8705f87e1279d4f52c/processors/hooks/content/content-hooks-frames-web.js#L136-L148