gildas-lormeau / single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
GNU Affero General Public License v3.0
Unable to download a particular website using cli #18

Closed (sdht0 closed this issue 1 year ago)

sdht0 commented 1 year ago

Hi. I get the following error on seemingly any web page from "http://highscalability.com" when using single-file-cli. However, the Firefox extension works as expected. Can you investigate? Thanks.

$ single-file --browser-executable-path chromium "http://highscalability.com/blog/2022/12/16/what-is-cloud-computing-according-to-chatgpt.html"
Evaluation failed: TypeError: Cannot read properties of undefined (reading 'type')
    at Object.node (<anonymous>:1:55192)
    at Object.node (<anonymous>:1:56252)
    at Object.Hi (<anonymous>:1:97189)
    at Object.node (<anonymous>:1:55258)
    at Object.node (<anonymous>:1:56252)
    at Fr.forEach (<anonymous>:1:44421)
    at Object.ln [as children] (<anonymous>:1:54840)
    at Object.Cc (<anonymous>:1:116289)
    at Object.node (<anonymous>:1:55258)
    at Object.generate (<anonymous>:1:56320) URL: http://highscalability.com/blog/2022/12/16/what-is-cloud-computing-according-to-chatgpt.html
Stack: Error: Evaluation failed: TypeError: Cannot read properties of undefined (reading 'type')
    at Object.node (<anonymous>:1:55192)
    at Object.node (<anonymous>:1:56252)
    at Object.Hi (<anonymous>:1:97189)
    at Object.node (<anonymous>:1:55258)
    at Object.node (<anonymous>:1:56252)
    at Fr.forEach (<anonymous>:1:44421)
    at Object.ln [as children] (<anonymous>:1:54840)
    at Object.Cc (<anonymous>:1:116289)
    at Object.node (<anonymous>:1:55258)
    at Object.generate (<anonymous>:1:56320)
    at ExecutionContext._ExecutionContext_evaluate (<path>/single-file-cli/node_modules/puppeteer-core/lib/cjs/puppeteer/common/ExecutionContext.js:229:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async ExecutionContext.evaluate (<path>/single-file-cli/node_modules/puppeteer-core/lib/cjs/puppeteer/common/ExecutionContext.js:107:16)
    at async getPageData (<path>/single-file-cli/back-ends/puppeteer.js:150:10)
    at async exports.getPageData (<path>/single-file-cli/back-ends/puppeteer.js:56:10)
    at async capturePage (<path>/single-file-cli/single-file-cli-api.js:254:20)
    at async runNextTask (<path>/single-file-cli/single-file-cli-api.js:175:20)
    at async Promise.all (index 0)
    at async capture (<path>/single-file-cli/single-file-cli-api.js:126:2)
    at async run (<path>/single-file-cli/single-file:54:2)

gildas-lormeau commented 1 year ago

Thank you, I was able to reproduce the issue. Unfortunately, it's a bit harder than usual to debug. I suspect this page overrides standard APIs with non-compliant implementations.

Edit: For the record, the error is thrown when SingleFile generates stylesheet contents.

melyux commented 1 year ago

Just got the same issue with many links on the Web Archive:

Evaluation failed: TypeError: Cannot read properties of undefined (reading 'dataset')

gildas-lormeau commented 1 year ago

@melyux Can you give me an example link that would allow me to reproduce your issue?

gildas-lormeau commented 1 year ago

@sdht0 I did some investigation on this particular issue and found the problem. It's due to a script on the page which overrides the native method Function#bind with a non-compliant implementation. Fortunately, this practice has been considered bad for at least 10 years, so it should become increasingly rare. In the meantime, I added an option --browser-freeze-prototypes to freeze the native methods and circumvent this kind of problem.
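
For illustration only, here is a minimal sketch of the general idea behind freezing native prototypes. It runs in Node's `vm` module rather than in a real browser, and it is not necessarily how the option is implemented in SingleFile. The page script, the probe, and the `runPage` helper are all hypothetical.

```javascript
const vm = require("node:vm");

// Hypothetical page script: replaces Function#bind with a version that
// silently drops pre-bound arguments, which is not spec-compliant.
const pageScript = `
  Function.prototype.bind = function (thisArg) {
    const fn = this;
    return function () { return fn.apply(thisArg, arguments); };
  };
`;

// Hypothetical code that relies on standard bind behaviour
// (partial application of the first argument).
const probe = `
  function add(a, b) { return a + b; }
  add.bind(null, 1)(2);
`;

// Runs the page script and then the probe in a fresh, isolated context.
// When `freeze` is true, Function.prototype is frozen first, so the page
// script's assignment has no effect (it fails silently in sloppy mode).
function runPage({ freeze }) {
  const context = vm.createContext({});
  if (freeze) {
    vm.runInContext("Object.freeze(Function.prototype);", context);
  }
  vm.runInContext(pageScript, context);
  return vm.runInContext(probe, context);
}

const withoutFreeze = runPage({ freeze: false }); // broken bind: 2 + undefined
const withFreeze = runPage({ freeze: true });     // native bind preserved: 1 + 2

console.log(Number.isNaN(withoutFreeze), withFreeze); // prints: true 3
```

Without freezing, the overridden bind loses the pre-bound argument and the probe returns NaN. With the prototype frozen before the page script runs, the native method stays in place. In strict-mode page code the same assignment would throw a TypeError instead of being ignored, so freezing can change how a page behaves.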

readstead commented 1 year ago

@gildas-lormeau I had a similar problem with the Web Archive. This is an example of a page where SingleFile CLI threw this error.

sdht0 commented 1 year ago

Thanks a lot @gildas-lormeau! Meanwhile I had sidestepped the issue by simply downloading the whole website using wget ;). Their print view is pretty handy (e.g., http://highscalability.com/blog/2023/7/16/lessons-learned-running-presto-at-meta-scale.html?printerFriendly=true).

gildas-lormeau commented 1 year ago

@readstead Thank you for the additional info. I was able to reproduce and fix the issue in the latest version, which I just published.

@sdht0 You did the right thing ;) And it's true that the page intended for printing is perfect!

melyux commented 1 year ago

Just tried and it's working great now. You are the man

gildas-lormeau commented 1 year ago

@melyux Thank you :)