gildas-lormeau / SingleFile

Web Extension for saving a faithful copy of a complete web page in a single HTML file
GNU Affero General Public License v3.0
14.8k stars, 971 forks

SingleFile CLI not loading content below the fold of Reddit thread #644

Closed: Ken0sis closed this issue 3 years ago

Ken0sis commented 3 years ago

Describe the bug

When I use the CLI to save a Reddit page, I get a different result than with the Chrome extension. The Chrome extension saves the entire Reddit thread, but the page saved by the CLI does not display the content below the "View Entire Discussion" button. In both cases, I'm able to see the entire discussion thread when I submit the save command.

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://www.reddit.com/r/algotrading/comments/ig1cd5/alpaca_vs_interactive_brokers_for_python/
  2. Click 'View Entire Discussion'.
  3. Save the page using the CLI with this command: single-file https://www.reddit.com/r/algotrading/comments/ig1cd5/alpaca_vs_interactive_brokers_for_python/ --filename-template="hello-{page-title}.html" --browser-executable-path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
  4. The output does not show the entire discussion thread. It shows the "View Entire Discussion" button, and the content beneath it is not visible.

Expected behavior

The output from the CLI should be the same as the output from the Chrome extension: we should be able to read the full Reddit thread.

Additional context

This problem occurs only when using the CLI, not with the extension.

Ken0sis commented 3 years ago

It seems to work as expected if I don't run in headless mode and I click the button to expand the whole thread before SingleFile finishes the capture.

gildas-lormeau commented 3 years ago

You can automate this by running the script below before the page is saved; see https://github.com/gildas-lormeau/SingleFile/wiki/How-to-execute-a-user-script-before-a-page-is-saved for more info. When running SingleFile CLI, use the --browser-script option to specify the path of the script.

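// Notify SingleFile that this user script has been initialized
dispatchEvent(new CustomEvent("single-file-user-script-init"));
// Runs in the page just before SingleFile captures it
addEventListener("single-file-on-before-capture-request", () => {
  // Find the button that expands the full Reddit thread and click it
  const button = Array.from(document.querySelectorAll("button"))
    .find(button => button.textContent.startsWith("View Entire Discussion"));
  if (button) {
    button.click();
  }
});
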
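
The script above matches the button with textContent.startsWith(...), which depends on the exact letter case of the label and on the element's text having no leading whitespace. A slightly more tolerant variant is sketched below. findByLabel is a hypothetical helper (not part of SingleFile), and the sketch assumes that normalizing whitespace and case is enough to match Reddit's button label. The SingleFile event wiring is guarded so the same file can also be loaded in Node for testing.

```javascript
// Hypothetical helper: return the first element whose text, with whitespace
// collapsed and case ignored, starts with the given prefix (or undefined).
function findByLabel(elements, prefix) {
  const wanted = prefix.trim().toLowerCase();
  return Array.from(elements).find(element => {
    const label = (element.textContent || "").replace(/\s+/g, " ").trim().toLowerCase();
    return label.startsWith(wanted);
  });
}

// Only register the SingleFile events when running inside a page
// (e.g. when injected with --browser-script), not when loaded in Node.
if (typeof document !== "undefined") {
  dispatchEvent(new CustomEvent("single-file-user-script-init"));
  addEventListener("single-file-on-before-capture-request", () => {
    const button = findByLabel(document.querySelectorAll("button"), "View Entire Discussion");
    if (button) {
      button.click();
    }
  });
}
```
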
gildas-lormeau commented 3 years ago

@Ken0sis Did you try the solution I proposed?

Ken0sis commented 3 years ago

@gildas-lormeau I apologize for not trying the proposed solution earlier; I was pretty sure it would solve the problem. Unfortunately, it did not work when I tried it on the Reddit thread. I was also hoping it would close those "accept cookies" popups, but it doesn't work on those either.

I don't have any further insight into why the solution doesn't work, since it seems very reasonable. Do you have other ideas, Gildas? Thank you for looking into this; it would help put the CLI on a more equal footing with the extension.

gildas-lormeau commented 3 years ago

I wasn't aware that the cookie consent popup was an issue, which is why the script did not remove it. I also noticed that the page was dimmed because of a notification permission request. I updated the script to remove the popup and disable notification requests.

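// Deny notification permission up front so the request doesn't dim the page
Notification.requestPermission = async () => "denied";
// Notify SingleFile that this user script has been initialized
dispatchEvent(new CustomEvent("single-file-user-script-init"));
addEventListener("single-file-on-before-capture-request", () => {
    // Remove the cookie consent popup if it is present; optional chaining avoids
    // a TypeError (which would skip the button click below) when it is missing
    document.getElementById("POPUP_CONTAINER")?.parentElement?.remove();
    // Expand the full Reddit thread
    const button = Array.from(document.querySelectorAll("button"))
        .find(button => button.textContent.startsWith("View Entire Discussion"));
    if (button) {
        button.click();
    }
});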
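
This script performs several independent cleanup steps in a single handler, so if one step throws (for example, reading .parentElement from a null result when no POPUP_CONTAINER element exists on the page), the steps after it never run. The sketch below isolates each step with try/catch so that a failure is logged instead of stopping the rest. runSteps is a hypothetical helper, not part of SingleFile; the element id and button label are the ones used in this thread.

```javascript
// Hypothetical helper: run each named step on its own and record whether it
// succeeded, so one failing step does not prevent the following ones.
function runSteps(steps) {
  return steps.map(([name, step]) => {
    try {
      step();
      return { name, ok: true };
    } catch (error) {
      return { name, ok: false, error: String(error) };
    }
  });
}

// Only register the SingleFile events when running inside a page.
if (typeof document !== "undefined") {
  // Deny notification permission up front so the request doesn't dim the page
  Notification.requestPermission = async () => "denied";
  dispatchEvent(new CustomEvent("single-file-user-script-init"));
  addEventListener("single-file-on-before-capture-request", () => {
    const results = runSteps([
      ["remove cookie popup", () => document.getElementById("POPUP_CONTAINER").parentElement.remove()],
      ["expand discussion", () => {
        const button = Array.from(document.querySelectorAll("button"))
          .find(button => button.textContent.startsWith("View Entire Discussion"));
        if (button) {
          button.click();
        }
      }]
    ]);
    // Report failed steps in the page console for debugging
    results.filter(result => !result.ok).forEach(result => console.warn(result.name, result.error));
  });
}
```
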

I'm also closing this issue because it's not a bug in SingleFile.

Ken0sis commented 3 years ago

@gildas-lormeau Thanks. Unfortunately, the new script you provided hasn't resolved the issue. This does not seem to be a bug in SingleFile CLI; it might be a "feature" in the sense that SingleFile CLI is not intended to automatically unfold the thread before a capture. But wouldn't we want to capture the whole discussion thread and not just the top part? Ideally, the CLI should offer a user experience similar to the extension's, right?

Should I repost this as a "feature request" rather than a bug? I don't want to clutter up the bug reports. Or maybe the answer is that I should fix it myself, because it's out of scope for SingleFile CLI? Thanks.

gildas-lormeau commented 3 years ago

@Ken0sis The script I posted works fine on my machine with your example. I think this issue can be solved, but I need more details about the problem you're having with the latest version of the script, since I cannot reproduce it.

Ken0sis commented 3 years ago

@gildas-lormeau Thank you for trying to help. I think I've found the problem: I was running the CLI with the option --save-raw-page=true, and for some reason the script didn't run with that option. To be honest, I don't know how the --browser-script option works, so I didn't think saving the raw page would make a difference. Thanks again for your help. I'll try the same approach and see if I can get rid of those annoying cookie-accept popups.
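
When driving the CLI from a script, it can help to build the options in one place and flag the combination that caused the problem in this thread. The sketch below is a hypothetical Node.js helper, not part of SingleFile CLI. It uses only the options that appear in this issue, assumes each accepts the --option=value form (as --filename-template=... does above), and its warning only restates the observation reported here, not documented CLI behavior.

```javascript
// Hypothetical helper: assemble a SingleFile CLI argument list from options
// mentioned in this issue, and warn about --save-raw-page=true combined with
// --browser-script (reported above to stop the script from running).
function buildSingleFileArgs(url, options = {}) {
  const { filenameTemplate, browserExecutablePath, browserScript, saveRawPage = false } = options;
  const args = [url];
  const warnings = [];
  if (filenameTemplate) {
    args.push(`--filename-template=${filenameTemplate}`);
  }
  if (browserExecutablePath) {
    args.push(`--browser-executable-path=${browserExecutablePath}`);
  }
  if (browserScript) {
    args.push(`--browser-script=${browserScript}`);
  }
  if (saveRawPage) {
    args.push("--save-raw-page=true");
    if (browserScript) {
      warnings.push("--save-raw-page=true may prevent --browser-script from taking effect");
    }
  }
  return { args, warnings };
}
```

The resulting args array could then be passed to something like child_process.spawn("single-file", args), assuming single-file is on the PATH.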