gildas-lormeau / single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
GNU Affero General Public License v3.0
540 stars 58 forks source link

--browser-script after --browser-executable-path results in silent failure #30

Open ivan opened 1 year ago

ivan commented 1 year ago

This works:

# ./single-file --browser-script remove-images.js --browser-executable-path chromium "https://www.google.com/"

But this fails, exiting silently with status code 0:

# ./single-file --browser-executable-path chromium --browser-script remove-images.js "https://www.google.com/"

I just saw that this was already reported in https://github.com/gildas-lormeau/single-file-cli/issues/8#issuecomment-1233639675 but I wanted to make this more visible, because it can take a while to find the cause.

I first observed this on NixOS / google-chrome-unstable and reproduced it on Debian unstable / chromium:

# uname -a
Linux inspired-ocelot 6.1.24 #1-NixOS SMP PREEMPT_DYNAMIC Thu Apr 13 14:55:40 UTC 2023 x86_64 GNU/Linux

# node --version
v18.13.0

# npm --version
9.2.0

# git clone --depth 1 --recursive https://github.com/gildas-lormeau/single-file-cli.git
Cloning into 'single-file-cli'...
remote: Enumerating objects: 48, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 48 (delta 9), reused 24 (delta 4), pack-reused 0
Receiving objects: 100% (48/48), 179.88 KiB | 1.49 MiB/s, done.
Resolving deltas: 100% (9/9), done.

# npm install
npm WARN deprecated rollup-plugin-terser@7.0.2: This package has been deprecated and is no longer maintained. Please use @rollup/plugin-terser

added 193 packages, and audited 194 packages in 7s

20 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

# chmod +x single-file

# chromium --version
find: ‘/home/at/.config/chromium/Crash Reports/pending/’: No such file or directory
Chromium 112.0.5615.121 built on Debian 12.0, running on Debian 12.0

# ./single-file --browser-executable-path chromium "https://www.google.com/"

# ls -lrt | tail -n 1
-rw-r----- 1 at at 128,821 2023-04-17 10:47 Google (2023-04-17 10_47_38 AM).html

# cat remove-images.js # copied from https://github.com/gildas-lormeau/SingleFile/wiki/How-to-execute-a-user-script-before-a-page-is-saved
// ==UserScript==
// @name         Remove images
// @namespace    https://github.com/gildas-lormeau/SingleFile
// @version      1.0
// @description  [SingleFile] Remove all the images
// @author       Gildas Lormeau
// @match        *://*/*
// @grant        none
// ==/UserScript==

(() => {

  const elements = new Map();
  const removedElementsSelector = "img";
  dispatchEvent(new CustomEvent("single-file-user-script-init"));

  addEventListener("single-file-on-before-capture-request", () => {
    document.querySelectorAll(removedElementsSelector).forEach(element => {
      const placeHolderElement = document.createElement(element.tagName);
      elements.set(placeHolderElement, element);
      element.parentElement.replaceChild(placeHolderElement, element);
    });
  });

  addEventListener("single-file-on-after-capture-request", () => {
    Array.from(elements).forEach(([placeHolderElement, element]) => {
      placeHolderElement.parentElement.replaceChild(element, placeHolderElement);
    });
    elements.clear();
  });

})();

# Works:

# ./single-file --browser-script remove-images.js --browser-executable-path chromium "https://www.google.com/"

# Fails (exits silently with status code 0):

# ./single-file --browser-executable-path chromium --browser-script remove-images.js "https://www.google.com/"
ivan commented 1 year ago

Debugging with

diff --git a/args.js b/args.js
index b1c4a68..0d56244 100644
--- a/args.js
+++ b/args.js
@@ -328,4 +328,5 @@ Object.keys(args).filter(optionName => optionName.includes("-"))
    .forEach(optionName => delete args[optionName]);
 delete args["$0"];
 delete args["_"];
+console.log(args);
 module.exports = args;

shows that when --browser-script comes after --browser-executable-path, url goes missing in the arguments, and the URL gets appended to browserScripts instead:

-url: 'https://www.google.com/',
-browserScripts: [ '/home/at/trash/remove-images.js' ],
+browserScripts: [ '/home/at/trash/remove-images.js', 'https://www.google.com/' ],
ivan commented 1 year ago

Based on https://github.com/yargs/yargs/blob/main/docs/tricks.md#arrays and https://github.com/yargs/yargs/blob/main/docs/api.md#array, it looks like it would be possible to remove all the .array(...) from args.js and still get arrays from yargs when an --option is given twice or more.

The only implementation annoyance there is that --browser-script one.js results in a string from yargs (not in an array), while --browser-script one.js --browser-script two.js results in an array with two strings.