gildas-lormeau / single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
GNU Affero General Public License v3.0
540 stars 58 forks source link

single-file-cli doesn't exit with error code when retriving page fails #17

Open sdht0 opened 1 year ago

sdht0 commented 1 year ago

Hi. On encountering a network error as shown below, single-file-cli throws the error but does not propagate the error code on exit, which makes it difficult to determine when retrieving a bunch of urls if any of them failed.

net::ERR_NAME_NOT_RESOLVED at <url elided> URL: <url elided>                                                                                                                                                                                      
Stack: Error: net::ERR_NAME_NOT_RESOLVED at <url elided>                                                                 
    at navigate (<path>/single-file-cli/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Frame.js:236:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)                        
    at async Frame.goto (<path>/single-file-cli/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Frame.js:206:21)
    at async CDPPage.goto (<path>/single-file-cli/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Page.js:439:16)
    at async pageGoto (<path>/single-file-cli/back-ends/puppeteer.js:194:3)    
    at async getPageData (<path>/single-file-cli/back-ends/puppeteer.js:132:3)
    at async exports.getPageData (<path>/single-file-cli/back-ends/puppeteer.js:56:10)
    at async capturePage (<path>/single-file-cli/single-file-cli-api.js:254:20)
    at async runNextTask (<path>/single-file-cli/single-file-cli-api.js:175:20)                                        
    at async Promise.all (index 0)

Can you ensure single-file-cli exits with appropriate code on an error? It would also be nice if there was a way to determine if the provided url returns a 404 (or any 400 or 500 code), but maybe that's harder to do.

sdht0 commented 1 year ago

I managed to get this working with the following diff for my use case. Feel free to pick this up and get it commit ready.

diff --git a/back-ends/puppeteer.js b/back-ends/puppeteer.js
index 4999b3d..4b25b90 100644
--- a/back-ends/puppeteer.js
+++ b/back-ends/puppeteer.js
@@ -187,10 +187,15 @@ async function pageGoto(page, options) {
        timeout: options.browserLoadMaxTime || 0,
        waitUntil: options.browserWaitUntil || NETWORK_IDLE_STATE
    };
+   var response;
    if (options.content) {
-       await page.goto(options.url, { waitUntil: "domcontentloaded" });
+       response = await page.goto(options.url, { waitUntil: "domcontentloaded" });
        await page.setContent(options.content, loadOptions);
    } else {
-       await page.goto(options.url, loadOptions);
+       response = await page.goto(options.url, loadOptions);
+   }
+   if (response.status() !== 200 ) {
+       process.exitCode = 2;
+       throw new Error(`Got response '${response.status()}'`);
    }
 }
diff --git a/single-file-cli-api.js b/single-file-cli-api.js
index d7c455c..77d8790 100644
--- a/single-file-cli-api.js
+++ b/single-file-cli-api.js
@@ -277,6 +277,9 @@ async function capturePage(options) {
        } else {
            console.error(error.message || error, message); // eslint-disable-line no-console
        }
+       if (!process.exitCode || process.exitCode === 0) {
+           process.exitCode = 1;
+       }
    }
 }
gildas-lormeau commented 1 year ago

Thank you, I think I will integrate your changes but behind a flag. Considering a 404 as an error can be controversial.

Yakabuff commented 1 year ago

@gildas-lormeau Instead of throwing an error and exiting out, could we just implement a flag --output-result that outputs http response code, url, http headers in stdout for users to parse?

Yakabuff commented 1 year ago

@gildas-lormeau I made a PR to address this: https://github.com/gildas-lormeau/single-file-cli/pull/31