apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
https://crawlee.dev
Apache License 2.0

Create specialized and contextual errors #646

Open pocesar opened 4 years ago

pocesar commented 4 years ago

Following the post at https://github.com/apifytech/apify-js/pull/642#issuecomment-608586477, I've created a separate issue for Error improvements.

By creating rich errors, we can help with debugging by attaching actual context about when the issue happened. Specialized errors, instead of throw new Error(string), could make a few things possible:

Example implementation: https://github.com/pocesar/apify-facebook-crawler/blob/master/src/error.ts#L46

Instead of bikeshedding about error names, let's focus on what the error should contain. My initial list for everything-Apify:

The errors would extend an ApifyBaseError, so when an error happens you can do an instanceof check to see whether it's an internal error or an external one (from dependencies, code / syntax errors, etc.).
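
To make the instanceof idea concrete, here is a minimal sketch of what such a base class might look like. Only the name ApifyBaseError comes from the proposal; the context field, the RequestTimeoutError subclass, and the isInternalError helper are hypothetical names invented for illustration, not part of the SDK.

```typescript
// Hypothetical base class. Only the name ApifyBaseError is taken from the
// proposal; the `context` field is an assumed way to carry debugging data.
class ApifyBaseError extends Error {
    readonly context: Record<string, unknown>;

    constructor(message: string, context: Record<string, unknown> = {}) {
        super(message);
        // Restore the prototype chain so instanceof also works when the
        // code is compiled down to ES5.
        Object.setPrototypeOf(this, new.target.prototype);
        this.name = new.target.name;
        this.context = context;
    }
}

// Hypothetical specialized error; it inherits the constructor above.
class RequestTimeoutError extends ApifyBaseError {}

// Type guard: true for internal (SDK) errors, false for everything else
// (dependency errors, syntax errors, non-Error values, ...).
function isInternalError(err: unknown): err is ApifyBaseError {
    return err instanceof ApifyBaseError;
}

try {
    throw new RequestTimeoutError('Request timed out', { url: 'https://example.com', retryCount: 2 });
} catch (err) {
    if (isInternalError(err)) {
        // Internal error: the attached context is available for debugging.
        console.log(err.name, err.context);
    } else {
        // External error: pass it along untouched.
        throw err;
    }
}
```

Keeping the context as a plain object sidesteps the naming discussion: the subclasses only signal the category, while the payload can evolve independently.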

mnmkng commented 4 years ago

Great kick-off. I pretty much agree with everything. Here are a few notes of my own.

The number one thing users hate about our error management is the SDK's "internal" logging (i.e., logging the user cannot manage), such as "request timed out and will be retried". They would like to be able to turn this logging off, replace it with a warning, or at least remove the stack from it. There's even an attempted PR, but we were not completely happy with it.

We need to design a way to make this customizable via the logger. But that needs the ability to select which errors to log, perhaps via instanceof checks in the log.exception function, with the logger instance given a list of exceptions you are / are not interested in logging.
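
As a rough sketch of that idea, a logger could keep a per-class policy and consult it with instanceof inside its exception method. This is not the SDK's actual logger: SketchLogger, setPolicy, the policy names, and SketchTimeoutError are all hypothetical, chosen to mirror the three wishes above (turn the log off, downgrade it to a warning, or keep the full stack).

```typescript
type ErrorClass = new (...args: any[]) => Error;
// 'silent' drops the log, 'warning' logs the message without the stack,
// 'exception' logs the full stack (the default).
type Policy = 'silent' | 'warning' | 'exception';

// Hypothetical error class standing in for "request timed out and will be retried".
class SketchTimeoutError extends Error {
    constructor(message: string) {
        super(message);
        Object.setPrototypeOf(this, new.target.prototype);
    }
}

// Hypothetical logger; the sink is injectable so the output can be redirected.
class SketchLogger {
    private rules: Array<[ErrorClass, Policy]> = [];
    private sink: (line: string) => void;

    constructor(sink: (line: string) => void = console.log) {
        this.sink = sink;
    }

    // Register how errors of a given class (and its subclasses) are logged.
    setPolicy(cls: ErrorClass, policy: Policy): this {
        this.rules.push([cls, policy]);
        return this;
    }

    exception(err: Error, message: string): void {
        // The first registered rule whose class matches wins.
        const rule = this.rules.find(([cls]) => err instanceof cls);
        const policy: Policy = rule ? rule[1] : 'exception';
        if (policy === 'silent') return;
        if (policy === 'warning') {
            this.sink(`WARN ${message}: ${err.message}`);
            return;
        }
        this.sink(`ERROR ${message}: ${err.stack ?? err.message}`);
    }
}

const log = new SketchLogger().setPolicy(SketchTimeoutError, 'warning');
log.exception(new SketchTimeoutError('timed out after 30s'), 'Request failed and will be retried');
log.exception(new Error('boom'), 'Unexpected failure'); // no rule, so the full stack is logged
```

Because rules are matched with instanceof, a policy set on a base class also covers its subclasses, which would pair naturally with a shared base error.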