-
It would be good if there were a way to set cookies on requests, to allow crawling sites that require authentication.
Is there currently a way to do this, or is this feature planned?
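In the meantime, a minimal stopgap sketch, assuming the crawler lets you pass custom headers per request (the `fetch` call and cookie names here are illustrative, not an existing option):

```javascript
// Build a Cookie header from a name -> value map so it can be attached to
// every request the crawler makes. Values are URI-encoded so separators
// like ';' and '=' inside a value don't corrupt the header.
function buildCookieHeader(cookies) {
  return Object.entries(cookies)
    .map(([name, value]) => `${name}=${encodeURIComponent(value)}`)
    .join('; ');
}

// Hypothetical usage with fetch (cookie names are placeholders):
const header = buildCookieHeader({ session: 'abc123', lang: 'en' });
// fetch(url, { headers: { Cookie: header } })
console.log(header); // session=abc123; lang=en
```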
-
A sitemap.xml.gz file may contain at most 50,000 URLs and must not exceed 50 MB uncompressed. For sites with a lot of content, the current XML exceeds these limits and is rejected by Google.
```
Sitemap size limits: All formats limit a s…
```
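One possible fix, sketched under the assumption that the generator has the full URL list in memory: split the URLs into chunks of at most 50,000 and emit a sitemap index pointing at one file per chunk (file names below are illustrative).

```javascript
// Per the sitemaps.org protocol, each sitemap file holds <= 50,000 URLs;
// larger sites must publish a sitemap index referencing multiple files.
const SITEMAP_URL_LIMIT = 50000;

function chunkUrls(urls, limit = SITEMAP_URL_LIMIT) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += limit) {
    chunks.push(urls.slice(i, i + limit));
  }
  return chunks;
}

// Build the index document; the sitemap-N.xml.gz naming is a placeholder.
function sitemapIndex(baseUrl, chunkCount) {
  const entries = Array.from({ length: chunkCount }, (_, i) =>
    `  <sitemap><loc>${baseUrl}/sitemap-${i + 1}.xml.gz</loc></sitemap>`
  ).join('\n');
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries + '\n</sitemapindex>';
}
```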
-
I'm having an issue where Wayback Machine links break crawling on completely unrelated pages.
[This page](https://windowsitter.world/index.php?p=wanted) has links to two Wayback Machine links, [this …
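Until the underlying bug is fixed, one workaround sketch is to skip Wayback Machine links entirely before enqueueing them; the predicate below is a hypothetical filter, not an existing crawler option.

```javascript
// Return true for URLs hosted on web.archive.org so the crawler can skip
// them; malformed input is treated as not-a-Wayback-link rather than thrown.
function isWaybackUrl(url) {
  try {
    return new URL(url).hostname === 'web.archive.org';
  } catch {
    return false;
  }
}
```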
-
Can we add an option for setting the maximum allowed request latency? Some of the sites I am crawling respond very slowly, on the order of 100,000 ms. I would like to be able to set an upper limit for this.
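As a sketch of what such an option could look like, here is a per-request latency cap built on `AbortController`; the `maxLatencyMs` parameter name is hypothetical, not an existing flag.

```javascript
// Abort the request if it takes longer than maxLatencyMs. The timer is
// cleared in finally so a fast response doesn't leave a dangling timeout.
async function fetchWithTimeout(url, maxLatencyMs) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), maxLatencyMs);
  try {
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}
```

A slow site would then fail fast with an `AbortError` instead of stalling the whole crawl.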
-
Adding an ad-blocker seems to make crawling much easier. On two sites I've tested, without an adblocker the number of requests is an order of magnitude higher than with it, and on one of the two sites…
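The effect could be approximated inside the crawler with a simple hostname blocklist, sketched below; the two domains are illustrative examples, not a real filter list.

```javascript
// Skip requests whose hostname is on (or is a subdomain of) a blocklist,
// mimicking what an ad-blocker does for the crawler's request volume.
const BLOCKLIST = ['doubleclick.net', 'googlesyndication.com'];

function shouldBlock(url, blocklist = BLOCKLIST) {
  const host = new URL(url).hostname;
  return blocklist.some((d) => host === d || host.endsWith('.' + d));
}
```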
-
This is more for research, as we can serve a different landing page for mobile that is more tap-friendly.
![image](https://github.com/kodadot/nft-gallery/assets/5887929/9e79a2dd-000e-4e70-91f8-84dea9f893f…
-
Hi All!
I realize this should largely be about the actual 'crawling' of the sites - but given how much of a breeze that was with this tool, I now find myself with the issue that the text that has been cra…
-
I suspect performance when checking an entire site would be better with the ability to run the link checker on a set of pages provided by the sitemap when available, vs the recursive crawling process.…
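A rough sketch of the sitemap-seeded approach: pull the page list out of the sitemap's `<loc>` entries and feed those to the link checker directly. The naive regex is enough for well-formed sitemaps; a real implementation would likely use an XML parser.

```javascript
// Extract every <loc> URL from a sitemap document so the link checker can
// start from a known page set instead of crawling recursively.
function extractSitemapUrls(xml) {
  return [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
}
```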
-
Hi,
On some sites, the crawler fails by throwing the following error:
```
Error: Couldn't unzip robots.txt response body
0|www | at decodeAndReturnResponse (/var/www/node_modules/…
```
-
I was wondering about the purpose(s) of rave.
I can see its primary objective is to allow very easy development of modular (amd/cjs/es6) web apps where you don't have to worry about configuring a…