Gather URLs to crawl

eklem / browsercrawler

Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.

https://eklem.github.io/browsercrawler/doc/

MIT License


Gather URLs to crawl #10

Open eklem opened 6 years ago

eklem commented 6 years ago

Build a queue of URLs to crawl from the links found on each page you visit. The crawler should obey robots.txt.
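A minimal sketch of what such a queue could look like, assuming only same-origin http(s) links are of interest and that URLs differing only in their fragment count as the same page. The names `CrawlQueue` and `extractLinks` are hypothetical and not part of browsercrawler. In the browser, `document.querySelectorAll('a[href]')` would likely be a better way to collect links; the regex here only keeps the sketch self-contained.

```typescript
// Hypothetical sketch: a de-duplicating, same-origin URL queue.
class CrawlQueue {
  origin: string;
  pending: string[] = [];
  seen = new Set<string>();

  constructor(siteUrl: string) {
    // Assumption: only URLs on the same origin as siteUrl are crawled.
    this.origin = new URL(siteUrl).origin;
  }

  // Resolve rawUrl against the page it was found on and queue it
  // if it is same-origin http(s) and not seen before.
  add(rawUrl: string, pageUrl: string): boolean {
    let url: URL;
    try {
      url = new URL(rawUrl, pageUrl);
    } catch {
      return false; // not a parseable URL
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") return false;
    if (url.origin !== this.origin) return false;
    url.hash = ""; // treat /page and /page#section as one page
    if (this.seen.has(url.href)) return false;
    this.seen.add(url.href);
    this.pending.push(url.href);
    return true;
  }

  // Take the oldest queued URL, or undefined when the queue is empty.
  next(): string | undefined {
    return this.pending.shift();
  }
}

// Collect href values from anchor tags in an HTML string.
// A regex is a rough stand-in for real DOM parsing.
function extractLinks(html: string): string[] {
  const links: string[] = [];
  const hrefPattern = /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = hrefPattern.exec(html)) !== null) {
    const href = match[1];
    if (href) links.push(href);
  }
  return links;
}
```

For each visited page, every link from `extractLinks(html)` would be passed to `queue.add(link, pageUrl)`, and the crawl loop would keep calling `queue.next()` until it returns `undefined`.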

eklem commented 6 years ago

This could be a separate library, a feature you can turn on and off, or a function that can be run separately. We'll see. The robots.txt part will be a check performed just before each link is fetched.
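A rough sketch of a fetch-time check, assuming a deliberately simplified reading of robots.txt: exact user-agent matching with a `*` fallback, plain path prefixes without wildcards, and the longest matching rule winning. The names `parseRobotsTxt`, `isAllowed` and `fetchIfAllowed` are hypothetical. A dedicated parser would cover the cases this one ignores.

```typescript
// Hypothetical sketch: simplified robots.txt rules, checked right before fetching.
type RobotsRule = { allow: boolean; path: string };

// Return the rules for userAgent, falling back to the "*" group.
function parseRobotsTxt(text: string, userAgent: string): RobotsRule[] {
  const ua = userAgent.toLowerCase();
  const specific: RobotsRule[] = [];
  const wildcard: RobotsRule[] = [];
  let groupAgents: string[] = [];
  let inRules = false;
  let sawSpecific = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // A user-agent line after rule lines starts a new group.
      if (inRules) {
        groupAgents = [];
        inRules = false;
      }
      groupAgents.push(value.toLowerCase());
      if (value.toLowerCase() === ua) sawSpecific = true;
    } else if (field === "allow" || field === "disallow") {
      inRules = true;
      if (value === "") continue; // an empty Disallow restricts nothing
      const rule: RobotsRule = { allow: field === "allow", path: value };
      if (groupAgents.includes(ua)) specific.push(rule);
      else if (groupAgents.includes("*")) wildcard.push(rule);
    }
  }
  return sawSpecific ? specific : wildcard;
}

// Longest matching path prefix wins; on a tie, Allow wins.
function isAllowed(rules: RobotsRule[], url: string): boolean {
  const { pathname, search } = new URL(url);
  const target = pathname + search;
  let best: RobotsRule | undefined;
  for (const rule of rules) {
    if (!target.startsWith(rule.path)) continue;
    if (
      !best ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best ? best.allow : true; // no matching rule means allowed
}

// Run the check at the moment a link is about to be fetched.
async function fetchIfAllowed(url: string, rules: RobotsRule[]): Promise<string | null> {
  if (!isAllowed(rules, url)) return null;
  const response = await fetch(url);
  return response.ok ? response.text() : null;
}
```

Doing the check inside `fetchIfAllowed` rather than when links are queued keeps the queue independent of robots.txt, so the check could be switched off or swapped out without touching link gathering.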

eklem commented 3 years ago

Use robots-parser? It has no dependencies, so it should be easy enough to get running in the browser.