eklem / browsercrawler

Crawling content from a site within the browser. A basis for e.g. a search solution for static sites.
https://eklem.github.io/browsercrawler/doc/
MIT License

Store in IndexedDB/LevelDB what is indexed #35

Closed eklem closed 5 years ago

eklem commented 6 years ago

Check out this example. We should store the ID of everything that is indexed. Store each ID after the add step.

eklem commented 6 years ago

search-index-housekeeper will take care of this.

eklem commented 5 years ago

Rename search-index-housekeeper to browsercrawler-housekeeper. It should be set up to:

This way, there won't be any gaps in the housekeeping of what has been crawled, even if the process is interrupted by clicks to new pages. You may get some overlapping crawling every now and then, but that's not a problem.
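
A minimal sketch of that ordering, to make the "no gaps, some overlap" behaviour concrete. Everything here is an assumption for illustration: `CrawledIdStore`, `Page`, `crawl` and `addToIndex` are hypothetical names, not the actual browsercrawler or housekeeper API, and a plain `Set` and `Map` stand in for the IndexedDB/LevelDB store and the search index.

```typescript
// Hypothetical store of crawled IDs. In the browser this would be backed by
// IndexedDB/LevelDB; a Set is enough to show the ordering.
interface CrawledIdStore {
  has(id: string): boolean;
  add(id: string): unknown;
}

// Hypothetical page shape: the ID could be the page URL.
interface Page {
  id: string;
  body: string;
}

// Crawl pages and record each ID only after the add step has completed.
function crawl(
  pages: Page[],
  store: CrawledIdStore,
  addToIndex: (page: Page) => void,
): string[] {
  const indexedNow: string[] = [];
  for (const page of pages) {
    if (store.has(page.id)) continue; // recorded on an earlier run, skip
    addToIndex(page);                 // may be interrupted (modelled as a throw)
    store.add(page.id);               // reached only if the add step finished
    indexedNow.push(page.id);
  }
  return indexedNow;
}

const store = new Set<string>();
const index = new Map<string, string>();
const pages: Page[] = [
  { id: "/a", body: "first page" },
  { id: "/b", body: "second page" },
  { id: "/c", body: "third page" },
];

// First run: the interruption hits after "/b" was added to the index,
// but before its ID was stored.
let interrupted = false;
try {
  crawl(pages, store, (page) => {
    index.set(page.id, page.body);
    if (page.id === "/b") throw new Error("user clicked to a new page");
  });
} catch {
  interrupted = true;
}
const storedAfterFirstRun = [...store];

// Second run: "/a" is skipped, "/b" is crawled again (the overlap), "/c" is new.
const secondRun = crawl(pages, store, (page) => {
  index.set(page.id, page.body);
});
```

Because the ID is written last, an interruption can only leave a page indexed but unrecorded, never recorded but unindexed. The cost is that `/b` gets crawled twice here, which is harmless as long as adding the same ID again overwrites the earlier entry.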