algolia / docsearch-scraper

DocSearch - Scraper
https://docsearch.algolia.com/

Partial index promotion due to Scrapy spider signals not being handled #540

Open ProTip opened 3 years ago

ProTip commented 3 years ago

I have personally experienced Ctrl-C resulting in an incomplete index.

The Scrapy documentation for the spider_closed signal (https://docs.scrapy.org/en/latest/topics/signals.html#scrapy.signals.spider_closed) says that the reason argument is "finished" when the spider completes normally. However, the reason is "shutdown" when the engine is stopped, for example by a Ctrl-C.
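A minimal sketch of capturing that reason, assuming the scraper can get hold of the Scrapy Crawler object before the crawl starts. CloseReasonTracker and track_close_reason are hypothetical names, not part of docsearch-scraper; only crawler.signals.connect and scrapy.signals.spider_closed are real Scrapy API.

```python
from typing import Optional


class CloseReasonTracker:
    """Hypothetical helper: records the reason Scrapy sends with spider_closed."""

    def __init__(self) -> None:
        # None means spider_closed was never received.
        self.reason: Optional[str] = None

    def on_spider_closed(self, spider, reason: str) -> None:
        # Scrapy passes reason="finished" on normal completion and
        # reason="shutdown" when the engine is stopped (e.g. by Ctrl-C).
        self.reason = reason

    @property
    def finished(self) -> bool:
        return self.reason == "finished"


def track_close_reason(crawler) -> CloseReasonTracker:
    """Hypothetical: connect a tracker to crawler's spider_closed signal.

    Keep a reference to the returned tracker; Scrapy's dispatcher may hold
    receivers only weakly, so an unreferenced tracker could be collected.
    """
    # Imported here so CloseReasonTracker itself has no Scrapy dependency.
    from scrapy import signals

    tracker = CloseReasonTracker()
    crawler.signals.connect(tracker.on_spider_closed, signal=signals.spider_closed)
    return tracker
```

With a CrawlerProcess, one could create the crawler with process.create_crawler(SpiderClass), pass it to track_close_reason, then call process.crawl(crawler, ...) and process.start(). Once start() returns, tracker.finished says whether the crawl completed. Because reason starts as None, a crawl killed before spider_closed fires also counts as unfinished.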

It doesn't appear that the doc spider hooks into any of the spider signals. When a spider does not finish naturally, the scraper is unaware of it and proceeds to promote the incomplete temporary index.
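A sketch of the guard this implies, assuming the close reason has already been captured from the spider_closed signal and that promoting the temporary index is available as a callable. promote_if_complete is a hypothetical name, and promote stands in for whatever step the scraper uses to commit the temp index.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def promote_if_complete(close_reason: Optional[str], promote: Callable[[], None]) -> bool:
    """Hypothetical: promote the temporary index only after a normal finish.

    Returns True if promote() was called.
    """
    if close_reason != "finished":
        # Anything else ("shutdown" after Ctrl-C, or None if spider_closed
        # never fired) is treated as a possibly incomplete crawl.
        logger.warning(
            "Not promoting temporary index: spider closed with reason %r",
            close_reason,
        )
        return False
    promote()
    return True
```

Treating every reason other than "finished" as a failure is deliberately conservative: skipping promotion presumably leaves the previously published index in place, which is preferable to replacing it with a partial one.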