alephdata / memorious

Lightweight web scraping toolkit for documents and structured data.
https://docs.alephdata.org/developers/memorious
MIT License

Cancel crawlers if they are idle after a certain timeout period #107

Closed: sunu closed this 4 years ago

sunu commented 4 years ago

Also introduces an emit_heartbeat function on the context object. The aim is to give crawlers a way to indicate that they are still alive. This should be particularly useful when a crawler has a huge for loop that takes hours to run, as many of our bigger crawlers do.
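
A rough sketch of how a long-running stage might use this, for illustration only. Only the name emit_heartbeat comes from this PR; the zero-argument signature, the FakeContext stand-in, its is_idle helper, and the shape of the crawl function are assumptions and not the actual memorious implementation.

```python
import time


class FakeContext:
    """Hypothetical stand-in for the memorious context object."""

    def __init__(self):
        self.last_heartbeat = time.monotonic()
        self.heartbeats = 0

    def emit_heartbeat(self):
        # Assumed signature: no arguments, just records "still alive".
        self.last_heartbeat = time.monotonic()
        self.heartbeats += 1

    def is_idle(self, timeout):
        # Hypothetical check: True if no heartbeat arrived within
        # `timeout` seconds.
        return time.monotonic() - self.last_heartbeat > timeout


def crawl(context, data):
    # A long-running loop. Emitting a heartbeat on every iteration tells
    # the scheduler the crawler is still working, so it is not cancelled
    # as idle.
    processed = []
    for item in data["items"]:
        processed.append(item.upper())  # placeholder for real work
        context.emit_heartbeat()
    return processed


context = FakeContext()
result = crawl(context, {"items": ["a", "b", "c"]})
```

In a real crawler the heartbeat would probably be emitted every N iterations rather than on each one, to keep the overhead low.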