andreantonacci / everynoise_scraper

Two webscrapers to collect data from everynoise.com

monitoring #4

Open hannesdatta opened 4 years ago

hannesdatta commented 4 years ago

I plan to run these scrapers on another computer. Is there a way in which Scrapy can monitor its own performance? Essentially, I am thinking of a monitoring script that reports on the current health of the operations. I would like to run that script on my main computer, so that I can always see the current status of the collection. As a first suggestion, you could look at what Scrapy has to offer for monitoring. Maybe they already have a solution for this.
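For example, Scrapy's built-in stats collector could be exposed through a small extension, something like the minimal sketch below (the class name, output path, and JSON format are illustrative, not part of this repo):

```python
# stats_dump.py -- a sketch of dumping final crawl stats to a JSON file
# that a monitoring script on another machine could poll.
import json

from scrapy import signals


class StatsDumpExtension:
    """Write the final crawl stats to a JSON file when the spider closes."""

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls(crawler.stats)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider):
        # get_stats() returns a dict; some values are datetimes, hence default=str.
        with open(f"{spider.name}-stats.json", "w") as f:
            json.dump(self.stats.get_stats(), f, default=str)
```

The extension would still need to be registered under the `EXTENSIONS` setting in `settings.py` to take effect.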

andreantonacci commented 4 years ago

I have enabled logging, which is the easiest way to monitor performance (the same way we did with electionstats). Basically, all the logs from every crawling process are stored, so you can always access them. I would suggest you use SnakeTail for this purpose (http://snakenest.com/snaketail/): you can keep it open, watch the logs in real time, and filter to the INFO level and above (I would ignore all the DEBUG logs). Important logs such as errors or criticals can also be formatted/highlighted differently, so you can spot them clearly. By the way, in case of an error while uploading files to S3, the affected files get moved to an "error" directory, so you can easily recover them (no need to scroll through the logs to find them).
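For reference, this kind of per-process file logging can be configured in `settings.py` along these lines (a minimal sketch; `LOG_DIR` and the filename pattern are illustrative, not the exact values used in this repo):

```python
# settings.py -- per-crawl log files at INFO level and above.
import os
from datetime import datetime

LOG_DIR = "logs"  # hypothetical directory; adjust to the project layout
os.makedirs(LOG_DIR, exist_ok=True)

LOG_ENABLED = True
LOG_LEVEL = "INFO"  # drop DEBUG noise; errors and criticals are still recorded
LOG_FILE = os.path.join(LOG_DIR, f"crawl-{datetime.now():%Y%m%d-%H%M%S}.log")

# Built-in LogStats extension: logs pages/min and items/min at this interval.
LOGSTATS_INTERVAL = 60.0
```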

hannesdatta commented 4 years ago

Cool, these are good insights!