DocNow / diffengine

track changes to the news, where news is anything with an RSS feed
MIT License

diffengine sometimes hangs without explanation #25

Closed ryanfb closed 7 years ago

ryanfb commented 7 years ago

Noticed on 2017-01-27 for cnn_diff:

2017-01-25 10:16:43,128 - root - INFO - shutting down: new=13 checked=495 skipped=1691 elapsed=0:16:41.543544
2017-01-25 10:30:02,301 - root - INFO - starting up with home=/Users/ryan/source/diffengine/cnn_diff
2017-01-25 10:30:02,317 - root - INFO - fetching feed: http://rss.cnn.com/rss/cnn_topstories.rss
2017-01-25 10:30:03,048 - root - INFO - found new entry: http://rss.cnn.com/~r/rss/cnn_topstories/~3/KiO2MctO3eI/index.html
2017-01-25 10:30:03,240 - root - INFO - found new entry: http://rss.cnn.com/~r/rss/cnn_topstories/~3/iyOB4KTWINU/index.html
2017-01-25 10:30:03,413 - root - INFO - found new entry: http://rss.cnn.com/~r/rss/cnn_topstories/~3/q2hFlt0ZpK0/index.html

and bbc_diff:

2017-01-26 05:52:10,719 - root - WARNING - Got 404 when fetching http://www.bbc.co.uk/news/world-us-canada-38702983
2017-01-26 05:53:29,932 - root - INFO - fetching feed: http://feeds.bbci.co.uk/news/rss.xml?edition=int
2017-01-26 05:53:38,509 - root - INFO - fetching feed: http://feeds.bbci.co.uk/news/system/latest_published_content/rss.xml
2017-01-26 05:53:38,673 - root - INFO - shutting down: new=6 checked=226 skipped=1851 elapsed=0:08:36.948907
2017-01-26 06:00:01,536 - root - INFO - starting up with home=/Users/ryan/source/diffengine/bbc_diff
2017-01-26 06:00:01,545 - root - INFO - fetching feed: http://feeds.bbci.co.uk/news/rss.xml
2017-01-26 06:00:01,674 - root - INFO - found new entry: http://www.bbc.co.uk/news/science-environment-38755229
2017-01-26 06:00:01,717 - root - INFO - found new entry: http://www.bbc.co.uk/news/business-38748296
2017-01-26 06:00:01,765 - root - INFO - found new entry: http://www.bbc.co.uk/news/magazine-38722929
2017-01-26 06:01:41,721 - root - WARNING - Got 404 when fetching http://www.bbc.co.uk/news/world-us-canada-38702983

Both had long-running processes but didn't appear to be logging or doing anything new.

I tried using dtruss to trace syscalls in the running processes before killing them, but no syscalls were being made (using dtruss on a successfully-running diffengine instance produces a lot of output).

edsu commented 7 years ago

Weird. Maybe it would be good to make sure HTTP requests have a timeout?
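
As a rough illustration of that suggestion, here is a minimal sketch using only the Python standard library. It is not diffengine's actual code: the `fetch` helper and the 10-second default are assumptions made up for this example. The idea is that a socket operation that stalls raises an error instead of blocking forever, which would match the symptom of a live process making no syscalls.

```python
import urllib.request


def fetch(url, timeout=10.0):
    """Return the response body as bytes, or None on failure or timeout.

    `fetch` is a hypothetical helper for illustration only.
    """
    try:
        # Without timeout=, urlopen falls back to the global socket default,
        # which is normally "no timeout", so a stalled server can block forever.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except OSError:
        # URLError, HTTPError, connection errors and socket timeouts
        # are all subclasses of OSError.
        return None


if __name__ == "__main__":
    body = fetch("http://feeds.bbci.co.uk/news/rss.xml", timeout=10.0)
    print("no response" if body is None else "%d bytes" % len(body))
```

Note that this timeout applies to each blocking socket operation (connect, each read), not to the request as a whole, so a server that trickles data slowly could still take longer than 10 seconds in total.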