aosabook / 500lines

500 Lines or Less

What happens when a response takes very long #141

Open kootenpv opened 9 years ago

kootenpv commented 9 years ago

I tried to add:

            response = yield from asyncio.wait_for(
                self.session.get(url, allow_redirects=False), 20)

instead of

            response = yield from self.session.get(url, allow_redirects=False)

The goal is to keep a slow server from hanging the crawler by introducing a maximum timeout, but this seems to trigger a lot of CancelledErrors (and a lot of "Task was destroyed but it is pending!" messages). Any idea?

asvetlov commented 9 years ago

You should wrap the wait_for call in a try/except block and gracefully close the task. At first glance, https://github.com/aosabook/500lines/blob/master/crawler/code/crawling.py#L233 is a good place to catch it.
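A minimal sketch of that advice, not the crawler's actual code: it uses the newer async/await syntax instead of the `yield from` style shown in this thread, and `slow_get` and `fetch_with_timeout` are hypothetical names, with `slow_get` standing in for `self.session.get(url, allow_redirects=False)`.

```python
import asyncio


async def slow_get(url, delay):
    # Hypothetical stand-in for self.session.get(url, allow_redirects=False).
    await asyncio.sleep(delay)
    return "response for " + url


async def fetch_with_timeout(url, delay, timeout):
    """Return the response, or None if the request exceeds `timeout` seconds."""
    try:
        return await asyncio.wait_for(slow_get(url, delay), timeout)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the inner request at this point,
        # so the caller only needs to record the failure and move on.
        return None


async def main():
    fast = await fetch_with_timeout("http://example.com/fast", 0.01, 1.0)
    slow = await fetch_with_timeout("http://example.com/slow", 1.0, 0.05)
    return fast, slow


fast_result, slow_result = asyncio.run(main())
```

Catching the timeout inside the fetch keeps the worker that called it alive, so one slow server should not take the whole worker down with it.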

kootenpv commented 9 years ago

I am a bit confused. It seems as if you want to put a timeout there, while I'm talking about the crawling "get" (targeting the server).

It seems that, at the place you suggest, the worker would gracefully end whenever getting an item from the queue somehow takes too long.

Whereas I'm in the fetch method (https://github.com/aosabook/500lines/blob/master/crawler/code/crawling.py#L175), trying to put the wait_for there. Is that still correct?

kootenpv commented 9 years ago

I put it in both places, and that seems to solve some issues. But now, whenever the queue is empty, it will try to stop the worker, and it will throw an ERROR:asyncio:Task was destroyed but it is pending!

I catch it, but I still get 2 messages PER worker at the end of the script (not so nice when you want to save the log output):

ERROR:asyncio:Task was destroyed but it is pending!
task: <Task pending coro=<get() done, defined at    /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/asyncio/queues.py:160> wait_for=<Future pending cb=[Task._wakeup()]> cb=[_release_waiter(<Future cancelled>)() at /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/asyncio/tasks.py:333]>
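For context, asyncio logs "Task was destroyed but it is pending!" when a task is garbage-collected before it has finished. One common way to avoid that, sketched here under assumptions rather than as a confirmed fix for this crawler, is to cancel the workers explicitly and then wait for the cancellations to complete before the loop closes. The `worker` and `crawl` functions below are hypothetical, and the doubling of each item is placeholder work.

```python
import asyncio


async def worker(queue, results):
    # Hypothetical worker loop: blocks on queue.get() until cancelled.
    while True:
        item = await queue.get()
        results.append(item * 2)  # placeholder for the real fetch
        queue.task_done()


async def crawl(items, n_workers=3):
    queue = asyncio.Queue()
    results = []
    for item in items:
        queue.put_nowait(item)

    workers = [asyncio.create_task(worker(queue, results))
               for _ in range(n_workers)]
    await queue.join()  # returns once every item has been processed

    # The workers are now parked in queue.get(); cancel them explicitly ...
    for task in workers:
        task.cancel()
    # ... and wait for the cancellations to finish, so that no task is
    # still pending when the event loop shuts down.
    outcomes = await asyncio.gather(*workers, return_exceptions=True)
    return results, outcomes


results, outcomes = asyncio.run(crawl([1, 2, 3, 4]))
```

With `return_exceptions=True`, `gather` hands back each worker's `CancelledError` as a value instead of raising it, so the shutdown path itself stays quiet.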