There are two commits in this PR.
First, we have to pass some checks and carry the session over to subsequent requests, so I suggest using a `cookie_jar` to manage sessions.
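A minimal sketch of the idea, assuming aiohttp as the HTTP client (the `crawl` helper and its URL list are illustrative, not the project's actual API):

```python
import asyncio
import aiohttp


async def crawl(urls):
    # One shared CookieJar: cookies set by the first (login/check) response
    # are sent automatically with every subsequent request in the session.
    jar = aiohttp.CookieJar()
    async with aiohttp.ClientSession(cookie_jar=jar) as session:
        for url in urls:
            async with session.get(url) as resp:
                await resp.text()
```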
Second, there's a bug that can block while dequeuing. The `spider.is_running()` condition can hold while the queue is empty but `len(parser.parsing_urls) > 0`, so the `queue.get()` coroutine blocks waiting for new URLs. The in-flight parsers may still enqueue new URLs; but if none arrive, the event loop never stops unless you interrupt it.
I added a timeout mechanism for this: by default we wait up to 5 seconds for a new URL to arrive; otherwise the wait times out and the loop can re-check whether the spider is still running.
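Roughly, the dequeue loop becomes something like the sketch below, assuming an `asyncio.Queue` and treating `consume`, `spider`, and `parser` as stand-ins for the project's actual objects:

```python
import asyncio


async def consume(queue, spider, parser):
    while spider.is_running():
        try:
            # Wait at most 5 seconds for a new URL instead of blocking
            # indefinitely on an empty queue.
            url = await asyncio.wait_for(queue.get(), timeout=5)
        except asyncio.TimeoutError:
            # No URL arrived in time: loop back and re-check is_running(),
            # so the event loop can shut down instead of hanging forever.
            continue
        await parser.parse(url)
        queue.task_done()
```

With this, once all parsers finish and no new URLs show up within the timeout, `is_running()` eventually turns false and the loop exits cleanly.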