Python Scrapy Actor uses Request Queue

Description

Update the Python Scrapy Actor template to use our Request Queue instead of Scrapy's default storage.
This was quite a challenge, mainly because Scrapy is built on the Twisted library (an alternative to asyncio).
This PR mainly provides our custom Scheduler and a customized version of the RetryMiddleware. Both of these components interact with the Request Queue.
All interactions with the Request Queue have to be executed outside of the main event loop. This is solved by creating a custom nested event loop in which all Request Queue (RQ) interactions are executed.
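A minimal sketch of the idea follows: a synchronous, scheduler-like class that owns a dedicated event loop and runs every async storage call on it with `run_until_complete`. `FakeRequestQueue` is a hypothetical in-memory stand-in for the real Request Queue client, and its method names are chosen for illustration only. The scheduler borrows the method names `enqueue_request`, `next_request` and `has_pending_requests` from Scrapy's scheduler interface but does not subclass it, and it works with plain URL strings instead of Scrapy `Request` objects. The sketch assumes no other asyncio loop is already running in the calling thread; if one is, asyncio refuses to run a second loop there.

```python
import asyncio
from collections import deque
from typing import Any, Coroutine, Deque, Optional, Set, TypeVar

T = TypeVar("T")


class FakeRequestQueue:
    """Hypothetical async stand-in for the Request Queue client."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()
        self._seen: Set[str] = set()

    async def add_request(self, url: str) -> bool:
        await asyncio.sleep(0)  # pretend to perform network I/O
        if url in self._seen:
            return False  # deduplicated, like a real queue would
        self._seen.add(url)
        self._items.append(url)
        return True

    async def fetch_next_request(self) -> Optional[str]:
        await asyncio.sleep(0)
        return self._items.popleft() if self._items else None

    async def is_empty(self) -> bool:
        await asyncio.sleep(0)
        return not self._items


class SketchScheduler:
    """Synchronous scheduler whose storage calls all run on one nested loop."""

    def __init__(self) -> None:
        # Dedicated loop, kept separate from the main (Twisted-driven) loop.
        self._loop = asyncio.new_event_loop()
        self._rq = FakeRequestQueue()

    def _run(self, coro: Coroutine[Any, Any, T]) -> T:
        # Block until the async RQ call finishes and return its result.
        return self._loop.run_until_complete(coro)

    def enqueue_request(self, url: str) -> bool:
        return self._run(self._rq.add_request(url))

    def next_request(self) -> Optional[str]:
        return self._run(self._rq.fetch_next_request())

    def has_pending_requests(self) -> bool:
        return not self._run(self._rq.is_empty())

    def close(self) -> None:
        self._loop.close()
```

Because Scrapy calls scheduler methods synchronously, the blocking `_run` wrapper is what lets each method return a plain value while the storage layer stays async.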

Additional explanation of certain parts of the code

Exceptions in `run_until_complete`

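If we call `run_until_complete` on a custom nested event loop like this:

```python
import traceback

try:
    event_loop.run_until_complete(foo_coroutine())
except BaseException:
    # Print the traceback; otherwise the error disappears without a trace.
    traceback.print_exc()
```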

We must wrap it in a try block and call `traceback.print_exc()` in the except clause if we want the exception to show up in the log. Otherwise, processing of the current request is terminated without any notice in the log, and Scrapy simply continues with the next request, which can be quite confusing.
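The same pattern can be packaged as a small reusable helper, sketched here. `run_and_report` and `failing_rq_call` are hypothetical names, not part of this PR. Unlike the snippet in the PR, the sketch catches `Exception` rather than `BaseException`, so that `KeyboardInterrupt` and `SystemExit` still propagate; that narrowing is a design choice of the sketch.

```python
import asyncio
import traceback
from typing import Any, Coroutine, Optional, TextIO, TypeVar

T = TypeVar("T")


def run_and_report(
    loop: asyncio.AbstractEventLoop,
    coro: Coroutine[Any, Any, T],
    file: Optional[TextIO] = None,
) -> Optional[T]:
    """Run coro on loop; print any traceback instead of losing it silently."""
    try:
        return loop.run_until_complete(coro)
    except Exception:
        # file=None makes print_exc write to sys.stderr.
        traceback.print_exc(file=file)
        return None


async def failing_rq_call() -> None:
    # Hypothetical Request Queue call that fails.
    raise RuntimeError("Request Queue call failed")
```

Calling `run_and_report(loop, failing_rq_call())` returns `None` and leaves the full traceback in the log, so the failed request no longer vanishes silently.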

"Robots" requests

"Robots" requests (`*/robots.txt`) are passed by the Engine directly through the middlewares, bypassing the Scheduler entirely. Forcing them through the Request Queue would be quite difficult, so our RetryMiddleware identifies these requests and skips any interaction with the Request Queue for them.
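One way to identify such requests is to check the URL path, as in the sketch below. `is_robots_txt_request` and `SketchRetryMiddleware` are hypothetical and do not use Scrapy's actual RetryMiddleware API; the `rq_calls` list merely records where a real implementation would touch the Request Queue.

```python
from typing import List
from urllib.parse import urlparse


def is_robots_txt_request(url: str) -> bool:
    """True for */robots.txt URLs, which never pass through the Scheduler."""
    return urlparse(url).path == "/robots.txt"


class SketchRetryMiddleware:
    """Hypothetical, simplified retry hook that records RQ interactions."""

    def __init__(self) -> None:
        self.rq_calls: List[str] = []

    def on_retry(self, url: str) -> None:
        if is_robots_txt_request(url):
            # Skip: robots.txt requests were never added to the Request Queue.
            return
        # Hypothetical stand-in for updating the request in the Request Queue.
        self.rq_calls.append(url)
```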

Ticket

Closes #183

Blocked by

Merge https://github.com/apify/apify-sdk-python/pull/113 and release a new version of the SDK first.

apify / actor-templates