Closed — vdusek closed this 1 year ago
Just one suggestion and a few questions (haven't tried it locally).
Just in case you would like to try it locally, it should be pretty easy (execution with the default Actor input):

```shell
cd templates/python-scrapy/
virtualenv --python $(which python3.11) .venv
source .venv/bin/activate
pip install -r requirements.txt
apify run --purge
```
## Description

### Additional explanation of certain parts of the code
#### Exceptions in `run_until_complete`

If we use `run_until_complete` in a custom nested event loop, we must wrap it in a `try` block and call `traceback.print_exc()` in the `except` part if we want the exception to surface. Otherwise, the processing of the request is terminated without any notice in the log, and Scrapy just continues with the next one, which could be pretty confusing.

#### "Robots" requests
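A minimal sketch of the pattern described above. The coroutine and function names here are illustrative placeholders, not the template's actual code; the point is only the `try`/`except` wrapper around `run_until_complete`:

```python
import asyncio
import traceback


# Hypothetical coroutine standing in for the template's request handling.
async def handle_request() -> None:
    raise RuntimeError("something went wrong while processing the request")


def process_request_in_nested_loop() -> str:
    # A custom nested event loop, as used in the template.
    loop = asyncio.new_event_loop()
    try:
        # Without this try/except, the failure could pass silently and
        # Scrapy would simply move on to the next request.
        loop.run_until_complete(handle_request())
        return "ok"
    except BaseException:
        # Print the traceback so the failure is visible in the log.
        traceback.print_exc()
        return "failed"
    finally:
        loop.close()
```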
"Robots" requests (
*/robots.txt
) are bypassed directly from the Engine (through middlewares) to the Spider. They don't go through a Scheduler. It would be pretty hard to try to force them to go through the Request Queue. So in our Retry Middleware, we identify these requests and do not make any interaction with the Request Queue in such cases.Ticket
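One way to identify such requests, sketched with a hypothetical helper (the actual middleware's check may differ):

```python
from urllib.parse import urlparse


def is_robots_txt_request(url: str) -> bool:
    # Identify "robots" requests by their URL path, so the Retry Middleware
    # can skip any Request Queue interaction for them.
    return urlparse(url).path == "/robots.txt"
```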
Blocked by