apify / actor-templates

This project is the :house: home of Apify actor template projects to help users quickly get started.
https://apify.com/
24 stars 14 forks source link

Scrapy template: install_reactor can be replaced by configuration #274

Closed honzajavorek closed 4 months ago

honzajavorek commented 4 months ago

I just learned that

from scrapy.utils.reactor import install_reactor
install_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")

can be replaced by

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

(at least per scrapy-playwright's docs). Since the template already modifies the settings through apply_apify_settings(), change of the reactor could be done there, in a declarative way, instead of imperative import and a function call.

This is just a suggestion for improvement, I have no idea what are the design decisions behind the current design and it's very well possible that the two approaches are not comparable or have the same outcome.

fnesveda commented 4 months ago

Hi, thanks for the suggestion!

I vaguely remember that just setting the TWISTED_REACTOR setting didn't work correctly, because we have to configure AsyncioSelectorReactor and nest_asyncio in the correct order (install Reactor first, then apply nest_asyncio), otherwise Scrapy Actors don't run on Windows correctly.

Unfortunately, the whole Scrapy <-> asyncio integration is a mess, which is understandable given that Scrapy and Twisted started out way before asyncio was a thing, and then the asyncio integration was just shoehorned in. There are several hacks to make the Apify SDK work with Scrapy, with separate event loops for request queues and so on, it's pretty ugly 😬

I'll close this issue, as the code has to be the way it is, but I'll add a comment to the template explaining the situation better.

Thanks anyway, we're happy for these kinds of suggestions! 🙂

fnesveda commented 4 months ago

CC @vdusek just FYI