apify / apify-sdk-python

The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.
https://docs.apify.com/sdk/python
Apache License 2.0
117 stars, 11 forks

Move most of the Scrapy template's logic to Apify SDK #176

Open honzajavorek opened 9 months ago

honzajavorek commented 9 months ago

I think some logic from __main__.py could be moved to the SDK. new_configure_logging could be a decorator that is simply imported from the Apify SDK, and configure_logger and the logger names could be imported as well.
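To illustrate the decorator idea, here is a minimal sketch using only the standard library. It assumes the template's function runs an existing logging setup and then re-applies levels to a known set of loggers. The names reconfigure_loggers and LOGGER_NAMES and the chosen levels are hypothetical and not part of the Apify SDK.

```python
import functools
import logging
from typing import Any, Callable

# Hypothetical list of logger names that should stay configured.
LOGGER_NAMES = ["apify", "scrapy"]


def reconfigure_loggers(configure: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a logging-setup function so the named loggers are re-leveled after it runs."""

    @functools.wraps(configure)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = configure(*args, **kwargs)
        # Re-apply the desired level, since the wrapped setup may have changed it.
        for name in LOGGER_NAMES:
            logging.getLogger(name).setLevel(logging.INFO)
        return result

    return wrapper


@reconfigure_loggers
def configure_logging() -> None:
    # Stand-in for a framework's own setup, which here changes a logger level.
    logging.getLogger("scrapy").setLevel(logging.DEBUG)


configure_logging()
```

With a decorator like this living in the SDK, the template would only need an import and one decorator line instead of carrying the wrapping logic itself.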

Similarly, main.py contains _get_scrapy_settings, which could also simply be imported if it were turned into this:

from scrapy.settings import Settings

def apply_apify_settings(settings: Settings, proxy_config: dict | None = None) -> Settings:
    ...
    return settings

Then it wouldn't need to call get_project_settings() itself, which would leave room for custom modifications before or after applying the Apify-specific settings.
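A rough sketch of how calling code could look with that signature. To keep it runnable without Scrapy installed, a plain dict stands in for Scrapy's Settings object. The function body and the EXAMPLE_* setting names are placeholders, not the SDK's real values, and the proxy_config contents are only illustrative.

```python
from __future__ import annotations

from typing import Any

# Stand-in for scrapy.settings.Settings so the sketch runs without Scrapy.
Settings = dict[str, Any]


def apply_apify_settings(settings: Settings, proxy_config: dict | None = None) -> Settings:
    """Hypothetical body: overlay Apify-specific values on the caller's settings."""
    # Placeholder keys; the real function would set whatever Apify needs.
    settings["EXAMPLE_APIFY_SCHEDULER"] = "placeholder.Scheduler"
    settings["EXAMPLE_APIFY_PROXY"] = proxy_config
    return settings


# The caller owns the settings object, so it can customize it before...
settings: Settings = {"CONCURRENT_REQUESTS": 4}
settings = apply_apify_settings(settings, proxy_config={"useApifyProxy": True})
# ...and after the Apify-specific values are applied.
settings["DOWNLOAD_DELAY"] = 1.5
```

Because the function receives the settings instead of loading them, the project decides where they come from and in which order overrides happen.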

The main benefit is that the template would contain less boilerplate, and updates would be easier to control and maintain. If a new Scrapy version is published and the monkey patching or anything else needs to change, I could just upgrade the Apify SDK to get the updates. As of now, the only way is to watch the template for updates manually and update my code by copy-pasting.

Scrapy is a library, so I can upgrade it carefully or pin it to a certain version, but Apify is a SaaS platform. If something changes in Apify and is no longer compatible with the old template code, it can break my actors out of nowhere. Only then will I be prompted to go and see whether the template looks different from last time. Not ideal.

Follow up to https://github.com/apify/apify-sdk-python/issues/132, vaguely related to https://github.com/apify/actor-templates/issues/264

honzajavorek commented 9 months ago

I made some changes to my implementation to tidy it up:

Feel free to take inspiration from what I did, or even chunks of code (MIT licensed, just mention my name). I guess I've solved this for myself now. If there are updates to the Scrapy template, I hope I'll be able to keep up with them and backport the changes to my highly customized project.

vdusek commented 9 months ago

Hi Honza, thank you for opening this. Moving as much code as we can from the template to the SDK is definitely a good way to go. Unfortunately, adding new features to our Scrapy-Apify integration is not a priority for this quarter, so I cannot promise I'll have time to take a look at this in the near future.

vdusek commented 9 months ago

The function apply_apify_settings was moved to the SDK in https://github.com/apify/apify-sdk-python/pull/178. Let's solve the rest later.