Gather statistics on HTTP rate limit errors

apify / apify-sdk-python

The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.

https://docs.apify.com/sdk/python

Apache License 2.0


Gather statistics on HTTP rate limit errors #318

Status: Open. janbuchar opened this issue 7 hours ago.

janbuchar commented 7 hours ago

This is needed so that apify/crawlee-python#60 actually benefits Crawlee runs on the Apify platform.

Rate limit errors should return HTTP status code 429 (please double-check).

vdusek commented 7 hours ago

@Mantisus

B4nan commented 7 hours ago

This is handled at the client level, not in the SDK. The client makes the requests and saves statistics on rate limits (yes, it's about the 429 status codes). Crawlee then reads these statistics from the client.

https://github.com/apify/apify-client-js/blob/master/src/statistics.ts#L18
https://github.com/apify/crawlee/blob/master/packages/core/src/autoscaling/snapshotter.ts#L383
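A minimal Python sketch of the mechanism described above, assuming a shape modeled loosely on the linked `statistics.ts`: a count of calls, a count of HTTP requests, and a list of 429 counts indexed by retry attempt. `Statistics`, `add_rate_limit_error`, `errors_on_attempt`, and `is_client_overloaded` are hypothetical names, not part of the Python API client. Which attempt index and threshold Crawlee's snapshotter actually uses is defined in the linked `snapshotter.ts`, so both are left as parameters here.

```python
from dataclasses import dataclass, field


@dataclass
class Statistics:
    """Hypothetical client-side statistics, loosely modeled on apify-client-js."""

    calls: int = 0  # logical API calls made through the client
    requests: int = 0  # HTTP requests sent, including retries
    # rate_limit_errors[i] counts 429 responses received on attempt i + 1
    rate_limit_errors: list[int] = field(default_factory=list)

    def add_rate_limit_error(self, attempt: int) -> None:
        if attempt < 1:
            raise ValueError("attempt must be >= 1")
        # Grow the list so that index attempt - 1 exists.
        while len(self.rate_limit_errors) < attempt:
            self.rate_limit_errors.append(0)
        self.rate_limit_errors[attempt - 1] += 1


def errors_on_attempt(stats: Statistics, attempt: int) -> int:
    """Consumer side: how many 429s were recorded on a given attempt (0 if none)."""
    index = attempt - 1
    if 0 <= index < len(stats.rate_limit_errors):
        return stats.rate_limit_errors[index]
    return 0


def is_client_overloaded(previous: int, current: int, max_new_errors: int) -> bool:
    """Consumer side: too many new 429s since the previous snapshot."""
    return current - previous > max_new_errors
```

Counting per attempt rather than as a single total lets a consumer tell one-off 429s apart from requests that keep being rate-limited even after retries.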

The only thing the SDK does here is switch the storage client to the Apify client when running on the platform, which I know you handle a bit differently in the Python version.

janbuchar commented 6 hours ago

In the Python SDK, we wrap the client instead of using it directly as a StorageManager. Also, the Python API client does not seem to collect those statistics (please correct me if I'm wrong).
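If the client stays wrapped, one hypothetical way for Crawlee to still reach the statistics is for the wrapper to expose the wrapped client's `stats` object directly. This is a sketch under that assumption only; `ApiClient`, `StorageClientWrapper`, and `Statistics` are illustrative names, not the SDK's or the API client's actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class Statistics:
    # Hypothetical: 429 counts indexed by retry attempt.
    rate_limit_errors: list[int] = field(default_factory=list)


class ApiClient:
    """Hypothetical stand-in for the API client that owns the statistics."""

    def __init__(self) -> None:
        self.stats = Statistics()


class StorageClientWrapper:
    """Hypothetical SDK-side wrapper that delegates `stats` to the wrapped client."""

    def __init__(self, client: ApiClient) -> None:
        self._client = client

    @property
    def stats(self) -> Statistics:
        # Return the client's own object (not a copy) so readers see live counts.
        return self._client.stats
```

Returning the same object rather than a snapshot keeps the wrapper free of any bookkeeping of its own; the client stays the single place where requests are made and counted.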

B4nan commented 6 hours ago

Quoting the comment above: "Also, the Python API client does not seem to collect those statistics (please correct me if I'm wrong)."

Maybe, but that doesn't mean it shouldn't be implemented there, right? The SDK is not making the requests; the client is.
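A minimal sketch of what collecting these statistics inside the Python API client could look like, assuming the client retries on 429: the retry loop that sends the request records every 429 response against the attempt on which it occurred. `call_with_retries`, the `send` callable (a stand-in for the real HTTP call that returns a status code), and the backoff parameters are hypothetical, not the client's actual API.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass, field

RATE_LIMIT_STATUS = 429  # HTTP 429 Too Many Requests


@dataclass
class Statistics:
    calls: int = 0
    requests: int = 0
    # rate_limit_errors[i] counts 429 responses received on attempt i + 1
    rate_limit_errors: list[int] = field(default_factory=list)

    def add_rate_limit_error(self, attempt: int) -> None:
        while len(self.rate_limit_errors) < attempt:
            self.rate_limit_errors.append(0)
        self.rate_limit_errors[attempt - 1] += 1


def call_with_retries(
    send: Callable[[], int],
    stats: Statistics,
    max_attempts: int = 4,
    base_delay_s: float = 0.0,
) -> int:
    """Send a request, retrying on 429 and recording each rate-limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    stats.calls += 1
    for attempt in range(1, max_attempts + 1):
        stats.requests += 1
        status = send()
        if status != RATE_LIMIT_STATUS:
            return status
        # The client, which sees the response, is the one recording the error.
        stats.add_rate_limit_error(attempt)
        if attempt < max_attempts:
            # Exponential backoff before the next attempt.
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    return status
```

For example, with `send` returning 429, 429, and then 200, the call returns 200 and `stats.rate_limit_errors` ends up as `[1, 1]`.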