vdusek closed 11 months ago
I did some initial experiments with the BeautifulSoup Template Actor, comparing the use of the synchronous requests library to the asynchronous httpx library. The Actor code and the input remained the same for both scenarios.
When using requests, the execution time was 43 seconds (Actor run - https://console.apify.com/actors/runs/CtmF9T6KRGC5FnJud#log).
When using httpx, the execution time improved to 26 seconds (Actor run - https://console.apify.com/actors/runs/VDcmt5tVlvmd9ree1#log).
Considering that our SDK is already asynchronous-only, there should be no problem in transitioning from the synchronous requests to the asynchronous httpx. So this will be a good first step.
After a discussion with @B4nan, we decided to keep the templates as simple as possible and not to add the "parallelism using asyncio.Queue". So the only optimization as part of this issue is the usage of HTTPX instead of Requests; further performance improvement will be achieved by implementing the AutoscaledPool in the Python SDK.
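For reference, the kind of asyncio.Queue parallelism discussed in this issue could look roughly like the sketch below. It is only an illustration under assumptions, not the planned AutoscaledPool and not code from the templates: run_in_parallel, handler, concurrency and fake_fetch are hypothetical names, and the fetch is simulated with asyncio.sleep so the example runs without network access.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_in_parallel(
    items: list[str],
    handler: Callable[[str], Awaitable[Any]],
    concurrency: int = 5,
) -> dict[str, Any]:
    """Process items with a fixed number of workers fed from an asyncio.Queue."""
    queue: asyncio.Queue[str] = asyncio.Queue()
    for item in items:
        queue.put_nowait(item)

    results: dict[str, Any] = {}

    async def worker() -> None:
        while True:
            item = await queue.get()
            try:
                results[item] = await handler(item)
            except Exception as exc:
                # Record the failure instead of letting the worker die.
                results[item] = exc
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()  # wait until every queued item has been processed
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


async def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for an HTTP request.
    await asyncio.sleep(0.01)
    return f"content of {url}"


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(10)]  # placeholder URLs
    pages = asyncio.run(run_in_parallel(urls, fake_fetch, concurrency=3))
    print(f"Processed {len(pages)} pages")
```

The fixed worker count is the main simplification here: unlike an autoscaled pool, this sketch never adjusts concurrency based on available resources.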
In the JS SDK we have AutoscaledPool for executing tasks in parallel. We don't have similar functionality in the Python SDK yet; however, it's planned for the upcoming months.
For now, users could just write some simple utility themselves, using asyncio.Queue to get what they need (https://docs.python.org/3/library/asyncio-queue.html#examples).
Writing AutoscaledPool might take a while. We could update our templates with some super simple parallelism using asyncio.Queue; it might take about 10 extra lines.
The issue is based on the Discord question: