AutoscaledPool periodically logs system load information in the function AutoscaledPool._log_system_status. The output looks, for example, like this:
2024-11-06T15:11:50.471Z [crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 1; desired_concurrency = 1; cpu = 0.581; mem = 0.0; event_loop = 0.227; client_info = 0.0
It shows values that are used internally by the desired_concurrency controller, but those values are hard for humans to interpret and thus not very useful in the log. Make this log understandable.
On the other hand, the logged values should stay connected to the values used by the controller. If the log becomes readable but detached from the controller, it is again not very useful. So there is a risk that making the log more readable would require changing the controller itself.
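A minimal sketch of one possible direction follows. It is not Crawlee's API: `ResourceLoad`, `describe_system_status`, `actual_ratio` and `limit_ratio` are hypothetical names. The sketch assumes that each logged number (e.g. `cpu = 0.581`) is the fraction of recent samples in which that resource was overloaded, and that the controller compares it against a per-resource limit. The limits used below are placeholders, not Crawlee's defaults. Printing the limit next to each value, and deriving the verdict from the same comparison, is meant to keep the readable log tied to the controller's inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceLoad:
    """Hypothetical view of one resource as the concurrency controller might see it."""

    name: str
    # Assumption: share of recent samples in which this resource was overloaded.
    actual_ratio: float
    # Assumption: share above which the controller treats the resource as overloaded.
    limit_ratio: float

    @property
    def is_overloaded(self) -> bool:
        # Same comparison the (assumed) controller uses, so the log cannot drift from it.
        return self.actual_ratio > self.limit_ratio


def describe_system_status(current: int, desired: int, loads: list[ResourceLoad]) -> str:
    """Render a human-readable status line from the values the controller works with."""
    overloaded = [load.name for load in loads if load.is_overloaded]
    summary = ("overloaded: " + ", ".join(overloaded)) if overloaded else "no resource overloaded"

    parts = [f"concurrency {current} (desired {desired})", summary]
    for load in loads:
        status = "OVERLOADED" if load.is_overloaded else "ok"
        # Percentages are easier to read than raw ratios; the limit gives them meaning.
        parts.append(
            f"{load.name}: overloaded in {load.actual_ratio:.0%} of recent samples "
            f"(limit {load.limit_ratio:.0%}, {status})"
        )
    return "; ".join(parts)


# Usage with the values from the log line above; the limits are hypothetical placeholders.
example_loads = [
    ResourceLoad("cpu", 0.581, 0.4),
    ResourceLoad("mem", 0.0, 0.2),
    ResourceLoad("event_loop", 0.227, 0.6),
    ResourceLoad("client_info", 0.0, 0.3),
]
print(describe_system_status(1, 1, example_loads))
```

In this sketch the readable text and the overload verdict come from the same `is_overloaded` check, so making the log readable does not require a second, detached interpretation of the numbers.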
See full discussion in: https://github.com/apify/crawlee-python/issues/662