Open tlinhart opened 2 months ago
Hello, and thanks for the reproduction! It seems that the problem is here:
It looks like `service_container.get_storage_client` does not consider the adjusted configuration.
Also, we have a test for this - https://github.com/apify/crawlee-python/blob/master/tests/unit/basic_crawler/test_basic_crawler.py#L630-L639 - which probably fails because we're looking inside a different storage directory than the global one.
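To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It is not crawlee's code: `Config`, `_GLOBAL_CONFIG`, `get_storage_client_buggy` and `get_storage_client_fixed` are hypothetical stand-ins, and the field names are assumptions chosen for illustration. It only shows how a lookup that always reads the global configuration would ignore the one passed by the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    """Hypothetical stand-in for a crawler configuration (not crawlee's class)."""
    persist_storage: bool = True
    storage_dir: str = "./storage"


# Module-level default that plays the role of the global configuration.
_GLOBAL_CONFIG = Config()


def get_storage_client_buggy(config: Optional[Config] = None) -> dict:
    """Suspected behavior: the caller's configuration is accepted but never used."""
    effective = _GLOBAL_CONFIG  # the `config` argument is ignored here
    return {"persist": effective.persist_storage, "dir": effective.storage_dir}


def get_storage_client_fixed(config: Optional[Config] = None) -> dict:
    """Expected behavior: prefer the caller's configuration, fall back to the global one."""
    effective = config if config is not None else _GLOBAL_CONFIG
    return {"persist": effective.persist_storage, "dir": effective.storage_dir}


# A caller-supplied configuration that differs from the global default.
custom = Config(persist_storage=False, storage_dir="/tmp/custom")

buggy = get_storage_client_buggy(custom)  # still reflects the global settings
fixed = get_storage_client_fixed(custom)  # reflects the caller's settings
```

In this model the buggy variant would keep writing to the default directory no matter what the caller passes, which matches the symptom reported in the issue.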
Consider this sample program:
The `configuration` argument given to `ParselCrawler` is not respected: during the run, it creates the `./storage` directory and persists all the (meta)data there. I have to work around it by overriding the global configuration like this: