algolia / algoliasearch-client-python

⚡️ A fully-featured and blazing-fast Python API client to interact with Algolia.
https://www.algolia.com/doc/api-client/getting-started/install/python/?language=python
MIT License

Increasing memory usage when using replace_all_objects #557

Open AugPro opened 1 year ago

AugPro commented 1 year ago

Hello, I have a memory issue when using replace_all_objects. When calling this function with a large number of documents (5 million), I pass an iterator to minimize memory consumption. I expect memory usage to stay flat during the operation, but it keeps increasing (see image below).

[image]

Upon investigation, the increase appears to come from the function SearchIndex._chunk, and more specifically from the list raw_responses, which stores the response for every request sent. https://github.com/algolia/algoliasearch-client-python/blob/3bb9108d9dff627f12c921ad23dab02984f70a44/algoliasearch/search_index.py#L505-L528

This is a problem because the response of /1/indexes/{indexName}/batch contains the list of objectIDs:

{
  "taskID": 792,
  "objectIDs": ["6891", "6892"]
}

With 5M documents, each with an objectID of ~15 characters, the ID strings alone account for roughly 300 MB:

>>> import sys
>>> sys.getsizeof("123456789012345") * 5_000_000 / (1024**2)
305.17578125
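
To illustrate the accumulation pattern described above, here is a minimal, self-contained sketch. It is not the library's code: `batched`, `make_fake_sender`, `send_all_keeping_responses`, and `generate_objects` are hypothetical names, and the fake sender only mimics the shape of the batch response (a taskID plus the objectIDs). It shows that even when the input is a generator, appending every response to a list keeps every objectID alive until the operation ends.

```python
from itertools import count, islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def batched(objects: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield lists of at most `size` objects without materialising the input."""
    iterator = iter(objects)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def make_fake_sender() -> Callable[[List[Record]], Record]:
    """Hypothetical stand-in for the HTTP call; no network is involved."""
    task_counter = count(1)

    def send(batch: List[Record]) -> Record:
        # Same shape as the batch response: a task ID and the list of object IDs.
        return {
            "taskID": next(task_counter),
            "objectIDs": [obj["objectID"] for obj in batch],
        }

    return send


def send_all_keeping_responses(
    objects: Iterable[Record], size: int, send: Callable[[List[Record]], Record]
) -> List[Record]:
    """Assumed simplification of the pattern: every response is retained."""
    raw_responses = []
    for batch in batched(objects, size):
        # Each stored response keeps its whole objectIDs list referenced.
        raw_responses.append(send(batch))
    return raw_responses


def generate_objects(n: int) -> Iterator[Record]:
    """Lazily produce records with 15-character object IDs."""
    for i in range(n):
        yield {"objectID": f"id-{i:012d}"}


responses = send_all_keeping_responses(
    generate_objects(10_000), 1_000, make_fake_sender()
)
retained_ids = sum(len(r["objectIDs"]) for r in responses)
print(len(responses), retained_ids)
```

The number of retained IDs equals the number of documents sent, so the retained memory grows linearly with the input even though the input itself is streamed.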

Is there a request_option for the API not to return objectIDs, or for the code not to store them in raw_responses?
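
For the second option, a rough sketch of what "not storing them" could look like, assuming only the taskID of each batch is needed afterwards (for example, to wait on the tasks). This is an assumption about a possible change, not an existing option of the client: `send_all_keeping_task_ids` and `fake_send` are hypothetical names, and the sender is a local stand-in for the HTTP call.

```python
from itertools import count, islice
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]

_task_counter = count(1)


def fake_send(batch: List[Record]) -> Record:
    """Hypothetical stand-in for the batch request; returns the same response shape."""
    return {
        "taskID": next(_task_counter),
        "objectIDs": [obj["objectID"] for obj in batch],
    }


def send_all_keeping_task_ids(
    objects: Iterable[Record], size: int, send: Callable[[List[Record]], Record]
) -> List[int]:
    """Send objects in batches, retaining one integer per batch instead of the full response."""
    task_ids = []
    iterator = iter(objects)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            break
        response = send(batch)
        # Only the task ID is kept; the objectIDs list becomes garbage
        # as soon as `response` is rebound on the next iteration.
        task_ids.append(response["taskID"])
    return task_ids


documents = ({"objectID": f"id-{i:012d}"} for i in range(5_000))
task_ids = send_all_keeping_task_ids(documents, 1_000, fake_send)
print(task_ids)
```

The trade-off is that the caller no longer gets the objectIDs back, so this would only suit callers that do not read them from the result.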

Thank you 🙏