chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: Can't set max batch size #2403

Open: superchargez opened this issue 6 days ago

superchargez commented 6 days ago

What happened?

I tried adding a large number of documents but hit the batch size limit of 5461, so I tried to raise the limit, but nothing changed:

In [16]: client.max_batch_size = 44445

In [17]: client.get_max_batch_size()
Out[17]: 5461

Versions

Windows 11, Python 3.12.4, Chroma version: '0.5.3'

Relevant log output

No output; the batch size limit was simply not updated.
tazarov commented 6 days ago

@superchargez, the max batch size in Chroma is a function of the underlying SQLite. Most operating systems ship with a built-in release of sqlite3, and most of the time Chroma relies on it. These pre-built SQLite distributions are compiled with certain limits, on which the max batch size is based, and unfortunately those limits cannot be changed.
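To see where a number like 5461 can come from, here is a minimal sketch that reads the per-statement bound-parameter limit from the SQLite library linked into Python's sqlite3 module and divides it by an assumed number of parameters per record. The value 6 for VARIABLES_PER_RECORD is an assumption for illustration, not something stated in this thread; with SQLite's default limit of 32766 (SQLite 3.32.0 and later), 32766 // 6 = 5461, which matches the value reported in the issue. Chroma may be linked against a different SQLite build than the one this script sees, and `Connection.getlimit`/`setlimit` require Python 3.11 or newer.

```python
import sqlite3

# ASSUMPTION: number of SQL parameters bound per record on insert.
# 6 is an illustrative value, not taken from Chroma's documentation.
VARIABLES_PER_RECORD = 6


def sqlite_variable_limit(conn: sqlite3.Connection) -> int:
    """Return the max number of bound parameters allowed in one SQL statement."""
    if hasattr(conn, "getlimit"):  # Python 3.11+
        return conn.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER)
    # Older Pythons cannot query runtime limits; fall back to SQLite's
    # documented compile-time defaults for SQLITE_MAX_VARIABLE_NUMBER.
    return 32766 if sqlite3.sqlite_version_info >= (3, 32, 0) else 999


def estimated_max_batch_size(variable_limit: int,
                             per_record: int = VARIABLES_PER_RECORD) -> int:
    # Every record consumes `per_record` parameters, so round down.
    return variable_limit // per_record


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    limit = sqlite_variable_limit(conn)
    print("SQLite version:", sqlite3.sqlite_version)
    print("variable limit:", limit)
    print("estimated max batch size:", estimated_max_batch_size(limit))
    if hasattr(conn, "setlimit"):
        # Requests above the compile-time hard upper bound are silently
        # truncated by SQLite, so the limit cannot be raised this way.
        conn.setlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER, limit * 10)
        print("after setlimit:", conn.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER))
    conn.close()
```

The `setlimit` call at the end illustrates the point made above: SQLite lets a connection lower this limit at runtime, but not raise it past the value the library was compiled with.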

Exposing the max batch size is intended to make users aware of the SQLite limits Chroma enforces. The general approach in cases like yours is to split your large batch into smaller batches that stay within that limit. We have provided an example of how to do that here: https://github.com/chroma-core/chroma/blob/main/chromadb/utils/batch_utils.py
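The splitting approach can also be sketched without relying on Chroma internals. The helper below, `chunked_add`, is hypothetical and is not the code in batch_utils.py; it slices parallel lists into chunks of at most `batch_size` records and passes each chunk to any callable that accepts `ids`, `documents` and (optionally) `metadatas` keyword arguments, such as a collection's `add` method.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def chunked_add(
    add_fn: Callable[..., Any],
    ids: Sequence[str],
    documents: Sequence[str],
    batch_size: int,
    metadatas: Optional[Sequence[Dict[str, Any]]] = None,
) -> int:
    """Hypothetical helper: call add_fn once per slice of at most batch_size
    records and return the number of calls made."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if len(ids) != len(documents) or (
        metadatas is not None and len(metadatas) != len(ids)
    ):
        raise ValueError("ids, documents and metadatas must have the same length")

    calls = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        kwargs: Dict[str, Any] = {
            "ids": list(ids[start:end]),
            "documents": list(documents[start:end]),
        }
        if metadatas is not None:
            kwargs["metadatas"] = list(metadatas[start:end])
        add_fn(**kwargs)  # each call stays within batch_size records
        calls += 1
    return calls


# Hypothetical usage with a Chroma client and collection:
# chunked_add(collection.add, ids, docs, batch_size=client.get_max_batch_size())
```

Using the value returned by `client.get_max_batch_size()` as the chunk size keeps every call within the limit the client reports; for example, 12000 documents with a limit of 5461 would be sent as three calls of 5461, 5461 and 1078 records.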