chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

Test for persist overwrite #694

Closed atroyn closed 1 year ago

atroyn commented 1 year ago

Multiple Chroma clients can stomp each other's persistence.

albertovilla commented 1 year ago

I don't know how you plan to solve this, but a partial solution would involve:

  1. Read current content from the existing parquet into a dataframe (e.g. df_parquet)
  2. Get content from the database as a dataframe (e.g. df_db)
  3. Merge df_parquet and df_db
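
The three steps above can be sketched with pandas. This is only an illustration, not Chroma's actual persistence code: the `id` and `document` columns, the `merge_persisted` helper, and the "database row wins on conflict" rule are all assumptions made for the example. In practice `df_parquet` would come from something like `pd.read_parquet(path)`; here both frames are built in memory so the snippet is self-contained.

```python
import pandas as pd


def merge_persisted(df_parquet: pd.DataFrame, df_db: pd.DataFrame,
                    key: str = "id") -> pd.DataFrame:
    """Hypothetical helper: union both frames, preferring df_db on key conflicts."""
    # Stack the on-disk rows first and the in-memory rows second ...
    combined = pd.concat([df_parquet, df_db], ignore_index=True)
    # ... so that keep="last" retains the database version of a duplicated key.
    return combined.drop_duplicates(subset=key, keep="last").reset_index(drop=True)


# Step 1: rows already persisted (stand-in for reading the existing parquet file).
df_parquet = pd.DataFrame({"id": ["a", "b"], "document": ["old a", "old b"]})
# Step 2: rows currently held by this client's database.
df_db = pd.DataFrame({"id": ["b", "c"], "document": ["new b", "new c"]})
# Step 3: merge the two.
merged = merge_persisted(df_parquet, df_db)
print(merged)
```

Note that a merge like this cannot tell a row deleted by this client apart from a row added by another client: an `id` present only in the parquet file is always kept.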

This is a partial solution as there are some tricky cases to consider:

I guess one option would be to implement this in some kind of thread-safe singleton class for the persistence, i.e. forcing every persist to go through a single instance of an object would ensure consistency.
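
A minimal sketch of that idea, using only the standard library. The `Persister` class, its `store` dict, and the `persist` method are hypothetical names for illustration and do not correspond to any Chroma API; the dict stands in for the real write to disk.

```python
import threading


class Persister:
    """Hypothetical thread-safe singleton that serializes all persist calls."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        # Guard creation so two threads cannot each build their own instance.
        with cls._instance_lock:
            if cls._instance is None:
                inst = super().__new__(cls)
                inst._write_lock = threading.Lock()
                inst.store = {}  # stand-in for the persisted data
                cls._instance = inst
        return cls._instance

    def persist(self, records: dict) -> None:
        # Only one thread at a time may modify the shared store.
        with self._write_lock:
            self.store.update(records)


def worker(n: int) -> None:
    # Each "client" asks for a Persister and gets the same shared instance.
    Persister().persist({f"id-{n}": f"doc-{n}"})


threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

shared = Persister()
print(len(shared.store))
```

A singleton like this only coordinates threads inside one process. Clients running in separate processes would still need something external, such as a file lock, to avoid overwriting each other.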

HammadB commented 1 year ago

This was an issue in the DuckDB regime; we have tests for this sort of thing now!