chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: Unable to disconnect chroma.sqlite3 from memory #1560

Open gancancode opened 9 months ago

gancancode commented 9 months ago

What happened?

I was generating a Chroma database (connection) using the 1st block of Python code below, which works fine. However, there seems to be no option to unload or disconnect the persisted connection; the 2nd block of Python code below does not do it. When attempting to remove the path where chroma.sqlite3 is persisted, I got the following error message: "Failed to delete ./path/chroma.sqlite3. Reason: [WinError 32] The process cannot access the file because it is being used by another process:"

# 1st block of code: generating the database using Chroma
from langchain.vectorstores import Chroma

connection = Chroma.from_documents(
    documents=documents,
    embedding=embedding_function,
    persist_directory="./path",
)

# 2nd block of code: attempting to disconnect the Chroma database from memory (does not work)
connection._client.stop()
connection._client.reset()

How do I disconnect/unload chroma.sqlite3 from memory?

Versions

chroma = '0.4.14'
python = '3.11.4'
langchain = '0.0.350'
windows = 10

Relevant log output

No response

tazarov commented 9 months ago

@gancancode, where does your code live? If it is in a process that keeps a reference to the Chroma instance, then the memory will not be freed until you restart the application.

.stop() will call stop on the component, which in your case is SegmentAPI, but that will not unload data. .reset() will unload and delete segment indices (if reset is allowed, which by default it isn't).

The error you get is commonly encountered on Windows, because processes other than the Python process running your app often also have the files open (e.g. antivirus).

gancancode commented 9 months ago

Hi, I'm running the Python script locally. Usually, for other database instances such as MySQL, stopping and closing the connection will unload the application from memory (e.g., Conn.close()). That doesn't seem to be the case for chromadb.
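For comparison, here is a minimal sketch of the behaviour described above, using only the standard-library sqlite3 module rather than Chroma. The function name and the file name example.sqlite3 are made up for illustration. It only shows that a plain SQLite file can normally be deleted once its connection is closed; it says nothing about how Chroma manages its own handles.

```python
import os
import sqlite3
import tempfile


def create_close_and_delete(db_dir: str) -> bool:
    """Create a SQLite file, close the connection, then delete the file."""
    # Hypothetical file name, chosen only for this illustration.
    db_path = os.path.join(db_dir, "example.sqlite3")
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.commit()
    finally:
        # close() releases the file handle held by this connection.
        conn.close()
    # With the handle released, the file can be removed.
    os.remove(db_path)
    return not os.path.exists(db_path)


with tempfile.TemporaryDirectory() as scratch_dir:
    deleted = create_close_and_delete(scratch_dir)
```

If the os.remove call were moved before conn.close(), it would be expected to fail on Windows with the same kind of "being used by another process" error.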

granawkins commented 6 months ago

I believe I'm encountering this now. My tests add chroma in a tempdir, and I can't seem to erase the tempdir in my fixture teardown.

Any resolution here? Can we manually close the sqlite connection? Is there a best practice for Windows?

Fowthy commented 4 months ago

Having the same issue at the moment. Any solutions?

sjoerd222888 commented 4 months ago

Same here. Would like to know how to be able to stop the connection.

codetheweb commented 3 months ago

I spent a while looking into this today as it's failing on Chroma's CI for Windows now. (And interestingly it wasn't only failing for the .sqlite file, it was also failing to delete hnsw files in the storage directory.)

It's possible this is a platform issue and not an issue specific to Python. For example, this Python package implements a pretty hacky workaround for file/directory deletes on Windows: https://github.com/rogerbinns/apsw/blob/f34a8dc6f2be3462740118af05e4c68f6905b86e/apsw/tests.py#L237
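As a rough illustration of what a retry-style delete workaround can look like, here is a sketch. It is not the code from the linked apsw tests; the helper name remove_tree_with_retries and its parameters are invented for this example. It assumes the lock is transient (for example, a handle that is released after garbage collection or after another process finishes scanning the file), which may not hold for the Chroma case.

```python
import gc
import os
import shutil
import tempfile
import time


def remove_tree_with_retries(path: str, attempts: int = 5, delay: float = 0.2) -> bool:
    """Try to delete a directory tree, retrying if a file is still locked.

    Hypothetical helper: returns True if the path no longer exists afterwards.
    """
    for attempt in range(attempts):
        # Drop unreachable objects that might still hold open file handles.
        gc.collect()
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Already gone, nothing left to do.
            return True
        except PermissionError:
            # On Windows a sharing violation is reported as PermissionError;
            # wait a little longer on each attempt before retrying.
            time.sleep(delay * (attempt + 1))
    return not os.path.exists(path)


# Usage: build a throwaway directory with one file, then remove it.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "data.bin"), "wb") as fh:
    fh.write(b"\x00")
removed = remove_tree_with_retries(scratch)
```

If the handle is held for the lifetime of the process, retries will not help and the function simply returns False after the last attempt.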

However, I was unable to find anyone else running into the same issue after some intensive searching & combing through CPython bugs, so at the moment, I personally think it's more likely to be a Chroma-specific bug. There's a similar issue described here, but I think the root cause is different.

I spent a while debugging, but unfortunately I haven't found the root cause yet.

The workaround linked above will likely patch this for now. Alternatively, if you're using tempfile.TemporaryDirectory() in a test fixture and using Python >= 3.10, you can use tempfile.TemporaryDirectory(ignore_cleanup_errors=True).