Open adriano515 opened 1 year ago
@adriano515, this was fixed some time ago with https://github.com/chroma-core/chroma/pull/1080, and we did test against Win10. But if what you're saying is true, then there should be a UUID-named dir that reflects the collection's segment in the ./chroma dir.
Can you share some code to reproduce this?
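One quick way to check for such leftover directories is to list the UUID-named sub-directories of the persist directory. This is only a minimal sketch: the helper name list_segment_dirs is hypothetical, and it assumes segment directories are named with plain UUID strings, as described above.

```python
import os
import uuid


def list_segment_dirs(persist_dir):
    """Return the sub-directory names of persist_dir that parse as UUIDs."""
    found = []
    for name in sorted(os.listdir(persist_dir)):
        if not os.path.isdir(os.path.join(persist_dir, name)):
            continue  # skip plain files such as chroma.sqlite3
        try:
            uuid.UUID(name)
        except ValueError:
            continue  # directory name is not a UUID
        found.append(name)
    return found
```

Calling list_segment_dirs("./chroma") before and after client.delete_collection(...) should show whether a directory survives the deletion.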
Left a video here: https://discord.com/channels/1073293645303795742/1153432513641988137 @tazarov
Hi, I have exactly the same issue on macOS with ChromaDB v0.5.0 and a local persist directory. This is very strange, as this exact issue seems to have been reported in #1009 and solved by #1080.
I am using from langchain_community.vectorstores import Chroma; however, the issue seems to be linked to Chroma, not LangChain.
Steps to reproduce:
1. Create two collections with the .from_documents() method. We then have a db created and 2 folders named with ids.
2. Use .delete_collection() to delete the first collection. We can confirm it is deleted by looking at the database (DB Browser): there is only one "VECTOR" scope id.
How I fix it, in a similar way as in #1080 (which unfortunately doesn't work for me):
I add this code:
import os
import shutil
import sqlite3

def get_ids(path):
    # Return the ids of the VECTOR segments still registered in the SQLite db.
    database = sqlite3.connect(path)
    try:
        cursor = database.cursor()
        cursor.execute("SELECT id FROM segments WHERE scope = 'VECTOR'")
        return [row[0] for row in cursor.fetchall()]
    finally:
        database.close()

def delete_unexisting_files(path, ids):
    # Remove every directory whose name is not a known segment id.
    for el in os.listdir(path):
        full_path = os.path.join(path, el)
        # Skip plain files such as chroma.sqlite3 and .DS_Store.
        if os.path.isdir(full_path) and el not in ids:
            shutil.rmtree(full_path)

# chroma_db_dir is the persist directory given to Chroma.
ids = get_ids(os.path.join(chroma_db_dir, "chroma.sqlite3"))
delete_unexisting_files(chroma_db_dir, ids)
Of course, it would be nice to add this directly to the library, by simply calling shutil.rmtree on the directories associated with the collection in .delete_collection(), without manually looking up the ids in the db.
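Until something like that lands in the library, a more cautious variant of the cleanup above is to only report the orphaned directories first, without deleting anything. The sketch below is an assumption-laden illustration: find_orphan_dirs is a hypothetical helper, and it relies only on the segments table and the 'VECTOR' scope used in the query above.

```python
import os
import sqlite3


def find_orphan_dirs(persist_dir, db_name="chroma.sqlite3"):
    """List directories in persist_dir that match no VECTOR segment id."""
    db_path = os.path.join(persist_dir, db_name)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id FROM segments WHERE scope = 'VECTOR'"
        ).fetchall()
    finally:
        conn.close()
    live_ids = {row[0] for row in rows}
    # Only directories are candidates; files like chroma.sqlite3 are ignored.
    return sorted(
        name
        for name in os.listdir(persist_dir)
        if os.path.isdir(os.path.join(persist_dir, name))
        and name not in live_ids
    )
```

Reviewing the returned names before passing them to shutil.rmtree avoids deleting a directory that belongs to a live collection.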
Should I re-open an issue, @tazarov ?
The #1080 fix seems to act on .delete() in /segment/impl/vector/local_persistent_hnsw.py, but .delete_collection() is defined in /api/segment.py and in /db/mixins/sysdb.py
Hi, I have the same issue in a Windows environment. All you need to do is create a collection, add some documents, and try to delete it. You will see the collection go away, but not the directories. In our use case, we need to run embeddings on a daily basis, and as you can imagine this results in a proliferation of directories, leading to slow retrievals. So please fix this as soon as possible.
@alexgravx, let me revisit this. Can you share some details about your OS version, python version, CPU (M or Intel)?
EDIT: Do you have an antivirus or similar that may scan open files, thus preventing Chroma from removing the dir?
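If a scanner or another process holding files open is the cause, removing the directory by hand should fail with a permission error, at least on Windows. A minimal sketch for testing that hypothesis follows; rmtree_with_retry is a hypothetical helper and not something Chroma provides.

```python
import shutil
import time


def rmtree_with_retry(path, attempts=3, delay=0.5):
    """Try to remove path, retrying when files are temporarily locked."""
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            return True  # nothing left to remove
        except PermissionError:
            # Another process may still hold a file open; wait and retry.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

A False return value for a leftover segment directory would point toward a file lock rather than a missing cleanup step.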
Hi @tazarov,
Thanks for your reply! Here are the details:
OS version: macOS Sonoma 14.4 and 14.5
Python version: 3.9.6
CPU: M2 (ARM architecture), on a MacBook Air
I don’t have any antivirus. The only protections on my mac are the ones from Apple. Moreover, I didn’t get any alert at the time so I think it wasn’t linked to another app/process.
What happened?
When using client.delete_collection("collection_name"), the collection is deleted from SQLite3, but the directory and its files are not deleted (not even their contents, since they all weigh more than 0 KB).
Versions
Chroma v0.4.10, Python 3.10, Windows 11
Relevant log output
No response