chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: sqlite3.OperationalError: database or disk is full (Not really) #2247

Open barrybriggs opened 1 month ago

barrybriggs commented 1 month ago

What happened?

I have approximately 450,000 sentences that I am loading into ChromaDB. The database is on my F: drive, which has about 7 TB free, so there is plenty of room. I was adding 20,000 sentences at a time and got about halfway through when I received the SQLite error above. I have plenty of memory and the database drive is practically empty. Then I noticed, almost by accident, that the C: drive was down to about 34 KB of free space!

Are there temp files I should be aware of, and if so, how do I point them at the F: drive?

Versions

Anaconda Python 3.10.10, Chroma latest (yesterday), Windows 11, Intel i9, 64GB RAM

Relevant log output

No response

tazarov commented 1 month ago

@barrybriggs, we've had a similar problem recently. Can you check how much storage you have under /tmp? SQLite writes temporary files there when it runs large queries.
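
One quick way to check is to look at free space on the volume that holds the temp directory. This is a minimal sketch using only the Python standard library; `free_gib` is a hypothetical helper name, and it assumes that the directory reported by `tempfile.gettempdir()` sits on the same drive SQLite's temp files end up on, which may not hold on every setup.

```python
import shutil
import tempfile


def free_gib(path: str) -> float:
    """Return free space, in GiB, on the volume that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


# Directory Python itself would use for temp files (assumed to match SQLite's).
temp_dir = tempfile.gettempdir()
print(f"temp dir: {temp_dir}  free: {free_gib(temp_dir):.2f} GiB")
```

Calling `free_gib` on the database folder as well gives a side-by-side view of both drives.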

barrybriggs commented 1 month ago

Unfortunately I just cleared out my \temp (Windows) folder (the situation was a bit critical, as free space on the C: drive went to zero). Does SQLite use the TMP or TEMP environment variable? If so, I could point it somewhere else and see if that works better. (Maybe I'll just try it anyway :-)).

EDIT: It looks like SQLite uses an internal pragma to set the temp directory, so maybe you have to force it to use the environment variable value.
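
For anyone who wants to try the environment-variable route, here is a rough sketch. It assumes, going by the SQLite temp-files documentation as I read it, that when no directory has been set through the pragma, SQLite on Windows takes its temp folder from the system temp path (which consults TMP and TEMP), and on Unix-like systems from SQLITE_TMPDIR and then TMPDIR. `point_temp_at` and the example path are hypothetical names, not anything Chroma provides.

```python
import os
import tempfile
from pathlib import Path


def point_temp_at(directory: str) -> str:
    """Create `directory` and point the common temp variables at it."""
    target = Path(directory).resolve()
    target.mkdir(parents=True, exist_ok=True)
    # TMP/TEMP are consulted on Windows; SQLITE_TMPDIR/TMPDIR on Unix-like systems.
    for name in ("TMP", "TEMP", "SQLITE_TMPDIR", "TMPDIR"):
        os.environ[name] = str(target)
    # Python caches its own temp-dir lookup; clear it so it is re-read.
    tempfile.tempdir = None
    return str(target)


# Example with a hypothetical path on the larger drive:
# point_temp_at(r"F:\sqlite_tmp")
```

To be safe, this would need to run before the database client is created, so that no temp file has been opened in the old location yet.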

tazarov commented 1 month ago

We're tracking this issue here too https://github.com/chroma-core/chroma/issues/1693#issuecomment-2102618408.

Soon we'll either publish guidance or fix it directly.

Here are the SQLite docs on temp file usage and pragmas: https://www.sqlite.org/tempfiles.html
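
As an illustration of the pragma side, the sketch below sets `temp_store` on a plain `sqlite3` connection from the Python standard library, which asks SQLite to keep temporary tables and indices in memory instead of in temp files. This thread does not say whether Chroma exposes its own connection for this, so treat it as a demonstration of the pragma only, not as a Chroma fix.

```python
import sqlite3

# Stand-alone connection for demonstration; not Chroma's own connection.
conn = sqlite3.connect(":memory:")

# Ask SQLite to hold temporary tables and indices in RAM.
conn.execute("PRAGMA temp_store = MEMORY")

# Read the setting back: 0 = default, 1 = file, 2 = memory.
mode = conn.execute("PRAGMA temp_store").fetchone()[0]
print(mode)

conn.close()
```

The setting applies per connection, so it has to be issued on each new connection that should use it.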