Open petacube opened 3 months ago
Rolling the code back to the 0.5.0 release of chromadb resolves the issue. Please explain what is causing the crash.
Do you have a stack trace or any output?
It crashes silently. The whole Python process dies, and no exception is even thrown. I can try testing on Linux tomorrow to see if I can replicate the crash, and run systrace to see if a core dump can be captured.
Similar/same issue reported in Discord - https://discord.com/channels/1073293645303795742/1261229903383236720
Windows Fatal Exception: Access Violation
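A message like the one above is what Python's built-in faulthandler module prints when the interpreter dies from a fatal error. For anyone seeing only a silent exit, here is a minimal sketch for capturing a traceback. The log file name is a made-up example, and this only shows which Python line was executing, not the native cause:

```python
import faulthandler

# Hypothetical log file name; any writable path works. The file object must
# stay open for as long as faulthandler is enabled.
crash_log = open("chroma_crash_trace.log", "w")

# On a fatal signal (or a fatal Windows exception such as an access
# violation), dump the Python traceback of every thread into the log file.
faulthandler.enable(file=crash_log, all_threads=True)

print("faulthandler enabled:", faulthandler.is_enabled())
```

Put this at the top of the script or notebook cell before calling collection.add. Running a script with python -X faulthandler enables the same handler, writing to stderr instead.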
@petacube, unable to reproduce on GH windows-latest
Here's the test code - https://github.com/amikos-tech/chrm-2513-exp/blob/main/test_import.py
With the following WF - https://github.com/amikos-tech/chrm-2513-exp/actions/runs/9919966674/workflow
Conda env with Python 3.9 and Chroma 0.5.4
I tried adding things in bulk and separately. I also intentionally have high dimensional vectors (4096).
Let me know if you encounter the error in a similar setting.
Hmm, I wonder if this is due to a chroma-hnswlib version mismatch. Can you run pip show chroma-hnswlib? It should be 0.7.5 for chroma 0.5.4.
My version of chroma-hnswlib is 0.7.3. Shouldn't a dependency like this be handled at the chromadb level?
It is set here. I am not sure how you updated, but maybe something went wrong. Can you upgrade the dep and try again?
I did pip install --upgrade chromadb==0.5.4, so possibly that does not upgrade dependencies?
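To check which versions are actually installed in the active environment, without relying on pip show, something like the following sketch may help. The helper name installed_version is made up for illustration; importlib.metadata is in the standard library from Python 3.8:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# The two packages discussed in this thread.
for name in ("chromadb", "chroma-hnswlib"):
    print(f"{name}: {installed_version(name)}")
```

Running this inside the same interpreter that crashes (for example in the notebook itself) rules out the case where pip and the kernel point at different environments.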
I had the same issue: Silent crash after updating to chromadb 0.5.4 on Windows EVEN WITH chroma-hnswlib vers. 0.7.5
I moved back to chromadb 0.5.0 and chroma-hnswlib 0.7.3 and everything is working like before.
@kaixxx, can you confirm whether you were using anaconda? A user in Discord reported that the problems were resolved when he switched from anaconda to pip.
On a related note: if your environment rebuilds the chroma-hnsw lib, that can be the culprit. Can you let me know what Python version and CPU arch you have? We have prebuilt wheels for amd64 only on Windows (py39-py312).
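For anyone answering the question above, a small sketch that collects the requested details in one go. The function name environment_report is hypothetical, and the conda check (looking for a conda-meta directory under sys.prefix) is only a common heuristic, not an official API:

```python
import os
import platform
import struct
import sys


def environment_report():
    """Collect the interpreter and machine details asked for in this thread."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "machine": platform.machine(),            # e.g. AMD64 on 64-bit Windows
        "pointer_bits": struct.calcsize("P") * 8,  # 64 for a 64-bit interpreter
        "os": platform.platform(),
        "executable": sys.executable,
        # Heuristic only: conda environments usually contain a conda-meta dir.
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Comparing the executable line between a terminal and a notebook kernel also shows whether both are using the same environment.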
Thanks for looking into this. Here is some additional info:
@kaixxx, in your venv can you run the following code with python:
import hnswlib
import numpy as np
index = hnswlib.Index(space="l2", dim=1024)
index.init_index(max_elements=1000, ef_construction=100, M=16)
vectors = np.random.randn(1000, 1024).astype(np.float32)
index.add_items(vectors,ids=np.arange(1000))
Let me know if this crashes
Yes, it seems to crash.
I've created a new environment, installed chromadb (0.5.4 with chroma-hnswlib 0.7.5).
Then I've added the line print('finished') to the end of your script. This line is never reached. The script exits silently without any error message.
In my other environment with chromadb 0.5.0, the script runs fine and prints 'finished'.
Another test: I've now downgraded to chroma-hnswlib 0.7.3 but kept chromadb 0.5.4 and your script runs fine!
@kaixxx thanks for confirming. Can you add debug prints like this to identify whether it fails in the init of the index or when adding vectors:
import hnswlib
import numpy as np
index = hnswlib.Index(space="l2", dim=1024)
print("New index - ok")
index.init_index(max_elements=1000, ef_construction=100, M=16)
print("Init index - ok")
vectors = np.random.randn(1000, 1024).astype(np.float32)
index.add_items(vectors,ids=np.arange(1000))
print("All good")
Yes, output:
New index - ok
Init index - ok
(no "All good")
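Since the crash takes down the whole interpreter, one way to observe it safely is to run the reproduction in a child process and inspect its exit code. This is a rough sketch, assuming hnswlib and numpy are installed in the current environment; the helper name run_isolated is made up for illustration:

```python
import subprocess
import sys

# The reproduction script from this thread, kept as a string so it can run in
# a separate interpreter that is allowed to die.
REPRO = """
import hnswlib
import numpy as np
index = hnswlib.Index(space="l2", dim=1024)
index.init_index(max_elements=1000, ef_construction=100, M=16)
vectors = np.random.randn(1000, 1024).astype(np.float32)
index.add_items(vectors, ids=np.arange(1000))
print("All good")
"""


def run_isolated(code):
    """Run code in a fresh interpreter; return (returncode, stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    returncode, out, err = run_isolated(REPRO)
    print("exit code:", returncode)
    print("reached the end:", "All good" in out)
    print(err)
```

A nonzero exit code without "All good" in the output means the child died before finishing. On Windows, an access violation typically shows up as exit code 3221225477 (0xC0000005).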
@kaixxx, fantastic. Thank you for following up. 0.7.5 adds this change to add_items functionality - https://github.com/chroma-core/hnswlib/commit/408c5d1fa1dbc2acd8d1b4108191a8f803862210?diff=split&w=0#diff-ab27cbb27975c68cb0c6da824871058623f7f76a761c3c8365ef2e1395cf7cd9R1706-R1708
Can I ask you to rebuild the HNSW lib locally (if you have the necessary deps):
pip install --no-binary :all: chroma-hnswlib==0.7.5
Hey @tazarov, I've tried to build it but it results in an error from the linker that a certain file could not be opened. It may be that my build environment is not set up properly, but I don't have the time to dig into that. Is there anything else I can do?
When the documents are long enough, the bug occurs on inserting the 100th one, whether you insert data one by one or all at once.
Reproduced for python 3.12 and 3.10 on our windows machine (though this does not show up in CI, we should figure out why - perhaps the number of embeddings we insert in CI is not large enough to trigger this).
@HammadB and I are looking into it.
I have confirmed that running with --no-binary (building from source) fixes this as a workaround. This points to an issue in the wheel build. Investigating further.
It seems the Windows wheels were being built with AVX/SSE enabled whenever the runners they were compiled on supported it. I guess that for 0.7.3 the runner just happened not to have AVX/SSE, but now it does. I have pushed an alpha release, 0.7.6.alpha1.
@dddxst and @kaixxx and @petacube can you pip install chroma-hnswlib==0.7.6a1 and let me know if that fixes your issue? If so, I can issue a main release. Thanks.
Thanks! I've tested chroma-hnswlib 0.7.6a1 with the above script and it still crashes, unfortunately. Exactly the same behavior as described in https://github.com/chroma-core/chroma/issues/2513#issuecomment-2231002576
Have reproduced the 0.7.6a1 failure on our windows machine. The next step is to put a debugger on the cpp code itself. This will be a bit hairy but will coordinate with @HammadB to ship a fix.
I had the same problem with 0.5.5 and downgrading to 0.5.3/0.7.3 has solved it for now!
@EricBLivingston what version of python are you on?
Version 3.11.9
It would appear that the issue exists on hnswlib 0.7.3 too (Windows 10, AMD Ryzen 5) - https://discord.com/channels/1073293645303795742/1265778818422145149
@tazarov can you please post a summary of this long conversation here for easy reference? There is a lot going on and it's unclear to me what the issue is. Which python version is the user on?
@atroyn, the user has the following config:
Windows 10, AMD Ryzen 5 3600xt, Python 3.12, running in a local Jupyter notebook
Versions where they manage to reproduce the bug: 0.5.3 and 0.5.4
They had a build chain (MSVC) and tried to build from source, but encountered a build error (something related to ninja). Attaching the build failure here. chroma-hnswlib==0.7.3-build-failure.txt
We should advise users on Windows to downgrade to python 3.10 for their Chroma environments.
I just had a similar error. Using chroma in Jupyter Notebook, the kernel shuts down and restarts after trying to insert the 100th element.
Versions: chromadb: 0.5.5 chroma-hnswlib: 0.7.6
I tested this with the code below, which is based on the above. It works in command-line Python without any issues, but it crashes again when I try to run it in the notebook.
import hnswlib
import numpy as np
print("Starting")
index = hnswlib.Index(space="l2", dim=1024)
print("After index declaration")
index.init_index(max_elements=1000, ef_construction=100, M=16)
print("After Init Index")
vectors = np.random.randn(1000, 1024).astype(np.float32)
print("After Vector creation")
index.add_items(vectors,ids=np.arange(1000))
print("Done")
The kernel log only says this with Debug mode: (the first json block is the last thing I printed before the crash)
{'buffers': [],
'content': b'{"name": "stdout", "text": "Number of existing docs in DB: 99\\nN'
b'umber of new chunks: 1\\n"}',
'header': {'date': datetime.datetime(2024, 8, 12, 13, 33, 23, 677442, tzinfo=tzutc()),
'msg_id': '91678410-7c9cf6a1b73f88beb4462c14_10376_136',
'msg_type': 'stream',
'session': '91678410-7c9cf6a1b73f88beb4462c14',
'username': 'username',
'version': '5.3'},
'metadata': {},
'msg_id': '91678410-7c9cf6a1b73f88beb4462c14_10376_136',
'msg_type': 'stream',
'parent_header': {'date': datetime.datetime(2024, 8, 12, 13, 33, 23, 439000, tzinfo=tzutc()),
'msg_id': 'ea6cbfb9-0227-4f8e-9717-e8bc860aef91',
'msg_type': 'execute_request',
'session': '8ba47159-9da6-45dc-940c-df463dd8f2c0',
'username': '',
'version': '5.2'}}
[I 2024-08-12 14:33:27.083 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
[W 2024-08-12 14:33:27.083 ServerApp] kernel 2975483f-61dd-4ee6-bdf0-a04c8a086712 restarted
When executing the lines one by one, this is what crashes (as expected): index.add_items(vectors,ids=np.arange(1000))
@Latetide are you also on Windows? Could you please post the results of running msinfo32 if so?
Yes, Windows 10. Python version 3.12.4, jupyter version 7.2.1
Here is the msinfo file: msinfo.txt
Could you downgrade to python 3.10 in your Chroma environment, reinstall, and try again?
FYI, stumbled across this while working through a langchain demo. :) Same experience as others above - VSCode, Jupyter NB crashes when I try to load text chunks - it loaded 10 fine, but when I bumped it to 50 it crashed. The test script above (import hnswlib, numpy, etc.) also ended in a crash.
Windows 11, Chroma 0.5.5, chroma-hnswlib 0.7.6, Python 3.10.11, ipykernel 6.29.5, pytorch 2.4.0, cuda 12.4
Hope this helps! I'll roll back to a previous working version noted above.
I am getting this error too on Windows 11 while using langchain chroma. These are the details of my system:
Windows 11, Ryzen 5 2400G, Python 3.12, chromadb==0.5.5
Downgrading to python 3.10 seems to work/fix the issue
python --version
Python 3.10.14
pip freeze | grep -i chroma
chroma-hnswlib==0.7.6
chromadb==0.5.5
langchain-chroma==0.1.2
chroma-hnswlib
Thanks a lot. fixed the issue.
Bug still exists on Win10 with chromadb 0.5.12 and chroma-hnswlib 0.7.6
I had the same problem with 0.5.13 and downgrading to 0.5.3 has solved it for now!
As I've opened issue #2992, I tried to trace chroma-hnswlib to make sense of the root cause, but it seems the build doesn't contain debug info, so I couldn't step into its code. Am I right? If so, could you release a build with proper debug info?
(edit): debugging environment and tools
What happened?
running collection.add function crashes after 100 documents are inserted
Versions
chromadb 0.5.4, python 3.9;
Relevant log output
No response