chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: chromadb 0.5.4 crashes on windows #2513

Open petacube opened 3 months ago

petacube commented 3 months ago

What happened?

Running collection.add crashes after 100 documents are inserted.

Versions

chromadb 0.5.4, Python 3.9

Relevant log output

No response

petacube commented 3 months ago

Rolling back to the 0.5.0 release of chromadb resolves the issue. Please explain what is causing the crash.

HammadB commented 3 months ago

Do you have a stack trace or any output?

petacube commented 3 months ago

It crashes silently: the whole Python process dies and not even an exception is thrown. I can try testing on Linux tomorrow to see if I can replicate the crash, and run systrace to see whether a core dump can be captured.
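When a process dies inside native code without a Python exception, the standard-library faulthandler module is one way to get more than a silent exit: once enabled, it dumps the Python traceback of every thread on a fatal signal or, on Windows, a fatal exception such as an access violation (reported in the form "Windows fatal exception: access violation"). A minimal sketch; the helper name enable_crash_tracebacks and the log file name are hypothetical:

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_path=None):
    """Enable faulthandler so a fatal native error dumps the Python stack.

    With log_path, tracebacks are written to that file, which survives even
    when console or notebook output is lost along with the process; without
    it they go to stderr. Keep the returned file object open for as long as
    faulthandler should be able to write to it.
    """
    log_file = open(log_path, "w") if log_path else sys.stderr
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Call it once before the first collection.add, e.g. `log = enable_crash_tracebacks("crash.log")`. In a notebook, passing a path is the safer choice, since the notebook's stderr may not be backed by a real file descriptor. Without code changes, `python -X faulthandler script.py` or setting the PYTHONFAULTHANDLER environment variable enables the same handler, writing to stderr.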

tazarov commented 3 months ago

A similar (possibly the same) issue was reported in Discord: https://discord.com/channels/1073293645303795742/1261229903383236720

Windows Fatal Exception: Access Violation


tazarov commented 3 months ago

@petacube, unable to reproduce on GH windows-latest

Here's the test code - https://github.com/amikos-tech/chrm-2513-exp/blob/main/test_import.py

With the following WF - https://github.com/amikos-tech/chrm-2513-exp/actions/runs/9919966674/workflow

Conda env with Python 3.9 and Chroma 0.5.4

I tried adding things in bulk and separately. I also intentionally have high dimensional vectors (4096).

Let me know if you encounter the error in a similar setting.

HammadB commented 3 months ago

Hmm, I wonder if this is due to a chroma-hnswlib version mismatch. Can you run pip show chroma-hnswlib? It should be 0.7.5 for chroma 0.5.4
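The same check can be run from inside the affected environment with importlib.metadata (Python 3.8+), which reads the installed distribution metadata that pip show also reports. A minimal sketch; the helper name installed_versions is hypothetical, and the expected pairing (0.7.5 for chromadb 0.5.4) comes from the comment above rather than from the code:

```python
from importlib import metadata


def installed_versions(*dists):
    """Return {distribution name: installed version, or None if missing}.

    A programmatic equivalent of running `pip show` for each name.
    """
    versions = {}
    for name in dists:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(installed_versions("chromadb", "chroma-hnswlib"))
```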

petacube commented 3 months ago

My version of chroma-hnswlib is 0.7.3. Shouldn't a dependency like this be handled at the chromadb level?

HammadB commented 3 months ago

https://github.com/chroma-core/chroma/blob/2ae46d2dcdea1e57914dc8a3c68181840452eecb/pyproject.toml#L20

It is set here. I am not sure how you upgraded, but maybe something went wrong. Can you upgrade the dependency and try again?

petacube commented 3 months ago

I ran pip install --upgrade chromadb==0.5.4, so perhaps that does not upgrade dependencies?
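With pip's default upgrade strategy (only-if-needed), a dependency is upgraded only when the installed version does not satisfy the requirement the upgraded package declares, so it is worth comparing what chromadb actually declares with what is installed. A sketch under stated assumptions: it only recognizes exact `name==version` pins (returning None for ranges or extras), and the helper names are hypothetical:

```python
import re
from importlib import metadata

# Matches "name==version" requirement strings, stopping before any ";" marker.
_PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([^\s;,]+)")


def normalize(name):
    """PEP 503 name normalization, so chroma_hnswlib == chroma-hnswlib."""
    return re.sub(r"[-_.]+", "-", name).lower()


def declared_pin(requirements, dependency):
    """Return the exact version pinned for dependency, or None if not pinned."""
    for req in requirements or []:
        match = _PIN.match(req)
        if match and normalize(match.group(1)) == normalize(dependency):
            return match.group(2)
    return None


def check_pin(package="chromadb", dependency="chroma-hnswlib"):
    """Compare the pin `package` declares with the installed `dependency`.

    Raises metadata.PackageNotFoundError if either is not installed.
    """
    wanted = declared_pin(metadata.requires(package), dependency)
    installed = metadata.version(dependency)
    return wanted, installed, wanted is None or wanted == installed
```

In the affected environment, `check_pin()` returns a tuple of (declared pin, installed version, consistent?).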

kaixxx commented 3 months ago

I had the same issue: a silent crash after updating to chromadb 0.5.4 on Windows, even with chroma-hnswlib version 0.7.5.

I moved back to chromadb 0.5.0 and chroma-hnswlib 0.7.3 and everything is working like before.

tazarov commented 3 months ago

@kaixxx, can you confirm whether you were using anaconda? A user in Discord reported that the problems were resolved when he switched from anaconda to pip.

On a related note: if your environment rebuilds chroma-hnswlib from source, that could be the culprit. Can you tell me your Python version and CPU architecture? We have prebuilt Windows wheels for amd64 only (py39-py312).
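A sketch that gathers the details asked for here: Python version, CPU architecture, whether the interpreter itself is 64-bit, and whether it runs inside a conda environment. The wheel-matrix constant encodes only what this comment states (Windows, amd64, py39-py312); the conda-meta directory check is a common heuristic rather than an official API, and the function names are hypothetical:

```python
import os
import platform
import struct
import sys

# Assumption from the comment above: prebuilt Windows wheels exist only for
# 64-bit amd64 CPython 3.9 through 3.12.
WHEEL_PYTHONS = ((3, 9), (3, 12))


def is_conda_env(prefix=sys.prefix):
    """Heuristic: conda environments contain a conda-meta directory."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


def expects_prebuilt_wheel(system, machine, pointer_bits, version_info):
    """True/False on Windows; None elsewhere (the comment covers Windows only)."""
    if system != "Windows":
        return None
    low, high = WHEEL_PYTHONS
    return (
        machine.upper() == "AMD64"
        and pointer_bits == 64
        and low <= tuple(version_info[:2]) <= high
    )


def environment_report():
    bits = struct.calcsize("P") * 8  # pointer size of *this* interpreter
    return {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "interpreter_bits": bits,
        "conda": is_conda_env(),
        "prebuilt_wheel_expected": expects_prebuilt_wheel(
            platform.system(), platform.machine(), bits, sys.version_info
        ),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```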

kaixxx commented 3 months ago

Thanks for looking into this. Here is some additional info:

tazarov commented 3 months ago

@kaixxx, in your venv can you run the following code with python:

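import hnswlib
import numpy as np

# 1024-dimensional L2 index with room for 1000 vectors
index = hnswlib.Index(space="l2", dim=1024)
index.init_index(max_elements=1000, ef_construction=100, M=16)
# 1000 random float32 vectors, inserted in a single add_items call
vectors = np.random.randn(1000, 1024).astype(np.float32)
index.add_items(vectors, ids=np.arange(1000))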

Let me know if this crashes
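Because the crash takes the whole interpreter down, one way to test candidate versions without losing the calling session (for example a Jupyter kernel) is to run the reproducer in a child interpreter and inspect its exit status. A minimal sketch: REPRODUCER is the script above plus a print("finished") marker, `-X faulthandler` asks the child to dump a traceback on a fatal error, and 0xC0000005 is the standard Windows access-violation status code; the helper name and status labels are hypothetical:

```python
import subprocess
import sys
import textwrap

# The reproducer from the comment above, run as a separate script.
REPRODUCER = textwrap.dedent(
    """
    import hnswlib
    import numpy as np

    index = hnswlib.Index(space="l2", dim=1024)
    index.init_index(max_elements=1000, ef_construction=100, M=16)
    vectors = np.random.randn(1000, 1024).astype(np.float32)
    index.add_items(vectors, ids=np.arange(1000))
    print("finished")
    """
)

ACCESS_VIOLATION = 0xC0000005  # Windows NTSTATUS for an access violation


def run_isolated(code, timeout=600):
    """Run code in a fresh interpreter and describe how it exited."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    rc = proc.returncode
    if rc == 0:
        status = "ok"
    elif (rc & 0xFFFFFFFF) == ACCESS_VIOLATION:
        status = "access violation"  # matches signed or unsigned exit codes
    elif rc < 0:
        status = f"killed by signal {-rc}"  # POSIX: negative signal number
    else:
        status = f"exit code {rc}"
    return status, proc.stdout, proc.stderr
```

`status, out, err = run_isolated(REPRODUCER)` then reports the outcome; `err` holds any faulthandler traceback from the child.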

kaixxx commented 3 months ago

Yes, it seems to crash. I've created a new environment, installed chromadb (0.5.4 with chroma-hnswlib 0.7.5). Then I've added the line print('finished') to the end of your script. This line is never reached. The script exits silently without any error message. In my other environment with chromadb 0.5.0, the script runs fine and prints 'finished'.

kaixxx commented 3 months ago

Another test: I've now downgraded to chroma-hnswlib 0.7.3 but kept chromadb 0.5.4 and your script runs fine!

tazarov commented 3 months ago

@kaixxx thanks for confirming. Can you add debug prints like this to identify whether it fails in the init of the index or when adding vectors:

import hnswlib
import numpy as np

index = hnswlib.Index(space="l2", dim=1024)
print("New index - ok")
index.init_index(max_elements=1000, ef_construction=100, M=16)
print("Init index - ok")
vectors = np.random.randn(1000, 1024).astype(np.float32)
index.add_items(vectors,ids=np.arange(1000))
print("All good")

kaixxx commented 3 months ago

Yes, output:

New index - ok
Init index - ok

(no "All good")
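The print-based bisection above works here, but output still sitting in Python's buffer when the process dies in native code is lost, which happens most readily when stdout is redirected to a file or pipe. A small sketch that flushes each timestamped marker immediately; the helper name checkpoint is hypothetical:

```python
import sys
import time

_START = time.perf_counter()


def checkpoint(label, stream=None):
    """Print a timestamped progress marker and flush it right away.

    Flushing on every call means the last marker reached is visible even
    if the next native call kills the process.
    """
    if stream is None:
        stream = sys.stdout
    elapsed = time.perf_counter() - _START
    print(f"[{elapsed:8.3f}s] {label}", file=stream, flush=True)
```

Each `print("...")` in the script can then become `checkpoint("Init index - ok")` and so on.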

tazarov commented 3 months ago

@kaixxx, fantastic. Thank you for following up. 0.7.5 adds this change to add_items functionality - https://github.com/chroma-core/hnswlib/commit/408c5d1fa1dbc2acd8d1b4108191a8f803862210?diff=split&w=0#diff-ab27cbb27975c68cb0c6da824871058623f7f76a761c3c8365ef2e1395cf7cd9R1706-R1708

Can I ask you to rebuild the HNSW lib locally (if you have the necessary deps):

pip install --no-binary :all: chroma-hnswlib==0.7.5

kaixxx commented 3 months ago

Hey @tazarov, I've tried to build it but it results in an error from the linker that a certain file could not be opened. It may be that my build environment is not set up properly, but I don't have the time to dig into that. Is there anything else I can do?

dddxst commented 3 months ago

When the documents are long enough, the bug occurs when the 100th one is inserted, whether you insert the data one by one or all at once.

atroyn commented 3 months ago

Reproduced with Python 3.12 and 3.10 on our Windows machine (though this does not show up in CI, and we should figure out why; perhaps the number of embeddings we insert in CI is not large enough to trigger it).

@HammadB and I are looking into it.

HammadB commented 3 months ago

I have confirmed that running with --no-binary (building from source) fixes this as a workaround. This points to an issue in the wheel build. Investigating further.
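For reference, a sketch of that source-build workaround invoked from the target interpreter. Unlike the `--no-binary :all:` command suggested earlier, which disables wheels for every package pip touches, naming a single package limits the source build to chroma-hnswlib; `--force-reinstall` replaces an already-installed wheel of the same version and `--no-deps` leaves the other packages alone. The helper name and defaults are hypothetical, and a working C++ toolchain (on Windows, the MSVC build tools) is still required:

```python
import sys


def source_build_command(requirement="chroma-hnswlib==0.7.5",
                         package="chroma-hnswlib"):
    """Build the pip command that compiles only `package` from source."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-binary", package,   # source build for this package only
        "--force-reinstall",      # replace an existing wheel install
        "--no-deps",              # do not touch numpy etc.
        requirement,
    ]
```

Running it is then `subprocess.run(source_build_command(), check=True)`.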

HammadB commented 3 months ago

It seems the Windows wheels were being built with AVX/SSE enabled whenever the runner compiling them supported it. I guess that for 0.7.3 the runner just happened not to have AVX/SSE, but now it does. I have pushed an alpha release, 0.7.6a1.

@dddxst, @kaixxx and @petacube, can you pip install chroma-hnswlib==0.7.6a1 and let me know whether that fixes your issue? If so, I can issue a main release. Thanks.
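To compare which SIMD extensions an affected machine reports with those of a build machine, a sketch like the following may help. It queries IsProcessorFeaturePresent on Windows and parses /proc/cpuinfo on Linux; the feature IDs are the PF_* constants from the Windows SDK, and older Windows versions may return False for IDs they predate. It reports CPU capabilities only and does not inspect what a wheel was compiled for (the thread does not confirm the AVX/SSE hypothesis; 0.7.6a1 still crashed for kaixxx). Function names are hypothetical:

```python
import ctypes
import sys

# Feature name -> IsProcessorFeaturePresent ID (PF_* constants in winnt.h).
# The names double as the flag names used in Linux /proc/cpuinfo.
FEATURES = {
    "sse2": 10,     # PF_XMMI64_INSTRUCTIONS_AVAILABLE
    "avx": 39,      # PF_AVX_INSTRUCTIONS_AVAILABLE
    "avx2": 40,     # PF_AVX2_INSTRUCTIONS_AVAILABLE
    "avx512f": 41,  # PF_AVX512F_INSTRUCTIONS_AVAILABLE
}


def parse_cpuinfo_flags(text):
    """Collect the flag names from Linux /proc/cpuinfo text."""
    flags = set()
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def simd_support():
    """Return {feature: bool}; empty dict where this sketch has no probe."""
    if sys.platform == "win32":
        check = ctypes.windll.kernel32.IsProcessorFeaturePresent
        return {name: bool(check(pf)) for name, pf in FEATURES.items()}
    try:
        with open("/proc/cpuinfo") as f:
            flags = parse_cpuinfo_flags(f.read())
    except OSError:
        return {}  # e.g. macOS has no /proc/cpuinfo
    return {name: name in flags for name in FEATURES}


if __name__ == "__main__":
    print(simd_support())
```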

kaixxx commented 3 months ago

Thanks! I've tested chroma-hnswlib 0.7.6a1 with the above script and it still crashes, unfortunately. Exactly the same behavior as described in https://github.com/chroma-core/chroma/issues/2513#issuecomment-2231002576

atroyn commented 3 months ago

Have reproduced the 0.7.6a1 failure on our windows machine. The next step is to put a debugger on the cpp code itself. This will be a bit hairy but will coordinate with @HammadB to ship a fix.

EricBLivingston commented 3 months ago

I had the same problem with 0.5.5 and downgrading to 0.5.3/0.7.3 has solved it for now!

HammadB commented 3 months ago

@EricBLivingston what version of python are you on?

EricBLivingston commented 3 months ago


Version 3.11.9

tazarov commented 3 months ago

It would appear that the issue exists on hnswlib 0.7.3 too (Windows 10, AMD Ryzen 5) - https://discord.com/channels/1073293645303795742/1265778818422145149

atroyn commented 3 months ago


@tazarov can you please post a summary of this long conversation here for easy reference? There is a lot going on and it's unclear to me what the issue is. Which python version is the user on?

tazarov commented 3 months ago

@atroyn, the user has the following config:

Windows 10, AMD Ryzen 5 3600XT, Python 3.12, running in a local Jupyter notebook

Versions where they manage to reproduce the bug: 0.5.3 and 0.5.4

They had a build toolchain (MSVC) and tried to build from source, but encountered a build error (something related to ninja). The build log is attached here: chroma-hnswlib==0.7.3-build-failure.txt

atroyn commented 3 months ago

We should advise users on Windows to downgrade to python 3.10 for their Chroma environments.

Latetide commented 2 months ago

I just had a similar error. Using chroma in Jupyter Notebook, the kernel shuts down and restarts after trying to insert the 100th element.

Versions: chromadb: 0.5.5 chroma-hnswlib: 0.7.6

I tested this with the following code, based on the script above. It runs without any issues in command-line Python, but crashes again when I run it in the notebook.

import hnswlib
import numpy as np

print("Starting")

index = hnswlib.Index(space="l2", dim=1024)
print("After index declaration")

index.init_index(max_elements=1000, ef_construction=100, M=16)
print("After Init Index")

vectors = np.random.randn(1000, 1024).astype(np.float32)
print("After Vector creation")

index.add_items(vectors,ids=np.arange(1000))

print("Done")

With debug mode enabled, the kernel log shows only the following (the first JSON block is the last thing I printed before the crash):

{'buffers': [],
 'content': b'{"name": "stdout", "text": "Number of existing docs in DB: 99\\nN'
            b'umber of new chunks: 1\\n"}',
 'header': {'date': datetime.datetime(2024, 8, 12, 13, 33, 23, 677442, tzinfo=tzutc()),
            'msg_id': '91678410-7c9cf6a1b73f88beb4462c14_10376_136',
            'msg_type': 'stream',
            'session': '91678410-7c9cf6a1b73f88beb4462c14',
            'username': 'username',
            'version': '5.3'},
 'metadata': {},
 'msg_id': '91678410-7c9cf6a1b73f88beb4462c14_10376_136',
 'msg_type': 'stream',
 'parent_header': {'date': datetime.datetime(2024, 8, 12, 13, 33, 23, 439000, tzinfo=tzutc()),
                   'msg_id': 'ea6cbfb9-0227-4f8e-9717-e8bc860aef91',
                   'msg_type': 'execute_request',
                   'session': '8ba47159-9da6-45dc-940c-df463dd8f2c0',
                   'username': '',
                   'version': '5.2'}}
[I 2024-08-12 14:33:27.083 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
[W 2024-08-12 14:33:27.083 ServerApp] kernel 2975483f-61dd-4ee6-bdf0-a04c8a086712 restarted

When executing the lines one by one, this is what crashes (as expected): index.add_items(vectors,ids=np.arange(1000))

atroyn commented 2 months ago

@Latetide are you also on Windows? Could you please post the results of running msinfo32 if so?

Latetide commented 2 months ago

Yes, Windows 10. Python version 3.12.4, jupyter version 7.2.1

Here is the msinfo file: msinfo.txt

atroyn commented 2 months ago

Could you downgrade to python 3.10 in your Chroma environment, reinstall, and try again?

mferris77 commented 2 months ago

FYI, I stumbled across this while working through a LangChain demo. :) Same experience as others above: in VS Code, the Jupyter notebook crashes when I try to load text chunks. It loaded 10 fine, but when I bumped it to 50 it crashed. The test script above (import hnswlib, numpy, etc.) also ended in a crash.

Windows 11, Chroma 0.5.5, chroma-hnswlib 0.7.6, Python 3.10.11, ipykernel 6.29.5, PyTorch 2.4.0, CUDA 12.4

Hope this helps! I'll roll back to a previous working version noted above.

Tony1040 commented 2 months ago

I am getting this error too on Windows 11 while using langchain chroma. These are the details of my system:

Windows 11, Ryzen 5 2400G, Python 3.12, chromadb==0.5.5

Downgrading to python 3.10 seems to work/fix the issue

python --version
Python 3.10.14
pip freeze | grep -i chroma
chroma-hnswlib==0.7.6
chromadb==0.5.5
langchain-chroma==0.1.2

sunilswain-esspl commented 1 month ago

chroma-hnswlib

Thanks a lot, that fixed the issue.

Olloxan commented 3 weeks ago

The bug still exists with Windows 10, chromadb 0.5.12 and chroma-hnswlib 0.7.6.

wthomasu commented 2 weeks ago

I had the same problem with 0.5.13 and downgrading to 0.5.3 has solved it for now!

farzbood commented 3 days ago

As mentioned in the issue I opened (#2992), I tried to trace into chroma-hnswlib to make sense of the root cause, but the build seems not to contain debug info, so I couldn't step into its code. Am I right? If so, could you release a build with proper debug info?

(edit): debugging environment and tools