chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: 500 internal server error when accessing the docker chroma database #732

Closed carlesonielfa closed 1 year ago

carlesonielfa commented 1 year ago

What happened?

After adding about 500k document chunks to the database, the database throws a 500 error when performing a query or adding any more documents. I am using the langchain.vectorstores.Chroma pipeline for adding the documents and querying them.

I am running the Docker setup provided by the repo, with the only difference being a max_query_size of 1000000000 for the ClickHouse server. Without that, I would get an error when adding documents.

Below are the relevant logs from executing docker logs chroma-server-1.

Any help fixing this would be greatly appreciated, thank you :)
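
For reference, the ingestion and query path looks roughly like the sketch below (a minimal sketch only; the host, port, and embedding function are placeholders, not the exact code):

from chromadb.config import Settings
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings  # assumed; any embedding function works

store = Chroma(
    collection_name="langchain",  # the collection name that appears in the logs below
    embedding_function=OpenAIEmbeddings(),
    client_settings=Settings(
        chroma_api_impl="rest",
        chroma_server_host="localhost",   # assumed host/port for the docker-compose server
        chroma_server_http_port="8000",
    ),
)

store.add_texts(["example chunk"])               # ~500k chunks were added this way
docs = store.similarity_search("example query")  # this is the request that now returns a 500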

Versions

Relevant log output

docker logs chroma-server-1

2023-06-27 06:22:11 INFO     uvicorn.access  172.30.0.1:44770 - "POST /api/v1/collections/c6c7e0ff-90c5-4c88-bdac-60a94d8dc666/query HTTP/1.1" 500
2023-06-27 06:23:55 INFO     chromadb.db.clickhouse collection with name langchain already exists, returning existing collection
2023-06-27 06:23:55 INFO     uvicorn.access  172.30.0.1:49372 - "POST /api/v1/collections HTTP/1.1" 200
2023-06-27 06:24:06 ERROR    chromadb.server.fastapi Ran out of input
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/anyio/streams/memory.py", line 98, in receive
    return self.receive_nowait()
  File "/usr/local/lib/python3.10/site-packages/anyio/streams/memory.py", line 93, in receive_nowait
    raise WouldBlock
anyio.WouldBlock

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py", line 43, in call_next
    message = await recv_stream.receive()
  File "/usr/local/lib/python3.10/site-packages/anyio/streams/memory.py", line 118, in receive
    raise EndOfStream
anyio.EndOfStream

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/chroma/./chromadb/server/fastapi/__init__.py", line 57, in catch_exceptions_middleware
    return await call_next(request)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py", line 46, in call_next
    raise app_exc
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py", line 36, in coro
    await self.app(scope, request.receive, send_stream.send)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 75, in __call__
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 64, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 680, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 275, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 65, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 231, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 162, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/usr/local/lib/python3.10/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/chroma/./chromadb/server/fastapi/__init__.py", line 258, in get_nearest_neighbors
    nnresult = self._api._query(
  File "/chroma/./chromadb/api/local.py", line 457, in _query
    uuids, distances = self._db.get_nearest_neighbors(
  File "/chroma/./chromadb/db/clickhouse.py", line 612, in get_nearest_neighbors
    index = self._index(collection_uuid)
  File "/chroma/./chromadb/db/clickhouse.py", line 102, in _index
    index = Hnswlib(
  File "/chroma/./chromadb/db/index/hnswlib.py", line 100, in __init__
    self._load(number_elements)
  File "/chroma/./chromadb/db/index/hnswlib.py", line 235, in _load
    self._label_to_id = pickle.load(f)
EOFError: Ran out of input
2023-06-27 06:24:06 INFO     uvicorn.access  172.30.0.1:45262 - "POST /api/v1/collections/c6c7e0ff-90c5-4c88-bdac-60a94d8dc666/query HTTP/1.1" 500
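
The EOFError at the bottom comes from unpickling the persisted HNSW index metadata in hnswlib.py; pickle.load raises exactly this message on an empty or truncated file, which a two-line check reproduces:

import io
import pickle

try:
    pickle.load(io.BytesIO(b""))   # unpickling an empty stream...
except EOFError as err:
    print(err)                     # ...prints "Ran out of input", the same message as in the log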
perzeuss commented 1 year ago

I get the same error when querying the database after adding 8,000 documents and restarting chroma-server and clickhouse.

jeffchuber commented 1 year ago

Hi @carlesonielfa @perzeuss, sorry to hear this! Can you tell if your Docker image or server is running out of hard drive space or memory?

We are releasing a new version of Chroma next Wednesday that will replace ClickHouse, hopefully making sharp edges like this much harder to hit and easier to debug.

wmbutler commented 1 year ago

It's too bad you are replacing ClickHouse; I like it as a DB server. Nonetheless, we ran into this issue when sending a high volume of documents. It looks to me like it has to do with Chroma creating the same collection name twice under high load. The absence of null in the collection metadata column can also cause this.

I suggested a unique key on the collections.name column.

https://github.com/chroma-core/chroma/issues/773

perzeuss commented 1 year ago

Hi @jeffchuber, I was able to reproduce this error several times with chroma version 0.3.x and with more than enough resources (hard drive space and memory).

I just migrated to the new chroma release 0.4.0, and I can no longer reproduce this error!
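
For anyone else upgrading: connecting to a 0.4.x server goes through the new HttpClient. A minimal sketch (host and port are assumptions, adjust to your deployment):

import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)  # assumed host/port
print(client.heartbeat())                                  # liveness check before adding or querying
collection = client.get_or_create_collection("langchain")
print(collection.count())

Note that 0.4.0 also changed the storage layer, so data written by 0.3.x has to be migrated rather than mounted as-is.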

jeffchuber commented 1 year ago

Closing this as it is stale. Please let me know if anything else pops up here and we can re-open it.

@perzeuss glad this cleaned up your issue! (a few months ago :) )

aeidme commented 1 year ago

Hello, I'm receiving the exact same error after sending high volumes of data to chromadb on Docker with a persistent volume.

I'm using the latest version of chromadb (from releases), so @perzeuss's solution didn't really help me.

@wmbutler, can you please guide us on how to check for null values when the database is 4 GB? client.list_collections() returns 1 collection.

@jeffchuber can you please re-open the issue?

Thanks a lot for your help.

HammadB commented 1 year ago

Can you please share server logs? A 500 error isn't really descriptive enough to warrant an issue or allow for debugging. Thanks

aeidme commented 1 year ago

Can you please share server logs? A 500 error isn't really descriptive enough to warrant an issue or allow for debugging. Thanks

Sure here are the logs:

ERROR:    [23-09-2023 01:46:49] Ran out of input
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/anyio/streams/memory.py", line 97, in receive
    return self.receive_nowait()
  File "/usr/local/lib/python3.10/site-packages/anyio/streams/memory.py", line 92, in receive_nowait
    raise WouldBlock
anyio.WouldBlock

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py", line 78, in call_next
    message = await recv_stream.receive()
  File "/usr/local/lib/python3.10/site-packages/anyio/streams/memory.py", line 112, in receive
    raise EndOfStream
anyio.EndOfStream

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/chroma/./chromadb/server/fastapi/__init__.py", line 58, in catch_exceptions_middleware
    return await call_next(request)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py", line 84, in call_next
    raise app_exc
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py", line 70, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/usr/local/lib/python3.10/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2106, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 833, in run
    result = context.run(func, *args)
  File "/chroma/./chromadb/server/fastapi/__init__.py", line 251, in add
    result = self._api._add(
  File "/chroma/./chromadb/api/segment.py", line 245, in _add
    self._manager.hint_use_collection(collection_id, t.Operation.ADD)
  File "/chroma/./chromadb/segment/impl/manager/local.py", line 153, in hint_use_collection
    instance = self.get_segment(collection_id, type)
  File "/chroma/./chromadb/segment/impl/manager/local.py", line 144, in get_segment
    instance = self._instance(self._segment_cache[collection_id][scope])
  File "/chroma/./chromadb/segment/impl/manager/local.py", line 169, in _instance
    instance = cls(self._system, segment)
  File "/chroma/./chromadb/segment/impl/vector/local_persistent_hnsw.py", line 98, in __init__
    self._persist_data = PersistentData.load_from_file(
  File "/chroma/./chromadb/segment/impl/vector/local_persistent_hnsw.py", line 65, in load_from_file
    ret = cast(PersistentData, pickle.load(f))
EOFError: Ran out of input
INFO:     [23-09-2023 01:46:49] X.X.X.X:40034 - "POST /api/v1/collections/1ba6c9e8-1ec1-4c0e-a7dc-9961cf1cef42/add HTTP/1.1" 500

After debugging the code, it seems to be related to the index_metadata.pickle file.

I tried removing it, and instead of returning an error, the DB returned an empty response like this: {'ids': [[]], 'distances': [[]], 'embeddings': None, 'metadatas': [[]], 'documents': [[]]}
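
Before deleting anything, a check like the sketch below can confirm which persisted vector segment has an empty or truncated index_metadata.pickle (the persist directory path is an assumption; adjust it to the volume you mount into the container):

import pathlib
import pickle

persist_dir = pathlib.Path("/chroma/chroma")  # assumed persist directory

for meta in persist_dir.rglob("index_metadata.pickle"):
    size = meta.stat().st_size
    try:
        with meta.open("rb") as f:
            pickle.load(f)
        status = "OK"
    except EOFError:
        status = "truncated/empty -> this segment raises the 500"
    print(f"{meta.parent.name}: {size} bytes, {status}")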

aeidme commented 1 year ago

Can anyone help, please? This issue happened again, and it always happens after adding some data to the database. Now I need to remove the index_metadata.pickle file and hope that chromadb will fill it correctly again. This takes a long time since the data is more than 5 GB.

@jeffchuber can you please re-open the issue?

lauradang commented 1 year ago

+1 experiencing the same issue