Closed carlesonielfa closed 1 year ago
I get the same error when querying the database after adding 8,000 documents and restarting chroma-server and clickhouse.
hi @carlesonielfa @perzeuss sorry to hear this! can you tell if your docker image or server is running out of hard drive space or memory?
we are releasing a new version of chroma next wednesday that will replace clickhouse, hopefully making sharp edges like this much harder to hit and easier to debug.
It's too bad you are replacing clickhouse. I like it as a db server. Nonetheless, we ran into this issue when sending a high volume of documents. Looks to me like it has to do with Chroma creating the same collection name twice under high load situations. Also, the absence of null in the collection metadata column can cause this.
I suggested a unique key on the collections.name column.
Hi @jeffchuber, I was able to reproduce this error several times with chroma version 0.3.x and with more than enough resources (hard drive space and memory).
I just migrated to the new chroma release 0.4.0, and I can no longer reproduce this error!
Closing this as it is stale. Please let me know if anything else pops up here and we can re-open it.
@perzeuss glad this cleaned up your issue! (a few months ago :) )
Hello, I'm receiving the exact error after sending high volumes of data to chromadb on Docker with a persistent volume.
I'm using the latest version of chromadb (from releases), so @perzeuss's solution didn't really help me.
@wmbutler can you please guide us on how to check for null values if the database is 4 GB? client.list_collections() returns 1 collection.
@jeffchuber can you please re-open the issue?
Thanks a lot for your help.
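For anyone wanting to check for the two conditions mentioned earlier in the thread (the same collection name appearing twice, and a collection with no metadata), here is a minimal sketch. The helper name audit_collections is made up for illustration, and it assumes each item returned by client.list_collections() exposes .name and .metadata attributes, which may not hold for every chromadb version. It only looks at what the API returns, so it may not surface duplicates that exist only in the underlying table.

```python
from collections import Counter


def audit_collections(collections):
    """Report duplicate names and collections whose metadata is None.

    `collections` is any iterable of objects with `.name` and `.metadata`
    attributes (an assumption about what client.list_collections() returns).
    """
    collections = list(collections)
    counts = Counter(c.name for c in collections)
    # Names that appear more than once.
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    # Names of collections that carry no metadata at all.
    missing_metadata = sorted({c.name for c in collections if c.metadata is None})
    return {"duplicates": duplicates, "missing_metadata": missing_metadata}
```

Something like audit_collections(client.list_collections()) would then return a small dict that can be pasted into the issue.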
Can you please share server logs? A 500 error isn't really descriptive enough to warrant an issue or allow for debugging. Thanks
Sure here are the logs:
ERROR: [23-09-2023 01:46:49] Ran out of input
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/anyio/streams/memory.py", line 97, in receive
return self.receive_nowait()
File "/usr/local/lib/python3.10/site-packages/anyio/streams/memory.py", line 92, in receive_nowait
raise WouldBlock
anyio.WouldBlock
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py", line 78, in call_next
message = await recv_stream.receive()
File "/usr/local/lib/python3.10/site-packages/anyio/streams/memory.py", line 112, in receive
raise EndOfStream
anyio.EndOfStream
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/chroma/./chromadb/server/fastapi/__init__.py", line 58, in catch_exceptions_middleware
return await call_next(request)
File "/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py", line 84, in call_next
raise app_exc
File "/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py", line 70, in coro
await self.app(scope, receive_or_disconnect, send_no_error)
File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
raise e
File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 241, in app
raw_response = await run_endpoint_function(
File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/usr/local/lib/python3.10/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2106, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 833, in run
result = context.run(func, *args)
File "/chroma/./chromadb/server/fastapi/__init__.py", line 251, in add
result = self._api._add(
File "/chroma/./chromadb/api/segment.py", line 245, in _add
self._manager.hint_use_collection(collection_id, t.Operation.ADD)
File "/chroma/./chromadb/segment/impl/manager/local.py", line 153, in hint_use_collection
instance = self.get_segment(collection_id, type)
File "/chroma/./chromadb/segment/impl/manager/local.py", line 144, in get_segment
instance = self._instance(self._segment_cache[collection_id][scope])
File "/chroma/./chromadb/segment/impl/manager/local.py", line 169, in _instance
instance = cls(self._system, segment)
File "/chroma/./chromadb/segment/impl/vector/local_persistent_hnsw.py", line 98, in __init__
self._persist_data = PersistentData.load_from_file(
File "/chroma/./chromadb/segment/impl/vector/local_persistent_hnsw.py", line 65, in load_from_file
ret = cast(PersistentData, pickle.load(f))
EOFError: Ran out of input
INFO: [23-09-2023 01:46:49] X.X.X.X:40034 - "POST /api/v1/collections/1ba6c9e8-1ec1-4c0e-a7dc-9961cf1cef42/add HTTP/1.1" 500
After debugging the code, it seems that it's related to the file index_metadata.pickle.
I tried removing it and instead of returning an error, the db returned an empty response as such:
{'ids': [[]], 'distances': [[]], 'embeddings': None, 'metadatas': [[]], 'documents': [[]]}
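The "EOFError: Ran out of input" in the traceback above is what pickle.load raises when it hits the end of the file too early, for example on an empty file. A minimal sketch for checking the state of the file before deciding to delete it follows. The function name check_pickle is made up for illustration, the location of index_metadata.pickle inside the persistent volume is not shown in this thread so the path has to be filled in by hand, and loading a real index_metadata.pickle presumably needs chromadb importable in the same environment. Only unpickle files you trust.

```python
import os
import pickle


def check_pickle(path):
    """Return 'missing', 'empty', 'unreadable', or 'ok' for the file at `path`."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # A zero-byte file is the simplest way to get "Ran out of input".
        return "empty"
    try:
        with open(path, "rb") as f:
            pickle.load(f)
    except (EOFError, pickle.UnpicklingError):
        # The file has some bytes but cannot be fully unpickled,
        # e.g. because a write was interrupted partway through.
        return "unreadable"
    return "ok"
```

If this reports "empty" or "unreadable", that would be consistent with the write being interrupted while data was being added, though the thread does not confirm the cause.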
Can anyone help please? This issue happened again, and every time it happens after adding some data to the database. Now I need to remove the index_metadata.pickle file and hope that chromadb fills it correctly again. This takes a long time since the data is more than 5 GB.
@jeffchuber can you please re-open the issue?
+1 experiencing the same issue
What happened?
After adding about 500k document chunks to the database, the database throws a 500 error when performing a query or adding any more documents. I am using the langchain.vectorstores.Chroma pipeline for adding the documents and querying them. I am running the Docker setup provided by the repo, with the only difference being a max_query_size for the clickhouse server of 1000000000. Without that, I would get an error when adding documents.
Below I provide the relevant logs shown when executing docker logs chroma-server-1. Any help fixing this would be gladly appreciated, thank you :)
Versions
Relevant log output
docker logs chroma-server-1