chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: Very slow response on embeddings collection in chroma 0.4 #1189

Open slavag opened 11 months ago

slavag commented 11 months ago

What happened?

Hi, I have a test embeddings collection built from the Gutenberg library (180 text files, embedded with INSTRUCTOR_Transformer, producing a 5.9GB Chroma DB). When I run it on Linux with an SSD disk, a 24GB NVidia V10 GPU, and an 8-core CPU, the response is very slow, taking more than 30 seconds. I sent kill -s SIGUSR1 to the Python process and got:

Thread 0x00007fbd8cbfd700 (most recent call first):
  File "/mnt/miniconda3/envs/python311/lib/python3.11/concurrent/futures/thread.py", line 81 in _worker
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 975 in run
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap

Thread 0x00007fbda03b6700 (most recent call first):
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/chromadb/segment/impl/metadata/sqlite.py", line 163 in _records
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/chromadb/segment/impl/metadata/sqlite.py", line 148 in get_metadata
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/chromadb/api/segment.py", line 446 in _query
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 220 in query
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 156 in __query_collection
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/utils/utils.py", line 30 in wrapper
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 323 in similarity_search_with_score
  File "/mnt/AI/h2ogpt/src/gpt_langchain.py", line 3521 in get_docs_with_score
  File "/mnt/AI/h2ogpt/src/gpt_langchain.py", line 3900 in get_chain
  File "/mnt/AI/h2ogpt/src/gpt_langchain.py", line 3367 in _run_qa_db
  File "/mnt/AI/h2ogpt/src/gen.py", line 2301 in evaluate
  File "/mnt/AI/h2ogpt/src/gradio_runner.py", line 2939 in get_response
  File "/mnt/AI/h2ogpt/src/gradio_runner.py", line 2990 in bot
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/gradio/utils.py", line 695 in gen_wrapper
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/gradio/utils.py", line 326 in run_sync_iterator_async
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807 in run
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap

Thread 0x00007fbf15bcb700 (most recent call first):
  File "/mnt/miniconda3/envs/python311/lib/python3.11/asyncio/runners.py", line 118 in run
  File "/mnt/miniconda3/envs/python311/lib/python3.11/asyncio/runners.py", line 190 in run
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/uvicorn/server.py", line 61 in run
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 975 in run
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap

Thread 0x00007fbf151ca700 (most recent call first):
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 324 in wait
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 622 in wait
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/apscheduler/schedulers/blocking.py", line 30 in _main_loop
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 975 in run
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap

Thread 0x00007fbf147c9700 (most recent call first):
  File "/mnt/miniconda3/envs/python311/lib/python3.11/selectors.py", line 415 in select
  File "/mnt/miniconda3/envs/python311/lib/python3.11/multiprocessing/connection.py", line 930 in wait
  File "/mnt/miniconda3/envs/python311/lib/python3.11/concurrent/futures/process.py", line 412 in wait_result_broken_or_wakeup
  File "/mnt/miniconda3/envs/python311/lib/python3.11/concurrent/futures/process.py", line 339 in run
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap

Thread 0x00007fbe111f8700 (most recent call first):
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 324 in wait
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 622 in wait
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap

Thread 0x00007fbe1349c700 (most recent call first):
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 324 in wait
  File "/mnt/miniconda3/envs/python311/lib/python3.11/queue.py", line 180 in get
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/posthog/consumer.py", line 104 in next
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/posthog/consumer.py", line 73 in upload
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/posthog/consumer.py", line 62 in run
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/mnt/miniconda3/envs/python311/lib/python3.11/threading.py", line 995 in _bootstrap

Current thread 0x00007fbf218e8740 (most recent call first):
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/gradio/blocks.py", line 2202 in block_thread
  File "/mnt/AI/h2ogpt/src/gradio_runner.py", line 4090 in go_gradio
  File "/mnt/AI/h2ogpt/src/gen.py", line 1226 in main
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/fire/core.py", line 691 in _CallAndUpdateTrace
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/fire/core.py", line 475 in _Fire
  File "/mnt/miniconda3/envs/python311/lib/python3.11/site-packages/fire/core.py", line 141 in Fire
  File "/mnt/AI/h2ogpt/src/utils.py", line 59 in H2O_Fire
  File "/mnt/AI/h2ogpt/generate.py", line 12 in entrypoint_main
  File "/mnt/AI/h2ogpt/generate.py", line 16 in <module>

Please advise whether such a slow response from Chroma is expected or whether something is not working properly. This is also described in this h2oGPT issue: https://github.com/h2oai/h2ogpt/issues/870

I can share the DB with you; since it is quite big, I need to find a place to share it. Thanks
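
For reference, a minimal sketch of the call path shown in the traceback (langchain's Chroma wrapper calling chromadb's SegmentAPI._query, which spends its time in the sqlite metadata segment). The persist directory, model name, and query string below are illustrative assumptions, not values taken from the issue:

```python
import time
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceInstructEmbeddings

# Assumed model and path; the issue only states "INSTRUCTOR_Transformer" and a 5.9GB DB.
emb = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma(persist_directory="/path/to/chroma_db", embedding_function=emb)

t0 = time.time()
results = db.similarity_search_with_score("example query", k=4)
print(f"similarity_search_with_score took {time.time() - t0:.1f}s, {len(results)} hits")
```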

Versions

Chroma 0.4.10. Amazon Linux 2023, Python 3.11.5

Relevant log output

No response

pseudotensor commented 11 months ago

I see the same thing. With duckdb, filtering for a document was very fast: even a 20GB database took only ~30s the first time and <1s on each further query.

However, sqlite takes a huge amount of time every time. That makes the sqlite backend effectively useless unless you have a trivial number of documents.

E.g. for a 9GB database, a sqlite document query using .get() takes 2 minutes each and every time. duckdb was not like this.

One operation that takes a very long time is db.get() in langchain. Why is the sqlite backend so terrible with the same code that was so fast with duckdb?
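
For illustration, a hedged sketch of the timing described above, using the plain chromadb 0.4 client. The path is an assumption, and "langchain" is only the default collection name the langchain wrapper creates:

```python
import time
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")   # assumed path
col = client.get_collection("langchain")                 # assumed collection name

t0 = time.time()
res = col.get(include=["documents", "metadatas"])        # full-collection get()
print(f"get() returned {len(res['ids'])} records in {time.time() - t0:.1f}s")
```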

pseudotensor commented 11 months ago

While the old duckdb Chroma is massively faster for db.get() than the new Chroma...

Even the old Chroma with duckdb is very slow if any filter is required. Filtering also leads to excessive memory use: getting 1000 similar documents from a 5GB database with 711,730 entries takes more than 64GB of memory.

I don't know why filtering is so slow and uses so much memory.

pseudotensor commented 11 months ago

Indeed, for duckdb Chroma, if I do db.get() on a 5GB database (which takes about 3 seconds) and then filter the entire collection with a list comprehension (instant for 700k entries), it is vastly faster than Chroma's native filtering, which takes 60 seconds and then runs out of memory.
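
A sketch of the comparison being described, assuming a metadata key named "source" (both the key and the values are illustrative):

```python
import time
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")   # assumed path
col = client.get_collection("langchain")                 # assumed collection name

# Native metadata filter (reported here as slow and memory-hungry)
t0 = time.time()
native = col.get(where={"source": "book_42.txt"}, include=["metadatas"])
print(f"native where filter: {time.time() - t0:.1f}s, {len(native['ids'])} matches")

# Work-around: one full get(), then filter client-side with a list comprehension
t0 = time.time()
everything = col.get(include=["metadatas"])
ids = [
    i
    for i, m in zip(everything["ids"], everything["metadatas"])
    if m.get("source") == "book_42.txt"
]
print(f"get() + list comprehension: {time.time() - t0:.1f}s, {len(ids)} matches")
```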

I would think the most efficient way to filter is to filter while doing the similarity search, i.e. do a normal similarity search and reject any document that doesn't satisfy the filter. This would be no slower than a similarity search without a filter and would certainly use no more memory.

So whatever chroma is doing must be much worse.
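
A sketch of that idea: over-fetch from an unfiltered similarity search and reject non-matching hits afterwards. The metadata key, source value, and fetch size are assumptions, and this only approximates true filter-during-search since fetch_k bounds how far it looks; it also assumes the collection has an embedding function attached (otherwise pass query_embeddings instead of query_texts).

```python
def filtered_similarity_search(col, query_text, k=4, fetch_k=100, source="book_42.txt"):
    """Unfiltered similarity search, then reject hits whose metadata fails the predicate."""
    res = col.query(query_texts=[query_text], n_results=fetch_k)
    hits = []
    for doc, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0]):
        if meta.get("source") == source:   # keep only documents that satisfy the filter
            hits.append((doc, dist))
        if len(hits) == k:
            break
    return hits
```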

pseudotensor commented 11 months ago

This can be closed; h2oGPT found work-arounds.

slavag commented 11 months ago

A work-around is good, but I think Chroma should solve such issues.

khajavi commented 11 months ago

I have the same problem with a db of about 1GB in size. The latency is about 9 seconds.