chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: tarfile.ReadError: empty file #1435

Open suckrowPierre opened 10 months ago

suckrowPierre commented 10 months ago

What happened?

I encountered a ReadError related to the tarfile module when trying to use ChromaDB on my Mac with an M1 chip. The error occurs during the model file extraction process in the _download_model_if_not_exists method within embedding_functions.py. It appears that the tar file being accessed is empty or corrupted, leading to a failure in setting up the ChromaDB environment.

Versions

Chroma 0.4.18, Python 3.11

Relevant log output

Traceback (most recent call last):
  File "/Volumes/NEW DRIVE PIERRE/CodingNEW/ChromaDBFirst/db.py", line 9, in <module>
    collection.add(
  File "/opt/homebrew/anaconda3/envs/ChromaDB2_env/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 147, in add
    embeddings = self._embed(input=documents)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/ChromaDB2_env/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 587, in _embed
    return self._embedding_function(input=input)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/ChromaDB2_env/lib/python3.11/site-packages/chromadb/utils/embedding_functions.py", line 487, in __call__
    self._download_model_if_not_exists()
  File "/opt/homebrew/anaconda3/envs/ChromaDB2_env/lib/python3.11/site-packages/chromadb/utils/embedding_functions.py", line 517, in _download_model_if_not_exists
    with tarfile.open(
         ^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/ChromaDB2_env/lib/python3.11/tarfile.py", line 1824, in open
    return func(name, filemode, fileobj, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/ChromaDB2_env/lib/python3.11/tarfile.py", line 1877, in gzopen
    t = cls.taropen(name, mode, fileobj, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/ChromaDB2_env/lib/python3.11/tarfile.py", line 1854, in taropen
    return cls(name, mode, fileobj, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/ChromaDB2_env/lib/python3.11/tarfile.py", line 1714, in __init__
    self.firstmember = self.next()
                       ^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/envs/ChromaDB2_env/lib/python3.11/tarfile.py", line 2619, in next
    raise ReadError("empty file") from None
tarfile.ReadError: empty file

tazarov commented 10 months ago

@suckrowPierre, thanks for reporting this. The download is the ONNX runtime model archive for all-MiniLM, hosted on Chroma's S3 bucket. I've just checked and the URL should be accessible - https://chroma-onnx-models.s3.amazonaws.com/all-MiniLM-L6-v2/onnx.tar.gz.

Do you have a firewall or antivirus that might be corrupting the file? Can you try downloading the file manually from the URL above?
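
For the manual check, a minimal Python sketch along these lines might help. It downloads the archive from the URL above and reports whether the result is a readable `.tar.gz`. The function names `check_archive` and `download_and_check` and the local file name `onnx.tar.gz` are made up for illustration and are not part of Chroma. The same check can be pointed at whatever archive Chroma already downloaded; its cache location is not shown in this thread, so the path is left as a parameter.

```python
import os
import tarfile
import urllib.request

# URL quoted in the comment above.
MODEL_URL = "https://chroma-onnx-models.s3.amazonaws.com/all-MiniLM-L6-v2/onnx.tar.gz"


def check_archive(path):
    """Return (ok, message) describing whether `path` is a readable .tar.gz.

    Hypothetical helper for diagnosis only; not part of chromadb.
    """
    if not os.path.exists(path):
        return False, "file does not exist"
    if os.path.getsize(path) == 0:
        # A zero-byte file is what makes tarfile raise ReadError("empty file").
        return False, "file is empty (0 bytes)"
    try:
        with tarfile.open(path, "r:gz") as tar:
            members = tar.getnames()
    except (tarfile.TarError, OSError, EOFError) as exc:
        # Covers non-gzip content (e.g. an HTML block page) and truncated downloads.
        return False, f"not a readable tar.gz: {exc}"
    return True, f"archive OK, {len(members)} entries"


def download_and_check(dest="onnx.tar.gz", url=MODEL_URL):
    """Download `url` to `dest` (assumed file name) and validate the result."""
    urllib.request.urlretrieve(url, dest)
    return check_archive(dest)


# Example (requires network access):
# print(download_and_check())
```

If the manual download passes this check but Chroma still fails, a leftover empty or partial archive from an earlier interrupted download is one possible cause; running `check_archive` on that cached file would confirm or rule it out.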