Closed rmr-code closed 1 week ago
@rmr-code, do you run Chroma in docker compose or just docker run?
docker compose
@rmr-code, one last question:
Does the image below summarize your setup:
Or is it more like this:
The first image reflects my use case. I was wondering whether we could use the Hugging Face client library to download the model and share that volume with the chromadb server service, so that it uses the already-downloaded model.
Would something like this work (I have not tried it out):
FROM chromadb/chromadb:latest
# Install huggingface_hub if not already present
RUN pip install huggingface_hub
# Pre-download the embedding model using Hugging Face's API
RUN python -c "from huggingface_hub import hf_hub_download; \
    hf_hub_download(repo_id='sentence-transformers/all-MiniLM-L6-v2', \
                    filename='pytorch_model.bin', \
                    cache_dir='/data/models', \
                    use_auth_token='your_huggingface_token')"
# Expose port
EXPOSE 8000
# Start ChromaDB service
CMD ["chromadb"]
I am not sure which directory to use as the cache directory, so that even the first time the chromadb client calls the server, the system recognizes that the model is already downloaded.
Or, better still, pass the model string (sentence-transformers/all-MiniLM-L6-v2) as an environment variable to the ChromaDB image (again, I did not find any documentation on the image parameters) and have it do the needful.
For the first scenario above you can do the following:
version: '3.9'
networks:
  net:
    driver: bridge
services:
  webapp:
    image: ...
    volumes:
      - ./models:/model-cache
    environment:
      - SENTENCE_TRANSFORMERS_HOME=/model-cache
    networks:
      - net
  server:
    image: ghcr.io/chroma-core/chroma:0.5.5
    volumes:
      - ./chroma-data:/chroma/chroma
    command: "--workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30"
    environment:
      - IS_PERSISTENT=TRUE
      - CHROMA_SERVER_AUTHN_PROVIDER=${CHROMA_SERVER_AUTHN_PROVIDER}
      - CHROMA_SERVER_AUTHN_CREDENTIALS_FILE=${CHROMA_SERVER_AUTHN_CREDENTIALS_FILE}
      - CHROMA_SERVER_AUTHN_CREDENTIALS=${CHROMA_SERVER_AUTHN_CREDENTIALS}
      - CHROMA_AUTH_TOKEN_TRANSPORT_HEADER=${CHROMA_AUTH_TOKEN_TRANSPORT_HEADER}
      - PERSIST_DIRECTORY=${PERSIST_DIRECTORY:-/chroma/chroma}
      - CHROMA_OTEL_EXPORTER_ENDPOINT=${CHROMA_OTEL_EXPORTER_ENDPOINT}
      - CHROMA_OTEL_EXPORTER_HEADERS=${CHROMA_OTEL_EXPORTER_HEADERS}
      - CHROMA_OTEL_SERVICE_NAME=${CHROMA_OTEL_SERVICE_NAME}
      - CHROMA_OTEL_GRANULARITY=${CHROMA_OTEL_GRANULARITY}
      - CHROMA_SERVER_NOFILE=${CHROMA_SERVER_NOFILE}
    restart: unless-stopped # possible values are: "no", "always", "on-failure", "unless-stopped"
    ports:
      - "8000:8000"
    healthcheck:
      # Adjust below to match your container port
      test: [ "CMD", "curl", "-f", "http://localhost:8000/api/v1/heartbeat" ]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - net
The mounted volume ./models:/model-cache and the env var SENTENCE_TRANSFORMERS_HOME ensure that models are downloaded to, and loaded from, a persistent directory.
Alternatively, if you want to pre-download the models (e.g. if you redeploy every now and then):
version: '3.9'
networks:
  net:
    driver: bridge
services:
  model_downloader: # downloads the specified models (skips any that already exist)
    image: amikos/hf-model-downloader
    command: sentence-transformers/all-MiniLM-L6-v2
    volumes:
      - ./models:/models
    environment:
      - USE_CACHE=TRUE
  webapp:
    image: ...
    volumes:
      - ./models:/model-cache
    environment:
      - SENTENCE_TRANSFORMERS_HOME=/model-cache
    networks:
      - net
    depends_on:
      - model_downloader
  server:
    image: ghcr.io/chroma-core/chroma:0.5.5
    volumes:
      - ./chroma-data:/chroma/chroma
    command: "--workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30"
    environment:
      - IS_PERSISTENT=TRUE
      - CHROMA_SERVER_AUTHN_PROVIDER=${CHROMA_SERVER_AUTHN_PROVIDER}
      - CHROMA_SERVER_AUTHN_CREDENTIALS_FILE=${CHROMA_SERVER_AUTHN_CREDENTIALS_FILE}
      - CHROMA_SERVER_AUTHN_CREDENTIALS=${CHROMA_SERVER_AUTHN_CREDENTIALS}
      - CHROMA_AUTH_TOKEN_TRANSPORT_HEADER=${CHROMA_AUTH_TOKEN_TRANSPORT_HEADER}
      - PERSIST_DIRECTORY=${PERSIST_DIRECTORY:-/chroma/chroma}
      - CHROMA_OTEL_EXPORTER_ENDPOINT=${CHROMA_OTEL_EXPORTER_ENDPOINT}
      - CHROMA_OTEL_EXPORTER_HEADERS=${CHROMA_OTEL_EXPORTER_HEADERS}
      - CHROMA_OTEL_SERVICE_NAME=${CHROMA_OTEL_SERVICE_NAME}
      - CHROMA_OTEL_GRANULARITY=${CHROMA_OTEL_GRANULARITY}
      - CHROMA_SERVER_NOFILE=${CHROMA_SERVER_NOFILE}
    restart: unless-stopped # possible values are: "no", "always", "on-failure", "unless-stopped"
    ports:
      - "8000:8000"
    healthcheck:
      # Adjust below to match your container port
      test: [ "CMD", "curl", "-f", "http://localhost:8000/api/v1/heartbeat" ]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - net
In the above I've created an image that uses the Hugging Face CLI to download one or more models and store them in a model cache that can then be mounted inside your app.
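The downloader can also be run on its own, outside compose, to warm the cache ahead of a deploy (a sketch; the volume path and USE_CACHE flag mirror the compose snippet above):

```shell
# Download the model into ./models on the host (a no-op if already cached)
docker run --rm \
  -v "$(pwd)/models:/models" \
  -e USE_CACHE=TRUE \
  amikos/hf-model-downloader sentence-transformers/all-MiniLM-L6-v2
```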
I am not sure which directory to use as a cache directory so that when the chromadb client calls it even the first time, the system should recognize that the model is already downloaded.
The SentenceTransformer cache can be specified in two ways:
- Env var: SENTENCE_TRANSFORMERS_HOME=/path/to/your/cache
- The cache_folder parameter: ef = SentenceTransformerEmbeddingFunction(model_name="sentence-transformers/all-MiniLM-L6-v2", cache_folder="/path/to/your/cache")
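To sanity-check that a first client call will hit the cache rather than the network, you can look for the model folder before starting (a sketch in pure stdlib; the folder-name layouts are assumptions and vary between sentence-transformers versions):

```python
import os

cache = os.environ.get("SENTENCE_TRANSFORMERS_HOME", "/path/to/your/cache")
model = "sentence-transformers/all-MiniLM-L6-v2"
# Older sentence-transformers versions store the model under the repo id
# with '/' replaced by '_'; newer ones use the hub's 'models--org--name' layout.
candidates = [
    os.path.join(cache, model.replace("/", "_")),
    os.path.join(cache, "models--" + model.replace("/", "--")),
]
cached = any(os.path.isdir(p) for p in candidates)
print("model cached" if cached else "model not cached; will download on first use")
```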
Thanks @tazarov. I will try your option 1. Any reason you are using the image ghcr.io/chroma-core/chroma:0.5.5 and not chromadb/chroma? Are they from the same source?
ghcr.io/chroma-core/chroma:0.5.5 is the GitHub (GHCR) image, whereas chromadb/chroma is the Docker Hub one. They are identical.
What happened?
I am able to use Chroma in Docker, but I find that sentence-transformers/all-MiniLM-L6-v2 is downloaded on demand the first time the client connects to the server. I would like to have the model downloaded during the Docker initialization process, and ideally also be able to specify the model name.
Versions
Docker Hub image: docker pull chromadb/chroma:0.5.6.dev75
Python: 3.12
OS: macOS 14.6.1
Relevant log output
No response