chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

Downloading a specific sentence-transformer model during the Docker initialization process in client-server mode #2761

Closed rmr-code closed 1 week ago

rmr-code commented 1 week ago

What happened?

I am able to use Chroma in Docker, but I find that sentence-transformers/all-MiniLM-L6-v2 is downloaded on demand the first time the client connects to the server. I would like the model to be downloaded during the Docker initialization process, and ideally also be able to specify the model name.

Versions

Docker Hub image: chromadb/chroma:0.5.6.dev75
Python: 3.12
OS: macOS 14.6.1

Relevant log output

No response

tazarov commented 1 week ago

@rmr-code, do you run Chroma in docker compose or just docker run?

rmr-code commented 1 week ago

docker compose

tazarov commented 1 week ago

@rmr-code, one last question:

Does the image below roughly summarize your setup?

[image: setup option 1]

Or is it more like this?

[image: setup option 2]

rmr-code commented 1 week ago

The first image matches my use case. I was wondering whether we could use the Hugging Face client library to download the model and share that volume with the chromadb server service, so that it uses the already-downloaded model.

rmr-code commented 1 week ago

Would something like this work (I have not tried it out):

FROM chromadb/chroma:latest

# Install huggingface_hub if not already present
RUN pip install huggingface_hub

# Pre-download the embedding model using Hugging Face's API
RUN python -c "from huggingface_hub import hf_hub_download; \
               hf_hub_download(repo_id='sentence-transformers/all-MiniLM-L6-v2', \
               filename='pytorch_model.bin', \
               cache_dir='/data/models', \
               token='your_huggingface_token')"

# Expose port
EXPOSE 8000

# Start ChromaDB service
CMD ["chromadb"]

I am not sure which directory to use as the cache directory so that, even the first time the chromadb client connects, the system recognizes that the model is already downloaded.

rmr-code commented 1 week ago

Or, better still, pass the model string (sentence-transformers/all-MiniLM-L6-v2) as an environment variable to the ChromaDB image (again, I did not find any documentation on the image parameters) and have the image handle the download.

tazarov commented 1 week ago

For the first scenario above, you can do the following:

version: '3.9'

networks:
  net:
    driver: bridge

services:
  webapp:
    image: ...
    volumes:
      - ./models:/model-cache
    environment:
      - SENTENCE_TRANSFORMERS_HOME=/model-cache
    networks:
      - net
  server:
    image: ghcr.io/chroma-core/chroma:0.5.5
    volumes:
      - ./chroma-data:/chroma/chroma
    command: "--workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30"
    environment:
      - IS_PERSISTENT=TRUE
      - CHROMA_SERVER_AUTHN_PROVIDER=${CHROMA_SERVER_AUTHN_PROVIDER}
      - CHROMA_SERVER_AUTHN_CREDENTIALS_FILE=${CHROMA_SERVER_AUTHN_CREDENTIALS_FILE}
      - CHROMA_SERVER_AUTHN_CREDENTIALS=${CHROMA_SERVER_AUTHN_CREDENTIALS}
      - CHROMA_AUTH_TOKEN_TRANSPORT_HEADER=${CHROMA_AUTH_TOKEN_TRANSPORT_HEADER}
      - PERSIST_DIRECTORY=${PERSIST_DIRECTORY:-/chroma/chroma}
      - CHROMA_OTEL_EXPORTER_ENDPOINT=${CHROMA_OTEL_EXPORTER_ENDPOINT}
      - CHROMA_OTEL_EXPORTER_HEADERS=${CHROMA_OTEL_EXPORTER_HEADERS}
      - CHROMA_OTEL_SERVICE_NAME=${CHROMA_OTEL_SERVICE_NAME}
      - CHROMA_OTEL_GRANULARITY=${CHROMA_OTEL_GRANULARITY}
      - CHROMA_SERVER_NOFILE=${CHROMA_SERVER_NOFILE}
    restart: unless-stopped # possible values are: "no", "always", "on-failure", "unless-stopped"
    ports:
      - "8000:8000"
    healthcheck:
      # Adjust below to match your container port
      test: [ "CMD", "curl", "-f", "http://localhost:8000/api/v1/heartbeat" ]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - net

The mounted volume ./models:/model-cache and the env var SENTENCE_TRANSFORMERS_HOME ensure that models are downloaded to, and loaded from, a persistent directory.
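
For completeness, a rough (untested) sketch of what the webapp side could look like with the Python client; with SENTENCE_TRANSFORMERS_HOME pointing at the mounted volume, sentence-transformers loads the model from the cache instead of re-downloading it:

import chromadb
from chromadb.utils import embedding_functions

# Embeddings are computed client-side (in the webapp container); the model
# is resolved from SENTENCE_TRANSFORMERS_HOME=/model-cache on the mounted volume.
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# "server" is the compose service name from the example above
client = chromadb.HttpClient(host="server", port=8000)
collection = client.get_or_create_collection("docs", embedding_function=ef)
collection.add(ids=["1"], documents=["hello world"])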

Alternatively, if you want to pre-download the models (e.g. if you redeploy every now and then):

version: '3.9'

networks:
  net:
    driver: bridge

services:
  model_downloader: # downloads the specified models (skips any that already exist)
    image: amikos/hf-model-downloader
    command: sentence-transformers/all-MiniLM-L6-v2
    volumes:
      - ./models:/models
    environment:
      - USE_CACHE=TRUE
  webapp:
    image: ...
    volumes:
      - ./models:/model-cache
    environment:
      - SENTENCE_TRANSFORMERS_HOME=/model-cache
    networks:
      - net
    depends_on:
      - model_downloader
  server:
    image: ghcr.io/chroma-core/chroma:0.5.5
    volumes:
      - ./chroma-data:/chroma/chroma
    command: "--workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config chromadb/log_config.yml --timeout-keep-alive 30"
    environment:
      - IS_PERSISTENT=TRUE
      - CHROMA_SERVER_AUTHN_PROVIDER=${CHROMA_SERVER_AUTHN_PROVIDER}
      - CHROMA_SERVER_AUTHN_CREDENTIALS_FILE=${CHROMA_SERVER_AUTHN_CREDENTIALS_FILE}
      - CHROMA_SERVER_AUTHN_CREDENTIALS=${CHROMA_SERVER_AUTHN_CREDENTIALS}
      - CHROMA_AUTH_TOKEN_TRANSPORT_HEADER=${CHROMA_AUTH_TOKEN_TRANSPORT_HEADER}
      - PERSIST_DIRECTORY=${PERSIST_DIRECTORY:-/chroma/chroma}
      - CHROMA_OTEL_EXPORTER_ENDPOINT=${CHROMA_OTEL_EXPORTER_ENDPOINT}
      - CHROMA_OTEL_EXPORTER_HEADERS=${CHROMA_OTEL_EXPORTER_HEADERS}
      - CHROMA_OTEL_SERVICE_NAME=${CHROMA_OTEL_SERVICE_NAME}
      - CHROMA_OTEL_GRANULARITY=${CHROMA_OTEL_GRANULARITY}
      - CHROMA_SERVER_NOFILE=${CHROMA_SERVER_NOFILE}
    restart: unless-stopped # possible values are: "no", "always", "on-failure", "unless-stopped"
    ports:
      - "8000:8000"
    healthcheck:
      # Adjust below to match your container port
      test: [ "CMD", "curl", "-f", "http://localhost:8000/api/v1/heartbeat" ]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - net

In the above, I've created an image that uses the Hugging Face CLI to download one or more models and store them in a model cache that can then be mounted inside your app.
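
As a quick sanity check (untested, and assuming the downloader writes the cache layout sentence-transformers expects), you can force offline loading in the webapp; if the model is missing from /model-cache, this fails fast instead of silently re-downloading:

import os
os.environ["HF_HUB_OFFLINE"] = "1"  # forbid network access to the HF Hub

from sentence_transformers import SentenceTransformer

# cache_folder matches the ./models volume mounted at /model-cache in the webapp
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2",
                            cache_folder="/model-cache")
print(model.encode(["smoke test"]).shape)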

tazarov commented 1 week ago

> Would something like this work (I have not tried it out): [Dockerfile quoted above]
>
> I am not sure which directory to use as the cache directory so that, even the first time the chromadb client connects, the system recognizes that the model is already downloaded.

SentenceTransformer cache can be specified in two ways: via the SENTENCE_TRANSFORMERS_HOME environment variable, or via the cache_folder argument of the SentenceTransformer constructor.
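
A minimal sketch (untested) of both options:

import os

# Option 1: environment variable; must be set before the model is loaded
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/model-cache"

from sentence_transformers import SentenceTransformer

# Option 2: explicit cache_folder argument (takes precedence over the env var)
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2",
                            cache_folder="/model-cache")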

rmr-code commented 1 week ago

Thanks, @tazarov. I will try your option 1. Any reason you are using the image ghcr.io/chroma-core/chroma:0.5.5 and not chromadb/chroma? Are they from the same source?

tazarov commented 1 week ago

> Any reason you are using the image ghcr.io/chroma-core/chroma:0.5.5 and not chromadb/chroma? Are they from the same source?

ghcr.io/chroma-core/chroma:0.5.5 is the GitHub Container Registry image, whereas chromadb/chroma is the Docker Hub one. They are identical.