chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Feature Request]: provide deployment for kubernetes #475

Open warjiang opened 1 year ago

warjiang commented 1 year ago

Describe the problem

Chroma only provides Docker for self-hosted deployment, but we use Kubernetes in our production environment. Is there any plan to support deployment on Kubernetes, perhaps via raw k8s manifests or a Helm chart?

Describe the proposed solution

Provide a deployment path for k8s; maybe I can offer some help.

Alternatives considered

No response

Importance

would make my life easier

Additional Information

No response

jeffchuber commented 1 year ago

@warjiang we are planning on this in conjunction with our refactor in the coming months.

warjiang commented 1 year ago

> @warjiang we are planning on this in conjunction with our refactor in the coming months.

Looking forward to this work; if there's any progress, please let me know πŸ™. I'm willing to deploy chroma on my k8s cluster and give feedback to the development team πŸ‘€. If you need any help, please let me know; it's my pleasure to contribute to chroma.

l4b4r4b4b4 commented 1 year ago

Yep, same here! As soon as there is a playbook or Helm chart I will try to deploy the ChromaDB-backed apps on RKE2 / Rancher. If things move quickly, I'll tackle the task myself and open a PR.

Oliniusz commented 1 year ago

For now I'm using the following Deployment, based on https://github.com/chroma-core/chroma/blob/0.3.25/docker-compose.server.example.yml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chroma
  labels:
    app: chroma
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 10%
  selector:
    matchLabels:
      app: chroma
  template:
    metadata:
      labels:
        app: chroma
    spec:
      containers:
        - name: clickhouse
          image: clickhouse/clickhouse-server:22.9-alpine
          ports:
            - containerPort: 8123
              name: clickhouse-http
              protocol: TCP
            - containerPort: 9000
              name: clickhouse-tcp
              protocol: TCP
          resources:
            requests:
              memory: 256Mi
              cpu: 256m
            limits:
              memory: 2Gi
              cpu: 2
          readinessProbe:
            httpGet:
              path: /
              port: 8123
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
          env:
            - name: ALLOW_EMPTY_PASSWORD
              value: "yes"
            - name: CLICKHOUSE_TCP_PORT
              value: "9000"
            - name: CLICKHOUSE_HTTP_PORT
              value: "8123"
            - name: CLICKHOUSE_DO_NOT_CHOWN
              value: "1"
          volumeMounts:
            - name: chroma-clickhouse-data
              mountPath: /var/lib/clickhouse
            - name: chroma-clickhouse-logs
              mountPath: /var/log/clickhouse-server
            - name: chroma-clickhouse-backups-storage
              mountPath: /backups
            - name: chroma-clickhouse-backups-configmap
              mountPath: /etc/clickhouse-server/config.d/backup_disk.xml
              subPath: backup_disk.xml
            - name: chroma-clickhouse-users
              mountPath: /etc/clickhouse-server/users.d/chroma.xml
              subPath: chroma_users.xml
        - name: server
          image: ghcr.io/chroma-core/chroma:0.3.25
          ports:
            - containerPort: 8000
              name: main
              protocol: TCP
          resources:
            requests:
              memory: 256Mi
              cpu: 256m
            limits:
              memory: 2Gi
              cpu: 2
          readinessProbe:
            httpGet:
              path: /api/v1/heartbeat
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
          env:
            - name: CHROMA_DB_IMPL
              value: "clickhouse"
            - name: CLICKHOUSE_HOST
              value: "localhost"
            - name: CLICKHOUSE_PORT
              value: "8123"
          volumeMounts:
            - name: chroma-server-index
              mountPath: /chroma/.chroma/index
      serviceAccountName: chroma
      volumes:
        - name: chroma-clickhouse-data
          persistentVolumeClaim:
            claimName: chroma-clickhouse-data
        - name: chroma-clickhouse-logs
          persistentVolumeClaim:
            claimName: chroma-clickhouse-logs
        - name: chroma-clickhouse-backups-storage
          persistentVolumeClaim:
            claimName: chroma-clickhouse-backups
        - name: chroma-clickhouse-backups-configmap
          configMap:
            name: chroma-clickhouse-backups
        - name: chroma-clickhouse-users
          configMap:
            name: chroma-clickhouse-users
        - name: chroma-server-index
          persistentVolumeClaim:
            claimName: chroma-server-index
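
The Deployment alone isn't reachable from other workloads; a matching Service is also needed. A minimal sketch, assuming the `app: chroma` label and port 8000 from the manifest above (the Service name is an assumption):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: chroma
spec:
  selector:
    app: chroma
  ports:
    - name: main
      port: 8000
      targetPort: 8000
      protocol: TCP
```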

The weird thing is that the container is trying and failing to rebuild hnswlib:

Rebuilding hnsw to ensure architecture compatibility
Collecting hnswlib
  Downloading hnswlib-0.7.0.tar.gz (33 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting numpy
  Downloading numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 374.4 MB/s eta 0:00:00
Building wheels for collected packages: hnswlib
  Building wheel for hnswlib (pyproject.toml): started
  Building wheel for hnswlib (pyproject.toml): finished with status 'error'
  error: subprocess-exited-with-error

  Γ— Building wheel for hnswlib (pyproject.toml) did not run successfully.
  β”‚ exit code: 1
  ╰─> [55 lines of output]
      running bdist_wheel
      running build
      running build_ext
      creating tmp
      gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.10 -c /tmp/tmp2mywuvir.cpp -o tmp/tmp2mywuvir.o -std=c++14
      gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.10 -c /tmp/tmpu3p5lj57.cpp -o tmp/tmpu3p5lj57.o -std=c++11
      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 416, in build_wheel
          return self._build_with_temp_dir(['bdist_wheel'], '.whl',
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 401, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 116, in <module>
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 107, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 343, in run
          self.run_command("build")
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 131, in run
          self.run_command(cmd_name)
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 84, in run
          _build_ext.run(self)
        File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "<string>", line 103, in build_extensions
        File "<string>", line 70, in cpp_flag
      RuntimeError: Unsupported compiler -- at least C++11 support is needed!
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for hnswlib
Failed to build hnswlib
ERROR: Could not build wheels for hnswlib, which is required to install pyproject.toml-based projects

[notice] A new release of pip is available: 23.0.1 -> 23.1.2
[notice] To update, run: pip install --upgrade pip
2023-05-26 13:00:07 INFO     chromadb.telemetry.posthog Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2023-05-26 13:00:07 INFO     uvicorn.error   Started server process [32]
2023-05-26 13:00:07 INFO     uvicorn.error   Waiting for application startup.
2023-05-26 13:00:07 INFO     uvicorn.error   Application startup complete.
2023-05-26 13:00:07 INFO     uvicorn.error   Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2023-05-26 13:00:11 INFO     uvicorn.access  127.0.0.6:55909 - "GET /api/v1/heartbeat HTTP/1.1" 200
2023-05-26 13:00:16 INFO     uvicorn.access  127.0.0.6:34235 - "GET /api/v1/heartbeat HTTP/1.1" 200
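
The "Unsupported compiler" failure usually means the image lacks a working C++ toolchain; note that the server still starts afterwards, so the bundled wheel apparently suffices. A hedged sketch that replicates hnswlib's build-time compiler probe, to check whether a given environment can build the wheel at all:

```shell
# Replicate hnswlib's build-time probe: try to compile a trivial
# translation unit with C++14 flags and report whether g++ is usable.
printf 'int main() { return 0; }\n' > /tmp/cxx_probe.cpp
if g++ -std=c++14 -c /tmp/cxx_probe.cpp -o /tmp/cxx_probe.o 2>/dev/null; then
    echo "C++14 compiler available"
else
    echo "no usable C++ compiler found"
fi
```

Run inside the server container (e.g. via `kubectl exec`) to see which branch the image takes.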
Oliniusz commented 1 year ago

In my specific case, I had to change spec.strategy from RollingUpdate to Recreate due to the clickhouse container locking the files on the persistent AWS EFS storage:

2023.05.26 12:58:32.223680 [ 1 ] {} <Warning> Context: Effective user of the process (clickhouse) does not match the owner of the data (50001).
2023.05.26 12:58:32.230741 [ 1 ] {} <Error> Application: DB::Exception: Cannot lock file /var/lib/clickhouse/status. Another server instance in same directory is already running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chroma
  labels:
    app: chroma
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: chroma
  template:
    metadata:
      labels:
        app: chroma
    spec:
      containers:
        - name: clickhouse
          image: clickhouse/clickhouse-server:22.9-alpine
          ports:
            - containerPort: 8123
              name: clickhouse-http
              protocol: TCP
            - containerPort: 9000
              name: clickhouse-tcp
              protocol: TCP
          resources:
            requests:
              memory: 256Mi
              cpu: 256m
            limits:
              memory: 2Gi
              cpu: 2
          readinessProbe:
            httpGet:
              path: /
              port: 8123
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
          env:
            - name: ALLOW_EMPTY_PASSWORD
              value: "yes"
            - name: CLICKHOUSE_TCP_PORT
              value: "9000"
            - name: CLICKHOUSE_HTTP_PORT
              value: "8123"
            - name: CLICKHOUSE_DO_NOT_CHOWN
              value: "1"
          volumeMounts:
            - name: chroma-clickhouse-data
              mountPath: /var/lib/clickhouse
            - name: chroma-clickhouse-logs
              mountPath: /var/log/clickhouse-server
            - name: chroma-clickhouse-backups-storage
              mountPath: /backups
            - name: chroma-clickhouse-backups-configmap
              mountPath: /etc/clickhouse-server/config.d/backup_disk.xml
              subPath: backup_disk.xml
            - name: chroma-clickhouse-users
              mountPath: /etc/clickhouse-server/users.d/chroma.xml
              subPath: chroma_users.xml
        - name: server
          image: ghcr.io/chroma-core/chroma:0.3.25
          ports:
            - containerPort: 8000
              name: main
              protocol: TCP
          resources:
            requests:
              memory: 256Mi
              cpu: 256m
            limits:
              memory: 2Gi
              cpu: 2
          readinessProbe:
            httpGet:
              path: /api/v1/heartbeat
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
          env:
            - name: CHROMA_DB_IMPL
              value: "clickhouse"
            - name: CLICKHOUSE_HOST
              value: "localhost"
            - name: CLICKHOUSE_PORT
              value: "8123"
          volumeMounts:
            - name: chroma-server-index
              mountPath: /chroma/.chroma/index
      serviceAccountName: chroma
      volumes:
        - name: chroma-clickhouse-data
          persistentVolumeClaim:
            claimName: chroma-clickhouse-data
        - name: chroma-clickhouse-logs
          persistentVolumeClaim:
            claimName: chroma-clickhouse-logs
        - name: chroma-clickhouse-backups-storage
          persistentVolumeClaim:
            claimName: chroma-clickhouse-backups
        - name: chroma-clickhouse-backups-configmap
          configMap:
            name: chroma-clickhouse-backups
        - name: chroma-clickhouse-users
          configMap:
            name: chroma-clickhouse-users
        - name: chroma-server-index
          persistentVolumeClaim:
            claimName: chroma-server-index
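
Note that these manifests reference PVCs (chroma-clickhouse-data, chroma-clickhouse-logs, chroma-clickhouse-backups, chroma-server-index) and ConfigMaps that are not shown. A minimal sketch of one such claim; the size, access mode, and storage class are assumptions, not part of the original setup:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: chroma-clickhouse-data
spec:
  accessModes:
    - ReadWriteOnce   # EFS-style shared storage would typically use ReadWriteMany
  resources:
    requests:
      storage: 10Gi   # sizing is an assumption
```

With `ReadWriteOnce` volumes, the Recreate strategy above also avoids two Pods contending for the same claim during a rollout.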

I'm looking forward to the project becoming fully HA.

jeffchuber commented 1 year ago

We should have a bunch of updates here in the next month, stay tuned!

Oliniusz commented 1 year ago

Thank you, I'm looking forward to them.

Currently, the main issues I'm trying to resolve are:

I'd be grateful for any suggestions and workarounds in the meantime.

jeffchuber commented 1 year ago

@Oliniusz just as a heads up - we will be moving completely off of clickhouse. we will provide a seamless migration experience when that happens.

huineng commented 1 year ago

any news on this ?

zieen commented 1 year ago

> We should have a bunch of updates here in the next month, stay tuned!

any updates?

avinashgupta2 commented 1 year ago

any update for Kubernetes deployment

jeffchuber commented 1 year ago

Hi everyone, we are releasing a single-node refactor tomorrow; the next step after that will be the distributed version. Getting closer!

zieen commented 1 year ago

Looking forward to the helm installation~

sfc-gh-ygupta commented 10 months ago

@jeffchuber, any update on k8s deployment?

sfc-gh-ygupta commented 10 months ago

> @jeffchuber, any update on k8s deployment?

@jeffchuber, please provide some updates.