Open warjiang opened 1 year ago
@warjiang we are planning on this in conjunction with our refactor in the coming months.
Looking forward to this work; if there's any progress, please let me know 🙏. I'm willing to deploy Chroma on my k8s cluster and give feedback to the development team 👀. If you need any help, please let me know, it's my pleasure to make any contribution to Chroma.
Yep. Same same! As soon as there is a playbook or Helm chart I will try to deploy the ChromaDB-backed apps on RKE2 / Rancher. If things move quickly, I'll be tackling the task myself and do a PR.
For now I'm using the following Deployment, based on https://github.com/chroma-core/chroma/blob/0.3.25/docker-compose.server.example.yml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chroma
  labels:
    app: chroma
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 10%
  selector:
    matchLabels:
      app: chroma
  template:
    metadata:
      labels:
        app: chroma
    spec:
      containers:
        - name: clickhouse
          image: clickhouse/clickhouse-server:22.9-alpine
          ports:
            - containerPort: 8123
              name: clickhouse-http
              protocol: TCP
            - containerPort: 9000
              name: clickhouse-tcp
              protocol: TCP
          resources:
            requests:
              memory: 256Mi
              cpu: 256m
            limits:
              memory: 2Gi
              cpu: 2
          readinessProbe:
            httpGet:
              path: /
              port: 8123
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
          env:
            - name: ALLOW_EMPTY_PASSWORD
              value: "yes"
            - name: CLICKHOUSE_TCP_PORT
              value: "9000"
            - name: CLICKHOUSE_HTTP_PORT
              value: "8123"
            - name: CLICKHOUSE_DO_NOT_CHOWN
              value: "1"
          volumeMounts:
            - name: chroma-clickhouse-data
              mountPath: /var/lib/clickhouse
            - name: chroma-clickhouse-logs
              mountPath: /var/log/clickhouse-server
            - name: chroma-clickhouse-backups-storage
              mountPath: /backups
            - name: chroma-clickhouse-backups-configmap
              mountPath: /etc/clickhouse-server/config.d/backup_disk.xml
              subPath: backup_disk.xml
            - name: chroma-clickhouse-users
              mountPath: /etc/clickhouse-server/users.d/chroma.xml
              subPath: chroma_users.xml
        - name: server
          image: ghcr.io/chroma-core/chroma:0.3.25
          ports:
            - containerPort: 8000
              name: main
              protocol: TCP
          resources:
            requests:
              memory: 256Mi
              cpu: 256m
            limits:
              memory: 2Gi
              cpu: 2
          readinessProbe:
            httpGet:
              path: /api/v1/heartbeat
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
          env:
            - name: CHROMA_DB_IMPL
              value: "clickhouse"
            - name: CLICKHOUSE_HOST
              value: "localhost"
            - name: CLICKHOUSE_PORT
              value: "8123"
          volumeMounts:
            - name: chroma-server-index
              mountPath: /chroma/.chroma/index
      serviceAccountName: chroma
      volumes:
        - name: chroma-clickhouse-data
          persistentVolumeClaim:
            claimName: chroma-clickhouse-data
        - name: chroma-clickhouse-logs
          persistentVolumeClaim:
            claimName: chroma-clickhouse-logs
        - name: chroma-clickhouse-backups-storage
          persistentVolumeClaim:
            claimName: chroma-clickhouse-backups
        - name: chroma-clickhouse-backups-configmap
          configMap:
            name: chroma-clickhouse-backups
        - name: chroma-clickhouse-users
          configMap:
            name: chroma-clickhouse-users
        - name: chroma-server-index
          persistentVolumeClaim:
            claimName: chroma-server-index
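Note that the manifest above assumes a few supporting objects I haven't pasted here: the chroma ServiceAccount, the PersistentVolumeClaims, and the two ClickHouse ConfigMaps. To expose the Chroma API inside the cluster you'll also want a Service; a minimal sketch (the name and labels simply follow the Deployment above):

apiVersion: v1
kind: Service
metadata:
  name: chroma
  labels:
    app: chroma
spec:
  selector:
    app: chroma
  ports:
    - name: main          # matches the server container's named port
      port: 8000
      targetPort: main
      protocol: TCP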
The weird thing is that the container is trying and failing to rebuild hnswlib:
Rebuilding hnsw to ensure architecture compatibility
Collecting hnswlib
Downloading hnswlib-0.7.0.tar.gz (33 kB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Collecting numpy
Downloading numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 374.4 MB/s eta 0:00:00
Building wheels for collected packages: hnswlib
Building wheel for hnswlib (pyproject.toml): started
Building wheel for hnswlib (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
× Building wheel for hnswlib (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [55 lines of output]
running bdist_wheel
running build
running build_ext
creating tmp
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.10 -c /tmp/tmp2mywuvir.cpp -o tmp/tmp2mywuvir.o -std=c++14
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.10 -c /tmp/tmpu3p5lj57.cpp -o tmp/tmpu3p5lj57.o -std=c++11
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 416, in build_wheel
return self._build_with_temp_dir(['bdist_wheel'], '.whl',
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 401, in _build_with_temp_dir
self.run_setup()
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in run_setup
exec(code, locals())
File "<string>", line 116, in <module>
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 107, in setup
return distutils.core.setup(**attrs)
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 343, in run
self.run_command("build")
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 131, in run
self.run_command(cmd_name)
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/tmp/pip-build-env-yh6ztnbt/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "<string>", line 103, in build_extensions
File "<string>", line 70, in cpp_flag
RuntimeError: Unsupported compiler -- at least C++11 support is needed!
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for hnswlib
Failed to build hnswlib
ERROR: Could not build wheels for hnswlib, which is required to install pyproject.toml-based projects
[notice] A new release of pip is available: 23.0.1 -> 23.1.2
[notice] To update, run: pip install --upgrade pip
2023-05-26 13:00:07 INFO chromadb.telemetry.posthog Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2023-05-26 13:00:07 INFO uvicorn.error Started server process [32]
2023-05-26 13:00:07 INFO uvicorn.error Waiting for application startup.
2023-05-26 13:00:07 INFO uvicorn.error Application startup complete.
2023-05-26 13:00:07 INFO uvicorn.error Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2023-05-26 13:00:11 INFO uvicorn.access 127.0.0.6:55909 - "GET /api/v1/heartbeat HTTP/1.1" 200
2023-05-26 13:00:16 INFO uvicorn.access 127.0.0.6:34235 - "GET /api/v1/heartbeat HTTP/1.1" 200
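Judging by the traceback, the rebuild fails because the container doesn't have a working C++ toolchain (it ends with RuntimeError: Unsupported compiler -- at least C++11 support is needed!). One possible workaround, an untested sketch assuming the upstream image is Debian-based, is to extend the image with build tools so the rebuild can succeed:

# Untested sketch: add a C++ toolchain so the entrypoint's
# "Rebuilding hnsw" step can compile the hnswlib wheel.
FROM ghcr.io/chroma-core/chroma:0.3.25
USER root
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential \
 && rm -rf /var/lib/apt/lists/*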
In my specific case, I had to change spec.strategy from RollingUpdate to Recreate due to the clickhouse container locking the files on the persistent AWS EFS storage:

2023.05.26 12:58:32.223680 [ 1 ] {} <Warning> Context: Effective user of the process (clickhouse) does not match the owner of the data (50001).
2023.05.26 12:58:32.230741 [ 1 ] {} <Error> Application: DB::Exception: Cannot lock file /var/lib/clickhouse/status. Another server instance in same directory is already running.
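As an aside on the ownership warning: as far as I know the clickhouse user in the official image is UID/GID 101, so a pod-level securityContext with fsGroup might make the volume group-accessible. A sketch (I haven't verified whether EFS honors fsGroup):

spec:
  template:
    spec:
      securityContext:
        fsGroup: 101   # assumed GID of the clickhouse user in the image; verify for your tag

The full updated manifest with the Recreate strategy: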
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chroma
  labels:
    app: chroma
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: chroma
  template:
    metadata:
      labels:
        app: chroma
    spec:
      containers:
        - name: clickhouse
          image: clickhouse/clickhouse-server:22.9-alpine
          ports:
            - containerPort: 8123
              name: clickhouse-http
              protocol: TCP
            - containerPort: 9000
              name: clickhouse-tcp
              protocol: TCP
          resources:
            requests:
              memory: 256Mi
              cpu: 256m
            limits:
              memory: 2Gi
              cpu: 2
          readinessProbe:
            httpGet:
              path: /
              port: 8123
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
          env:
            - name: ALLOW_EMPTY_PASSWORD
              value: "yes"
            - name: CLICKHOUSE_TCP_PORT
              value: "9000"
            - name: CLICKHOUSE_HTTP_PORT
              value: "8123"
            - name: CLICKHOUSE_DO_NOT_CHOWN
              value: "1"
          volumeMounts:
            - name: chroma-clickhouse-data
              mountPath: /var/lib/clickhouse
            - name: chroma-clickhouse-logs
              mountPath: /var/log/clickhouse-server
            - name: chroma-clickhouse-backups-storage
              mountPath: /backups
            - name: chroma-clickhouse-backups-configmap
              mountPath: /etc/clickhouse-server/config.d/backup_disk.xml
              subPath: backup_disk.xml
            - name: chroma-clickhouse-users
              mountPath: /etc/clickhouse-server/users.d/chroma.xml
              subPath: chroma_users.xml
        - name: server
          image: ghcr.io/chroma-core/chroma:0.3.25
          ports:
            - containerPort: 8000
              name: main
              protocol: TCP
          resources:
            requests:
              memory: 256Mi
              cpu: 256m
            limits:
              memory: 2Gi
              cpu: 2
          readinessProbe:
            httpGet:
              path: /api/v1/heartbeat
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
          env:
            - name: CHROMA_DB_IMPL
              value: "clickhouse"
            - name: CLICKHOUSE_HOST
              value: "localhost"
            - name: CLICKHOUSE_PORT
              value: "8123"
          volumeMounts:
            - name: chroma-server-index
              mountPath: /chroma/.chroma/index
      serviceAccountName: chroma
      volumes:
        - name: chroma-clickhouse-data
          persistentVolumeClaim:
            claimName: chroma-clickhouse-data
        - name: chroma-clickhouse-logs
          persistentVolumeClaim:
            claimName: chroma-clickhouse-logs
        - name: chroma-clickhouse-backups-storage
          persistentVolumeClaim:
            claimName: chroma-clickhouse-backups
        - name: chroma-clickhouse-backups-configmap
          configMap:
            name: chroma-clickhouse-backups
        - name: chroma-clickhouse-users
          configMap:
            name: chroma-clickhouse-users
        - name: chroma-server-index
          persistentVolumeClaim:
            claimName: chroma-server-index
I'm looking forward to the project being fully HA.
We should have a bunch of updates here in the next month, stay tuned!
Thank you, I'm looking forward to them.
Currently the main issues I'm trying to resolve are:

- I'm using /api/v1/heartbeat as a readiness healthcheck, but I've noticed that while it is processing requests, Chroma DB does not reliably respond to the heartbeat, which causes the EKS cluster to treat the application as unavailable (see the probe sketch below for what I plan to try).
- The AWS EFS storage backing the /var/lib/clickhouse directory has a limit of 177 hard links per file, which caused Too many links. (CANNOT_LINK) errors.

I'd be grateful for any suggestions and workarounds in the meantime.
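For the probe issue, what I plan to try (untested; the numbers are guesses) is loosening the readiness probe so a single slow heartbeat doesn't immediately mark the pod unready:

readinessProbe:
  httpGet:
    path: /api/v1/heartbeat
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5      # default is 1s, which a busy server can easily miss
  failureThreshold: 6    # tolerate several slow/failed probes before going unready
  successThreshold: 1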
@Oliniusz just as a heads up - we will be moving completely off of clickhouse. We will provide a seamless migration experience when that happens.
any news on this?
We should have a bunch of updates here in the next month, stay tuned!
any updates?
any update for Kubernetes deployment?
Hi everyone, we are releasing a single-node refactor tomorrow, then the next step will be the distributed version. Getting closer!
Looking forward to the helm installation~
@jeffchuber, any update on k8s deployment?
@jeffchuber, please provide some updates.
Describe the problem
Chroma only provides Docker for self-hosted deployment, but we use Kubernetes in our production environment. Is there any plan to support deployment on Kubernetes, maybe with raw k8s manifests or a Helm chart?
Describe the proposed solution
Provide a deployment option for k8s; maybe I can offer some help.
Alternatives considered
No response
Importance
would make my life easier
Additional Information
No response