chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Feature Request]: Backup/Restore from Backup #1061

Status: Open. Solomin0 opened this issue 10 months ago.

Solomin0 commented 10 months ago

Describe the problem

Currently there is no easy way to back up ChromaDB data and restore it from a backup. If I want to deploy on another local computer, I would need to re-embed all my documents. And in the event of a hard drive failure, I risk losing my data.

Describe the proposed solution

I would like functionality, either as a command-line tool or in the ChromaDB client, that lets users generate an archive file (for example a .tar or .tar.gz tarball) and quickly restore from that separate backup. Something similar to MongoDB's mongodump/mongorestore tools would be useful: https://www.mongodb.com/docs/database-tools/mongodump/
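Until something like this exists, the tarball idea can be approximated by hand. Below is a minimal Python sketch, assuming Chroma is stopped (or at least not writing) while it runs and that all of its data lives under a single persistence directory. The function names and the `chroma-backup-<timestamp>.tar.gz` naming scheme are hypothetical illustrations, not part of Chroma's API.

```python
import tarfile
import time
from pathlib import Path


def backup_persist_dir(persist_dir: str, backup_dir: str) -> Path:
    """Pack a persistence directory into a timestamped .tar.gz archive.

    Assumption: nothing is writing to persist_dir while this runs;
    otherwise the archive may capture a half-written state.
    """
    src = Path(persist_dir)
    out_dir = Path(backup_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"chroma-backup-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store each top-level entry relative to the directory root,
        # so the archive can be unpacked into any target directory.
        for child in src.iterdir():
            tar.add(child, arcname=child.name)
    return archive


def restore_persist_dir(archive: str, target_dir: str) -> None:
    """Unpack an archive produced by backup_persist_dir into target_dir."""
    dest = Path(target_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
```

Archiving the whole directory, rather than only the SQLite file, matters because the persistence directory also holds other folders alongside the database file; copying just one piece could leave a restored instance incomplete.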

Alternatives considered

The only other alternative I am aware of would be to manually copy the Docker volume/persistent directory onto removable media or another PC on the network.

Importance

Would make my life easier.

Additional Information

Discussed this with Taz on the Discord; they said to add this as a feature request. Thanks, y'all.

https://discord.com/channels/1073293645303795742/1146191469124784208

tonisives commented 10 months ago

Before it is implemented, I can describe my backup/restore process in EC2:

I currently deploy on AWS EC2, where you can create snapshots of your instance volumes. This way I can start a new container with a different configuration (memory, CPU), use the snapshot from the previous instance, and it just starts up with all of the data there.

To upgrade Chroma, I update the Chroma Docker image on the same instance, i.e. I keep the same volume. All of the data is retained and Chroma moves to the new version.

To copy the database itself: I think it is located in the /chroma folder, which currently contains an SQLite file and other folders.

jeffchuber commented 10 months ago

@Solomin0 great idea

mfcommvault commented 10 months ago

@tonisives But was the DB being written to at the time of the snapshot? I ask because unless there is a freeze/thaw and a flush of pending writes, the DB could be in an inconsistent state. Whether the backup is done via a script or through a tool like mongodump, the process should ensure the DB is consistent before the operation completes.
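For the SQLite part of the data specifically, Python's standard `sqlite3` module exposes SQLite's online backup API, which produces a transactionally consistent copy even while other connections use the database. A minimal sketch follows; the helper name is hypothetical, and treating the file as `chroma.sqlite3` is an assumption about the layout. Note that this only covers the SQLite file, not the other folders in the persistence directory, so it does not by itself solve the freeze/flush problem for the whole DB.

```python
import sqlite3


def consistent_sqlite_copy(src_path: str, dst_path: str) -> None:
    """Copy one SQLite database file using SQLite's online backup API.

    The result is a consistent snapshot of the SQLite database only;
    files outside it (e.g. other folders in the persistence directory)
    are NOT included.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # Connection.backup copies all pages into dst; "with dst"
        # commits on success, as in the Python documentation example.
        with dst:
            src.backup(dst)
    finally:
        dst.close()
        src.close()
```

Usage would look like `consistent_sqlite_copy("/chroma/chroma.sqlite3", "/backups/chroma.sqlite3")`, where both paths are illustrative.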

jeffchuber commented 9 months ago

This raises a lot of subtle concerns around multitenancy and support across single-node and distributed deployments.

tazarov commented 9 months ago

@Solomin0, I am working on a CIP for this. It should be up as a PR in the next day or so. I'd be happy if you could take a look and provide your thoughts.

tristandeborde commented 9 months ago

Hello @tazarov, any news on this? Eager to take a look :) Thanks in advance.

mfcommvault commented 8 months ago

@tazarov @jeffchuber Let me know if I can provide any assistance.