chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[ENH] Cross-Version Collection Migration #2400

Open atroyn opened 1 week ago

atroyn commented 1 week ago

Description of changes

This PR adds a migration path from previous versions of Chroma to the new version, which stores collection configuration. The migration is idempotent and non-destructive.

Since all collections must now have a configuration, loading an old collection would raise an error. This showed up as cross-version persistence failures.

With this approach, that doesn't happen. This is a first step to providing user-facing migration tooling. For now it's just this one script, but later as we add more of these, they can be composed in a more intelligent way.
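As a rough illustration of what "idempotent and non-destructive" means here, the sketch below backfills a configuration only on records that lack one. It does not reflect the actual migration script in this PR: the record layout, the `configuration` key, `DEFAULT_CONFIGURATION`, and `migrate_collections` are all hypothetical names chosen for the example.

```python
import copy
from typing import Any, Dict, List

# Hypothetical default; the real configuration schema is not shown in this PR description.
DEFAULT_CONFIGURATION: Dict[str, Any] = {"hnsw": {}}


def migrate_collections(collections: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return migrated copies of the given collection records.

    Non-destructive: the input records are never mutated, and an existing
    configuration is left exactly as it is.
    Idempotent: running the function on its own output changes nothing.
    """
    migrated = []
    for record in collections:
        new_record = copy.deepcopy(record)
        # Only backfill when no configuration is present (assumed record layout).
        if new_record.get("configuration") is None:
            new_record["configuration"] = copy.deepcopy(DEFAULT_CONFIGURATION)
        migrated.append(new_record)
    return migrated
```

Running the function twice yields the same result as running it once, which is the property that would let several such migration steps be composed safely later on.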

This PR includes a new command in the chroma CLI, chroma migrate, which migrates all collections in a specified path (optionally scoped to a tenant and database). The default path is ./chroma.
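The command shape described above could look roughly like the following argparse sketch. This is not the real chroma CLI implementation, and the option names `--path`, `--tenant`, and `--database` are assumptions made for illustration. Only the `migrate` command name and the `./chroma` default come from the description.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser with a single `migrate` subcommand."""
    parser = argparse.ArgumentParser(prog="chroma")
    subcommands = parser.add_subparsers(dest="command", required=True)

    migrate = subcommands.add_parser(
        "migrate", help="Migrate all collections under a persistence path."
    )
    # Hypothetical option names; the PR description only states the default path.
    migrate.add_argument("--path", default="./chroma")
    migrate.add_argument("--tenant", default=None)
    migrate.add_argument("--database", default=None)
    return parser


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse an argument list, e.g. ["migrate", "--path", "/data/chroma"]."""
    return build_parser().parse_args(argv)
```

With no options given, `parse_args(["migrate"])` falls back to the `./chroma` path and leaves tenant and database unset.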

Test plan

Manual Test:

Automated: test_cross_version_persist passes locally and in CI.

All tests should pass by this point in the stack.

Documentation Changes

The migration and the migration tool are documented at https://docs.trychroma.com/deployment/migration

Additionally, when a collection tries and fails to load a CollectionConfiguration from JSON, the error points the user to the same migration documentation.
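A minimal sketch of that error behavior, assuming the configuration is stored as a JSON object string. The names `InvalidConfigurationError` and `load_configuration_from_json` are hypothetical and are not Chroma's actual API. Only the documentation URL is taken from this PR.

```python
import json
from typing import Any, Dict, Optional

MIGRATION_DOCS_URL = "https://docs.trychroma.com/deployment/migration"


class InvalidConfigurationError(ValueError):
    """Raised when a stored configuration cannot be loaded (hypothetical name)."""


def load_configuration_from_json(raw: Optional[str]) -> Dict[str, Any]:
    """Parse a stored configuration, pointing users to the migration docs on failure."""
    hint = f"See {MIGRATION_DOCS_URL} for how to migrate older collections."
    try:
        # json.loads raises TypeError for None and JSONDecodeError for bad JSON.
        parsed = json.loads(raw)
    except (TypeError, json.JSONDecodeError) as exc:
        raise InvalidConfigurationError(
            f"Could not load collection configuration from JSON. {hint}"
        ) from exc
    if not isinstance(parsed, dict):
        raise InvalidConfigurationError(
            f"Collection configuration must be a JSON object. {hint}"
        )
    return parsed
```

A collection persisted by an older version, with no configuration stored at all, would hit the first branch and surface the documentation link in the error message.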

TODO:

github-actions[bot] commented 1 week ago

Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas

github-actions[bot] commented 1 week ago

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

atroyn commented 1 week ago

[!WARNING] This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.