chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[ENH] Collection Configuration Storage #2338

Open atroyn opened 2 weeks ago

atroyn commented 2 weeks ago

Description of changes

This PR stores collection parameters as a Configuration, a JSON-serializable object with a defined set of configuration fields, adapting @HammadB's work in https://github.com/chroma-core/chroma/pull/1491.

A collection's Configuration is set when the collection is created and is immutable thereafter.
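To illustrate the two properties described above (JSON-serializable, immutable after creation), here is a minimal sketch using a frozen dataclass. It is not the implementation in this PR: the class name `ExampleCollectionConfiguration` and the fields `space` and `batch_size` are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen=True makes instances immutable after creation
class ExampleCollectionConfiguration:
    """Hypothetical configuration object; field names are illustrative only."""

    space: str = "l2"
    batch_size: int = 100

    def to_json_str(self) -> str:
        # Serialize all fields to a JSON string with stable key order.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json_str(cls, json_str: str) -> "ExampleCollectionConfiguration":
        # Rebuild the configuration from its JSON representation.
        return cls(**json.loads(json_str))


# Round trip: the deserialized object compares equal to the original.
config = ExampleCollectionConfiguration(space="cosine")
restored = ExampleCollectionConfiguration.from_json_str(config.to_json_str())
```

Assigning to a field after construction (for example `config.space = "ip"`) raises `dataclasses.FrozenInstanceError`, which is one simple way to enforce the "immutable thereafter" rule at the object level.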

The JSON-serialized config is stored as a new text column config_json_str on the collections table.
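As a rough sketch of what storing the serialized config in a text column can look like, the example below uses Python's built-in `sqlite3` module. Only the table name `collections` and the column name `config_json_str` come from this description; the rest of the schema, the helper functions, and the sample values are assumptions and do not reflect Chroma's actual schema or migration.

```python
import json
import sqlite3


def add_config_column(conn: sqlite3.Connection) -> None:
    # Hypothetical migration step: add the new text column.
    conn.execute("ALTER TABLE collections ADD COLUMN config_json_str TEXT")


def insert_collection(
    conn: sqlite3.Connection, collection_id: str, name: str, config: dict
) -> None:
    # Store the configuration as a JSON string alongside the collection row.
    conn.execute(
        "INSERT INTO collections (id, name, config_json_str) VALUES (?, ?, ?)",
        (collection_id, name, json.dumps(config)),
    )


def load_config(conn: sqlite3.Connection, collection_id: str) -> dict:
    # Return the stored config, or an empty dict if the row or value is missing.
    row = conn.execute(
        "SELECT config_json_str FROM collections WHERE id = ?", (collection_id,)
    ).fetchone()
    return json.loads(row[0]) if row and row[0] is not None else {}


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the collections table (not the real schema).
conn.execute("CREATE TABLE collections (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
add_config_column(conn)
insert_collection(conn, "c1", "docs", {"space": "cosine"})
```

Rows that existed before the column was added would hold NULL in `config_json_str`, which is why the loader in this sketch falls back to an empty dict.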

Note that in this stack, Chroma does not yet consume the configuration for anything; that will be done in a separate stack.

This PR also updates our versions of black and mypy: the previous versions failed on the new definitions, while the updated versions pass.

Test plan

Tests pass in CI except the following:

Documentation Changes

N/A - This is not a user-facing change. Documentation around migrations is later in the stack, along with the cross-version persistence tests.

github-actions[bot] commented 2 weeks ago

Reviewer Checklist

Please use this checklist to ensure your code review is thorough before approving.

Testing, Bugs, Errors, Logs, Documentation

github-actions[bot] commented 2 weeks ago

Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas

atroyn commented 2 weeks ago

[!WARNING] This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.
