Closed HammadB closed 2 days ago
Please leverage this checklist to ensure your code review is thorough before approving
This stack of pull requests is managed by Graphite. Learn more about stacking.
Join @HammadB and the rest of your teammates on Graphite
Description of changes
Summarize the changes made by this PR. The primary intent of this PR is to remove the is_metadata_valid invariant which was a workaround for our metadata strategy generating faulty metadata and then us special casing all uses of the record set strategy to handle invalid generations. This PR patches the metadata generation to not generate invalid metadata.
Adds modes in test_add to add a medium sized record set. This was initially timing out in hypothesis's generation. Hypothesis bounds the buffer size of the bytes it uses to do random generation, so generating larger metadata was resulting in examples being marked at OVERRUN by conjecture (gleaned from issues like https://github.com/HypothesisWorks/hypothesis/issues/3999 + reading hypothesis code + stepping through it). This PR adds the ability to generate N fixed metadata entries and uniformly distribute them over the record set, reducing the overall entropy.
Fixes a bug that test_embeddings was not handling None as a possible metadata state, since this state was never generated. Added an explicit test for this.
Fixes a bug in the reference filtering implementation in test_filtering that did not handle None metadata since that state was never generated.
This PR is forced to touch types related to metadata, which are incorrect and cause typing errors. I ignored the errors to minimize the surface area of this change and defer those changes to the pass mentioned in https://github.com/chroma-core/chroma/issues/2292.
Test plan
How are these changes tested? These changes are covered by existing tests, and
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
No external changes required.