databendlabs / databend

Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com

fix(storage): fix inverted index `term_id` may conflict between multiple fields #16687

Open b41sh opened 3 weeks ago

b41sh commented 3 weeks ago

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

We use the value that the FST returns for a Term as its term_id, which is meant to identify the Term uniquely. However, because our search supports multiple fields, the same term_id can occur in more than one field, and this conflict produces wrong results. This PR adds a new TermId structure, consisting of field_id and term_ordinal, so that each Term is identified uniquely across fields and the conflict is avoided.
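
To illustrate the conflict, here is a minimal sketch rather than the code in this PR. It assumes each field numbers its terms independently starting from 0. The integer widths, the derive list, the `demo` function, and the sample postings are all hypothetical; only the names `TermId`, `field_id`, and `term_ordinal` come from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical composite identifier; the field types are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct TermId {
    /// Which indexed field the term belongs to.
    pub field_id: u32,
    /// The value looked up for the term within that field.
    pub term_ordinal: u64,
}

/// Builds two lookup tables from the same sample postings and returns
/// (entries keyed by bare ordinal, entries keyed by TermId).
pub fn demo() -> (usize, usize) {
    // (field_id, term_ordinal, matching document ids); sample data only.
    let postings = [
        (0u32, 0u64, vec![1u32, 2]), // first term of field 0
        (1u32, 0u64, vec![7u32]),    // first term of field 1, same ordinal
    ];

    // Keyed by the bare ordinal: the second insert overwrites the first.
    let mut by_ordinal: HashMap<u64, Vec<u32>> = HashMap::new();
    // Keyed by the composite TermId: both entries are kept.
    let mut by_term_id: HashMap<TermId, Vec<u32>> = HashMap::new();

    for (field_id, term_ordinal, docs) in postings.iter() {
        by_ordinal.insert(*term_ordinal, docs.clone());
        by_term_id.insert(
            TermId {
                field_id: *field_id,
                term_ordinal: *term_ordinal,
            },
            docs.clone(),
        );
    }

    (by_ordinal.len(), by_term_id.len())
}
```

Calling `demo()` returns `(1, 2)`: the map keyed by the bare ordinal loses one posting list, while the map keyed by `TermId` keeps both.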

Tests

Type of change

