Closed dujijun007 closed 10 months ago
@dujijun007 How about naming this index as GIN like postgresql and cockroach?
@dujijun007 How about naming this index as GIN like postgresql and cockroach?
@imay Indeed, GIN is more familiar to developers. I will rename it.
Feature request
Motivation Inverted index is a commonly used data structure in information retrieval systems, which maps the "word-document" relationship to the "document-word" relationship. In an inverted index, each unique word has an associated index pointing to a list of documents containing that word. This indexing method makes document search based on words highly efficient.
In OLAP engines, inverted indexes and information retrieval play a crucial role. OLAP engines are typically used for handling complex analytical queries, which often involve large amounts of data and require aggregation and filtering across multiple dimensions.
Inverted indexes can greatly improve the performance of these queries. For example, if you want to find all data rows containing a specific word, you can directly look up the inverted index instead of scanning the entire dataset. This can significantly optimize the query scan volume, especially when retrieving a small amount of data from massive datasets, resulting in greatly improved query performance.
Currently, several OLAP engines, such as
Doris
andClickHouse
, have their own implementation of inverted indexes.Solution
We want to abstract an implementation framework for inverted indexes, and CLucene is the first specific implementation we are trying.
When we use inverted index, we can use grammar below to manage, write and query index.