datafuselabs / databend

Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com

feat(query): create table support add inverted index #15547

Closed b41sh closed 2 weeks ago

b41sh commented 2 weeks ago

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

CREATE TABLE now supports declaring an inverted index inline:

mysql> CREATE TABLE t (
    ->   id int,
    ->   content string,
    ->   INVERTED INDEX idx1 (content) tokenizer = 'chinese' filters = 'english_stop,english_stemmer,chinese_stop'
    -> );
Query OK, 0 rows affected (0.11 sec)

mysql> show create table t;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                            |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE t (
  id INT NULL,
  content VARCHAR NULL,
  SYNC INVERTED INDEX idx1 (content) filters = 'english_stop,english_stemmer,chinese_stop', tokenizer = 'chinese'
) ENGINE=FUSE |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.09 sec)
Read 0 rows, 0.00 B in 0.035 sec., 0 rows/sec., 0.00 B/sec.

Tests

Type of change



b41sh commented 2 weeks ago

Reviewed 23 of 23 files at r1, all commit messages. Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @b41sh and @sundy-li)

_src/meta/api/src/schema_api_impl.rs line 1578 at r1 (raw file):_

                }
            }
        }

This isn't an issue, but I don't get it: why can't a column id be used more than once in different indexes?

Code quote:

        if !req.table_meta.indexes.is_empty() {
            // check the index column id exists and not be duplicated.
            let mut index_column_ids = HashSet::new();
            for (_, index) in req.table_meta.indexes.iter() {
                for column_id in &index.column_ids {
                    if req.table_meta.schema.is_column_deleted(*column_id) {
                        return Err(KVAppError::AppError(AppError::IndexColumnIdNotFound(
                            IndexColumnIdNotFound::new(*column_id, &index.name),
                        )));
                    }
                    if index_column_ids.contains(column_id) {
                        return Err(KVAppError::AppError(AppError::DuplicatedIndexColumnId(
                            DuplicatedIndexColumnId::new(*column_id, &index.name),
                        )));
                    }
                    index_column_ids.insert(column_id);
                }
            }
        }

@drmingdrmer The main reason is that creating an index takes a long time and the index file is large. Creating multiple indexes on the same column would consume even more resources, so we would rather users not do this.
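
For readers following the thread, here is a minimal, self-contained sketch of the rule enforced by the quoted check: each column id may appear in at most one index of a table. The types `ColumnId`, `IndexMeta`, and `IndexCheckError`, the function `check_index_column_ids`, and the `live_column_ids` set (standing in for `schema.is_column_deleted`) are hypothetical simplifications for illustration, not the actual Databend definitions.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical, simplified stand-ins for the types used in the quoted code.
type ColumnId = u32;

#[derive(Debug, PartialEq)]
enum IndexCheckError {
    // The index refers to a column that does not exist (or was dropped).
    ColumnIdNotFound { column_id: ColumnId, index_name: String },
    // The column is already covered by an earlier index.
    DuplicatedColumnId { column_id: ColumnId, index_name: String },
}

struct IndexMeta {
    name: String,
    column_ids: Vec<ColumnId>,
}

// Assumption: `live_column_ids` plays the role of `!schema.is_column_deleted(id)`.
fn check_index_column_ids(
    live_column_ids: &HashSet<ColumnId>,
    indexes: &BTreeMap<String, IndexMeta>,
) -> Result<(), IndexCheckError> {
    // Column ids seen so far across all indexes of the table.
    let mut seen: HashSet<ColumnId> = HashSet::new();
    for index in indexes.values() {
        for column_id in &index.column_ids {
            if !live_column_ids.contains(column_id) {
                return Err(IndexCheckError::ColumnIdNotFound {
                    column_id: *column_id,
                    index_name: index.name.clone(),
                });
            }
            // `HashSet::insert` returns false when the value was already present,
            // so one call both records the id and detects a duplicate.
            if !seen.insert(*column_id) {
                return Err(IndexCheckError::DuplicatedColumnId {
                    column_id: *column_id,
                    index_name: index.name.clone(),
                });
            }
        }
    }
    Ok(())
}
```

In this sketch, a second index on an already-indexed column is rejected up front, which matches the stated goal of avoiding a second expensive index build and a second large index file for the same column.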