Lighter-swift / Lighter

Swift APIs for SQLite: Type-safe down to the schema. Very, very, fast. Dependency free.
https://lighter-swift.github.io/documentation/lighter/
MIT License
458 stars 12 forks source link

Add support for full text search (FTS 3,4,5) #25

Open helje5 opened 1 year ago

helje5 commented 1 year ago

It would be nice to detect FTS tables and do special things with them. I.e. also omit FTS backing tables from Lighter schemas (they need to be included in schemas if DB recreation is on, but no regular Swift funcs etc should be emitted for them?).

Table Detection

The tables can be detected by looking at the SQL in the metadata:

CREATE VIRTUAL TABLE papers USING fts3(...)

i.e. "VIRTUAL TABLE" and "USING fts*".

FTS tables, even contentful, have no associated column types (though they should be all TEXT?).

FTS creates a set of supporting, regular, tables that should be omitted, FTS 3:

papers_content 
papers_segments
papers_segdir  

FTS 4:

papers4_content 
papers4_segments
papers4_segdir  
papers4_docsize 
papers4_stat    

FTS 5:

email_data     
email_idx      
email_content  
email_docsize  
email_config   

So first the FTS table needs to be detected, than the associated tables need to be tagged/excluded.

External content FTS tables can refer to a backing table ("documents" in this case):

USING fts5(title, value, content='documents')

So that can be detected too (i.e. whether a regular table has one (or more! e.g. for multiple languages) FTS backing tables).

Operations

It is unclear what kind of operations would need to be supported? Everything in FTS centers around MATCH queries, like:

SELECT * FROM mail WHERE body MATCH 'sqlite';

Note that FTS table columns are completely untyped (but except for rowid always TEXT?) The rowid is also more important to match up w/ the content table and needs to be exposed? (e.g. it can be aliased to an integer primary key and is also stable between VACUUMs).

Then FTS also has a set of special functions, like highlight and snippet:

SELECT highlight(email, 2, '<b>', '</b>') FROM email WHERE email MATCH 'fts5';

2 is the 0 based column index here ("email" is the FTS table). Also note that the table can be used on the left side of the match.

If an FTS table has a backing table (content='mytable'), maybe the ops should be directly attached to the content table, like sqlite_mytable_fts(...).

FTS 5 query language:

<phrase>    := string [*]
<phrase>    := <phrase> + <phrase>
<neargroup> := NEAR ( <phrase> <phrase> ... [, N] )
<query>     := [ [-] <colspec> :] [^] <phrase>
<query>     := [ [-] <colspec> :] <neargroup>
<query>     := [ [-] <colspec> :] ( <query> )
<query>     := <query> AND <query>
<query>     := <query> OR <query>
<query>     := <query> NOT <query>
<colspec>   := colname
<colspec>   := { colname1 colname2 ... }

Ordering has a special rank column (ORDER BY rank), that would need to be made available.

helje5 commented 1 year ago

https://www.sqlite.org/fts3.html https://www.sqlite.org/fts5.html