CentreForDigitalHumanities / idioms

Database of Dutch Dialect Idioms
https://dutchdialectidioms.uu.nl/

Allow matching substrings in free text search? #28

Open ar-jan opened 1 year ago

ar-jan commented 1 year ago

The FTS5 trigram tokenizer allows matching arbitrary substrings (rather than only complete tokens or token prefixes), but it cannot match substrings shorter than 3 Unicode characters. Short tokens must be supported (e.g. general structure contains mainly short forms such as V DO), so if substring matching is desired, a solution other than SQLite FTS is needed.
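To make the limitation concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes an SQLite build with FTS5 enabled and version 3.34.0 or later (when the trigram tokenizer was introduced). The table name `demo`, the column `content`, the helper `trigram_matches`, and the sample rows are all hypothetical and not taken from this project's schema.

```python
import sqlite3


def trigram_matches(rows, query):
    """Return the rows that match `query` in a throwaway FTS5 trigram table.

    Illustrative only: table and column names are hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE VIRTUAL TABLE demo USING fts5(content, tokenize='trigram')"
    )
    con.executemany("INSERT INTO demo(content) VALUES (?)", [(r,) for r in rows])
    # Wrap the user input in double quotes so FTS5 treats it as one string,
    # doubling any embedded quotes.
    fts_query = '"' + query.replace('"', '""') + '"'
    found = [
        row[0]
        for row in con.execute(
            "SELECT content FROM demo WHERE demo MATCH ? ORDER BY rowid",
            (fts_query,),
        )
    ]
    con.close()
    return found


if __name__ == "__main__":
    sample = ["V DO", "V DO PP", "de kat uit de boom kijken"]
    print(trigram_matches(sample, "oom"))  # substring of "boom"
    print(trigram_matches(sample, "DO"))   # only 2 characters long
```

With these sample rows, the query `oom` should find the idiom containing `boom`, while the two-character query `DO` should return no rows even though two rows contain it, which is the behaviour that rules out the trigram tokenizer here.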

Prefix matching is supported by the (default) unicode61 tokenizer, which is currently in use.
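For comparison, a similar sketch of prefix matching with the unicode61 tokenizer, again assuming an SQLite build with FTS5 enabled. As before, the table `demo`, the column `content`, the helper `prefix_matches`, and the sample rows are hypothetical and only serve as illustration.

```python
import sqlite3


def prefix_matches(rows, prefix):
    """Return the rows with a token starting with `prefix` (unicode61 tokenizer).

    Illustrative only: table and column names are hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE VIRTUAL TABLE demo USING fts5(content, tokenize='unicode61')"
    )
    con.executemany("INSERT INTO demo(content) VALUES (?)", [(r,) for r in rows])
    # A quoted string followed by * is an FTS5 prefix query.
    fts_query = '"' + prefix.replace('"', '""') + '"*'
    found = [
        row[0]
        for row in con.execute(
            "SELECT content FROM demo WHERE demo MATCH ? ORDER BY rowid",
            (fts_query,),
        )
    ]
    con.close()
    return found


if __name__ == "__main__":
    sample = ["V DO", "de kat uit de boom kijken"]
    print(prefix_matches(sample, "V"))    # single-character token prefix
    print(prefix_matches(sample, "boo"))  # prefix of "boom"
    print(prefix_matches(sample, "oom"))  # inner substring, not a prefix
```

Here short tokens such as `V` can be matched, and `boo` finds the row containing `boom`, but `oom` finds nothing because it is not the start of any token.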

ar-jan commented 1 year ago

The FTS search options are sufficient so far. If needed, partial string matching can be revisited.