TheDataStation / ver

Data Discovery Tools and Systems
MIT License

Investigate keyword search inaccurate results #59

Open kevindharmawan opened 1 year ago

kevindharmawan commented 1 year ago

Run quick_start_cli.md, then ver_quick_start.py. The current keyword search, or full-text search (FTS), still misses some columns. The investigation results so far are:

  1. Results from the fts_query function in dindex_store/fulltext_index_duckdb.py with conjunctive := 1 are incomplete. For example, it could not find any column that contains "University of Chicago - Woodlawn".
  2. I can confirm that the index builder runs correctly, at least for the case in item 1. I checked this semi-manually by querying the raw data (in DuckDB) that the FTS uses.
  3. When conjunctive := 1 is removed, the results are not sorted from best to worst.
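
A minimal sketch of how the semi-manual check in items 1 and 3 could be automated. This is not code from the repository: the helper names, the `(column_id, text)` row shape, and the `(column_id, score)` result shape are all assumptions made for illustration, and the brute-force matcher only approximates what an FTS index does (no stemming or stop-word handling).

```python
import re

def brute_force_matches(rows, phrase):
    """Ids of rows whose text contains every word of `phrase` (case-insensitive).

    `rows` is an iterable of (column_id, text) pairs; this shape is hypothetical.
    """
    terms = re.findall(r"\w+", phrase.lower())
    return {
        column_id
        for column_id, text in rows
        if all(term in text.lower() for term in terms)
    }

def missing_from_fts(rows, fts_results, phrase):
    """Ids the brute-force scan finds but the FTS result list does not contain.

    `fts_results` is a list of (column_id, score) pairs; this shape is hypothetical.
    """
    returned = {column_id for column_id, _score in fts_results}
    return brute_force_matches(rows, phrase) - returned

def is_sorted_best_first(fts_results):
    """True if scores are in non-increasing order (best match first)."""
    scores = [score for _column_id, score in fts_results]
    return all(a >= b for a, b in zip(scores, scores[1:]))
```

A non-empty result from `missing_from_fts` would point at the query side rather than the index builder, which matches item 2. `is_sorted_best_first` gives a quick yes/no for the ordering problem in item 3.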