CdC-SI / eak-copilot

The official repository of the EAK-Copilot project as part of the Innovation Fellowship 2024.
https://cdc-si.github.io/eak-copilot/
GNU General Public License v3.0

configure pgvector index with env vars #123

Open K-Schubert opened 2 months ago

K-Schubert commented 2 months ago

Description

Configure vectordb index creation (hnsw, ivfflat).

Configure semantic search with pgvector through the .env file: index build parameters (such as "m" and "ef_construction" for hnsw) and query-time search parameters (such as "probes" for ivfflat).
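A minimal sketch of how this could look, assuming the settings are read from environment variables (for example loaded from the .env file). The variable names (`PGVECTOR_INDEX_TYPE`, `PGVECTOR_HNSW_M`, and so on), the `embeddings` table, the `embedding` column, the `vector_cosine_ops` operator class, and the fallback values are all hypothetical choices for illustration, not names taken from this repository. The generated statements follow the `CREATE INDEX ... USING hnsw/ivfflat ... WITH (...)` and `SET hnsw.ef_search` / `SET ivfflat.probes` forms described in the pgvector README.

```python
import os
import re

# Only plain SQL identifiers are accepted, since they are interpolated into SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _int_env(env, name, default):
    """Read a positive integer setting, falling back to `default`."""
    value = int(env.get(name, default))
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value}")
    return value


def build_index_sql(env=os.environ, table="embeddings", column="embedding"):
    """Return a CREATE INDEX statement, or None when no index is requested.

    Env var names, table and column are assumptions made for this sketch.
    """
    for ident in (table, column):
        if not _IDENT.match(ident):
            raise ValueError(f"invalid SQL identifier: {ident!r}")

    index_type = env.get("PGVECTOR_INDEX_TYPE", "none").lower()
    if index_type == "none":
        return None  # no index: pgvector falls back to exact search
    if index_type == "hnsw":
        m = _int_env(env, "PGVECTOR_HNSW_M", 16)
        ef = _int_env(env, "PGVECTOR_HNSW_EF_CONSTRUCTION", 64)
        options = f"m = {m}, ef_construction = {ef}"
    elif index_type == "ivfflat":
        lists = _int_env(env, "PGVECTOR_IVFFLAT_LISTS", 100)
        options = f"lists = {lists}"
    else:
        raise ValueError(f"unsupported index type: {index_type!r}")

    # vector_cosine_ops is an assumed operator class; it must match the
    # distance operator used in the search queries.
    return (
        f"CREATE INDEX ON {table} USING {index_type} "
        f"({column} vector_cosine_ops) WITH ({options});"
    )


def build_search_settings_sql(env=os.environ):
    """Return the query-time SET statement for the configured index, if any."""
    index_type = env.get("PGVECTOR_INDEX_TYPE", "none").lower()
    if index_type == "hnsw":
        ef_search = _int_env(env, "PGVECTOR_HNSW_EF_SEARCH", 40)
        return f"SET hnsw.ef_search = {ef_search};"
    if index_type == "ivfflat":
        probes = _int_env(env, "PGVECTOR_IVFFLAT_PROBES", 1)
        return f"SET ivfflat.probes = {probes};"
    return None
```

With `PGVECTOR_INDEX_TYPE=none` (the fallback in this sketch) no statement is produced, so index creation stays opt-in.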

K-Schubert commented 1 month ago

@tabee

Initializing the pgvector db and inserting data doesn't require building an index. Without one, search uses exact nearest neighbours, which gives perfect recall but is slow when the db holds many vectors. An index can be built to speed up search at the cost of some recall. The two available techniques, hnsw and ivfflat, each come with their own advantages and tradeoffs.

Both methods require some tuning of the index parameters, followed by an evaluation of search performance.
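One way to evaluate a given set of index parameters is to compare the ids returned by the indexed (approximate) search against the exact nearest neighbours for the same query. The sketch below is only an illustration in pure Python: `exact_top_k` is a brute-force stand-in for pgvector's exact search, Euclidean distance is an assumed metric, and both function names are hypothetical.

```python
import math


def exact_top_k(query, vectors, k):
    """Brute-force exact search: ids of the k vectors closest to `query`.

    `vectors` maps an id to its embedding. Every vector is compared,
    which is why exact search gets slow as the collection grows.
    """
    ranked = sorted(vectors, key=lambda item_id: math.dist(query, vectors[item_id]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)
```

For example, if the exact top 2 for a query is `["a", "b"]` and the indexed search returns `["a", "d"]`, recall is 0.5. Averaging this over a sample of real queries gives a number to track while changing the index parameters.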

Since there probably won't be millions of vectors in the db, we might want to skip building an index altogether?

https://github.com/pgvector/pgvector