CdC-SI / ZAS-EAK-CopilotGPT

The official repository of the EAK-Copilot project as part of the Innovation Fellowship 2024.
https://cdc-si.github.io/ZAS-EAK-CopilotGPT/
GNU General Public License v3.0

EPIC: feedback on current code #185

Closed Shi-Ho closed 2 months ago

Shi-Ho commented 5 months ago

General

Autocomplete

Webscraper

RAG

Database

Other code non-related things

K-Schubert commented 5 months ago

@Shi-Ho Can you investigate alternative DB solutions for autocomplete and RAG? We need something that could ideally perform SQL querying, (fuzzy) text search, and vector search.

e.g. Elasticsearch

Shi-Ho commented 5 months ago

Hi @K-Schubert! I just found out that PostgreSQL has its own fuzzy string matching module (fuzzystrmatch) with Levenshtein distance, which might be quite efficient and more practical than Elasticsearch, which requires setting up another server. More info here: https://www.postgresql.org/docs/current/fuzzystrmatch.html#FUZZYSTRMATCH-LEVENSHTEIN
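To make the idea concrete, here is a minimal sketch of what a Levenshtein distance measures and how it could rank stored autocomplete questions. It is written in plain Python for illustration only; it is not the PostgreSQL implementation, and `rank_questions`, its `limit` parameter, and the lowercasing step are assumptions, not part of the current code base.

```python
def levenshtein(source: str, target: str) -> int:
    """Count the single-character insertions, deletions and substitutions
    needed to turn `source` into `target` (unit cost for each edit)."""
    # The distance is symmetric, so keep the shorter string as `target`
    # to make the rows of the dynamic-programming table as short as possible.
    if len(source) < len(target):
        source, target = target, source

    # previous[j] = distance between the processed prefix of `source`
    # and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution_cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,                      # delete from source
                current[j - 1] + 1,                   # insert into source
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def rank_questions(user_input: str, questions: list[str], limit: int = 5):
    """Hypothetical helper: return up to `limit` (distance, question) pairs,
    closest first. Comparison is case-insensitive (an assumption)."""
    scored = [(levenshtein(user_input.lower(), q.lower()), q) for q in questions]
    return sorted(scored)[:limit]
```

In a database-backed setup the same ranking would be done inside the query rather than in application code; the sketch only shows what the distance value means.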

Also, I've updated the feedback list with some other points.

Shi-Ho commented 5 months ago

Some additional information:

Shi-Ho commented 5 months ago

Regarding the current implementation of the Copilot, we have:

However, people will not necessarily click on the autocomplete drop-down list, even if it contains very similar questions (similarly, people do not necessarily click on the drop-down suggestions of a Google search even if the exact same question is listed), and it might be difficult to ask people to change their habits regarding search engines.

A solution could be, when the user presses Enter or the send button, to return the answer of the autocomplete question with the smallest distance from the input, but only if that distance is smaller than a very small threshold.

Example:

Currently, because the user pressed "Enter", Copilot will query the RAG. However, the answer could clearly be found in the database. If the Copilot could return the autocomplete answer instead of querying the RAG, this would be cheaper while still providing the correct answer.
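The proposed behaviour could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (`edit_distance`, `normalized_distance`, `answer`), the in-memory `faq` dictionary, the `rag` callable, the length normalization, and the default threshold of 0.1 are all hypothetical and would need to be adapted to the actual Copilot code and tuned on real queries.

```python
from typing import Callable


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, a_char in enumerate(a, start=1):
        current = [i]
        for j, b_char in enumerate(b, start=1):
            cost = 0 if a_char == b_char else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Distance scaled to [0, 1] so one threshold works for short and long
    questions. Normalizing by the longer length is an assumption."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def answer(user_input: str,
           faq: dict[str, str],
           rag: Callable[[str], str],
           threshold: float = 0.1) -> tuple[str, str]:
    """Return (source, answer). Use the stored answer when the closest known
    question is within `threshold`; otherwise fall back to the RAG call."""
    best = min(faq, key=lambda q: normalized_distance(user_input, q),
               default=None)
    if best is not None and normalized_distance(user_input, best) <= threshold:
        return "faq", faq[best]
    return "rag", rag(user_input)
```

Keeping the threshold very small limits the risk of returning a stored answer to a question that only looks similar; anything above it still goes through the RAG as today.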