Closed by Shi-Ho 2 months ago
@Shi-Ho Can you investigate alternative DB solutions for autocomplete and RAG? We need something that can ideally handle SQL querying, fuzzy text search and vector search,
e.g. Elasticsearch.
Hi @K-Schubert! I just found out PostgreSQL has its own fuzzy string matching module, fuzzystrmatch, which includes Levenshtein distance. It might be quite efficient and more practical than Elasticsearch, which requires setting up another server. More info here: https://www.postgresql.org/docs/current/fuzzystrmatch.html#FUZZYSTRMATCH-LEVENSHTEIN
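For reference, a rough sketch of what such a query could look like from Python. The table faq with columns question and answer, the psycopg-style named placeholders and the helper fuzzy_match_params are all assumptions for illustration. Only levenshtein() and levenshtein_less_equal() come from fuzzystrmatch, which has to be enabled once with CREATE EXTENSION fuzzystrmatch.

```python
# Hypothetical schema: faq(question TEXT, answer TEXT).
# levenshtein_less_equal can stop early once the distance exceeds
# max_distance, so it is used as the filter; the exact distance is
# only needed for ordering the rows that pass.
FUZZY_MATCH_SQL = """
SELECT question,
       answer,
       levenshtein(lower(question), lower(%(user_input)s)) AS distance
FROM faq
WHERE levenshtein_less_equal(lower(question), lower(%(user_input)s), %(max_distance)s)
      <= %(max_distance)s
ORDER BY distance
LIMIT %(limit)s
"""


def fuzzy_match_params(user_input: str, max_distance: int = 3, limit: int = 5) -> dict:
    """Build the parameter mapping for FUZZY_MATCH_SQL (hypothetical helper)."""
    return {
        "user_input": user_input.strip(),
        "max_distance": max_distance,
        "limit": limit,
    }
```

With a psycopg cursor this would be run as cursor.execute(FUZZY_MATCH_SQL, fuzzy_match_params(text)). It has not been tried against the actual Copilot schema.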
Also, I've updated the feedback list with some other points
Regarding the current implementation of the Copilot, we have:
However, people will not necessarily click on the autocomplete drop-down list, even if it contains very similar questions (similarly, people do not necessarily click on the drop-down suggestions in a Google search even when the exact same question is listed), and it might be difficult to ask people to change their search engine habits.
A solution could be, on Enter or the send button, to return the answer of the autocomplete question with the smallest distance from the input, IF that distance is smaller than a very small threshold.
Example:
Currently, because they pressed "Enter", Copilot will query the RAG. However, the answer could clearly be found in the database. If the Copilot could return the autocomplete answer instead of querying the RAG, this would be cheaper while still providing the correct answer.
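The proposal above could be sketched roughly as follows. This is only an illustration: answer_on_enter, the faq dict, the query_rag callback and the default threshold of 2 are all hypothetical names and values, and the pure-Python Levenshtein function stands in for whatever distance the database would compute.

```python
from typing import Callable, Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def answer_on_enter(user_input: str,
                    faq: Dict[str, str],
                    query_rag: Callable[[str], str],
                    max_distance: int = 2) -> str:
    """Return the stored answer for a near-identical question, else call the RAG."""
    normalized = user_input.strip().lower()
    best_question, best_distance = None, None
    for question in faq:
        distance = levenshtein(normalized, question.strip().lower())
        if best_distance is None or distance < best_distance:
            best_question, best_distance = question, distance
    if best_question is not None and best_distance <= max_distance:
        return faq[best_question]   # cheap path: no RAG call
    return query_rag(user_input)    # fallback: input is not close enough
```

The threshold is kept small on purpose so that only near-identical inputs (a missing question mark, a typo) skip the RAG. Anything further away still goes through the normal pipeline.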
General
- The name question in most functions, especially the autocomplete-related ones, is a bit confusing (it leads to descriptions like "return a list of questions that match the question"). Rename to input or user_input maybe?
- Add a src/ folder to have a distinct split between docs and code
Autocomplete
- Move the autocomplete-related functions to an autocomplete class (is it possible?)
- Could get_fuzzy_match be faster using a pandas DataFrame? The result could be ordered (idk if this is good practice)
- Use INSERT ... ON DUPLICATE KEY UPDATE in update_or_insert_data, and add a unique index to the query data
- Move web_scraper to indexing/app/
- (... '*' as example) for future integrations
Webscraper
RAG
- rag.main.doc and autocomplete.main.get_semantic_similarity_match have similar SQL queries; is there a way to simplify this? Is having a single class that can interact with the DB good practice? -> should be solved after refactoring and ORM implementation
Database
Other non-code-related things
- The Features section followed by a section "How it works - The EAK-Copilot currently features: ..." is a bit confusing
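On the upsert point in the Autocomplete list: INSERT ... ON DUPLICATE KEY UPDATE is MySQL syntax. If the project moves to PostgreSQL as discussed above, the equivalent is INSERT ... ON CONFLICT ... DO UPDATE, which also needs the unique index. A small sketch of the idea, using Python's built-in sqlite3 only so that it runs without a server (SQLite 3.24+ accepts the same ON CONFLICT clause). The faq table, its columns and the function upsert_question are assumptions, not the actual update_or_insert_data signature.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a unique index on the question text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE faq ("
        "id INTEGER PRIMARY KEY, "
        "question TEXT NOT NULL, "
        "answer TEXT NOT NULL)"
    )
    # The unique index is what lets the database detect the duplicate.
    conn.execute("CREATE UNIQUE INDEX faq_question_uq ON faq (question)")
    return conn


def upsert_question(conn: sqlite3.Connection, question: str, answer: str) -> None:
    """Insert a new row, or update the answer if the question already exists."""
    conn.execute(
        """
        INSERT INTO faq (question, answer) VALUES (?, ?)
        ON CONFLICT(question) DO UPDATE SET answer = excluded.answer
        """,
        (question, answer),
    )
    conn.commit()
```

This replaces the usual "select, then insert or update" round trip with a single statement, so two concurrent writers cannot both insert the same question.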