CdC-SI / eak-copilot

The official repository of the EAK-Copilot project as part of the Innovation Fellowship 2024.
https://cdc-si.github.io/eak-copilot/
GNU General Public License v3.0

add category of question to chat display #113

Open K-Schubert opened 2 months ago

K-Schubert commented 2 months ago

From FAQ scraping.

tabee commented 2 months ago

I don't know if it is necessary to display the topic categories in the chat reply. Perhaps in the future there will be an option to select a language in the settings; we would then filter the answers used for question completion by language, especially in the expert database. The same could be done for subject categories: for example, if you don't work in the area of child benefit, you don't need to see suggested questions from that area.

Inspiration:


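# category extraction
import urllib.parse

def _extract_category(url, category_position_in_path=2):
    """Extract the category from the URL path of a webpage.

    Splitting '/de/familienzulagen/...' on '/' gives
    ['', 'de', 'familienzulagen', ...], so index 2 is the category.
    """
    segments = urllib.parse.urlparse(url).path.split('/')
    # Check against the requested position (not a hard-coded 2) to avoid an IndexError.
    if len(segments) > category_position_in_path:
        # An empty segment (e.g. from a trailing slash) is treated as "no category".
        return segments[category_position_in_path] or None
    return None

# prints: familienzulagen
print(_extract_category('https://faq.bsv.admin.ch/de/familienzulagen/wann-gilt-ein-jugendlicher-als-ausbildung'))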
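
The filtering idea above (only show suggested questions in the user's language and subject areas) could be sketched roughly as follows. This is a minimal sketch, not the project's implementation: the `SuggestedQuestion` record, the `filter_suggestions` helper, the `example.org` URLs, and the placeholder question texts are all hypothetical and only serve to illustrate the idea.

```python
import urllib.parse
from dataclasses import dataclass


@dataclass
class SuggestedQuestion:
    # Hypothetical record; the real database schema may differ.
    text: str
    url: str
    language: str


def extract_category(url, position=2):
    """Return the URL path segment at `position`, or None if it is missing."""
    segments = urllib.parse.urlparse(url).path.split('/')
    if len(segments) > position and segments[position]:
        return segments[position]
    return None


def filter_suggestions(suggestions, language, allowed_categories):
    """Keep only suggestions in the user's language and selected categories."""
    return [
        s for s in suggestions
        if s.language == language
        and extract_category(s.url) in allowed_categories
    ]


# Toy data: the first URL is the one from the snippet above,
# the other two are made-up placeholders.
suggestions = [
    SuggestedQuestion(
        "question A",
        "https://faq.bsv.admin.ch/de/familienzulagen/wann-gilt-ein-jugendlicher-als-ausbildung",
        "de",
    ),
    SuggestedQuestion("question B", "https://example.org/de/other-topic/example-question", "de"),
    SuggestedQuestion("question C", "https://example.org/fr/familienzulagen/example-question", "fr"),
]

# A German-speaking user who only works on family allowances.
visible = filter_suggestions(suggestions, language="de", allowed_categories={"familienzulagen"})
print([s.text for s in visible])
```

With these settings only "question A" remains; a user who does not work in that area would pass a different category set and never see it.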

K-Schubert commented 2 months ago

Yes, I think that's a good idea. The more data/DB filtering we can do before RAG/autocomplete, the higher the quality of the answer. FYI: research on RAG has shown improved performance with "semantic routing", which is the same concept but uses embeddings of queries instead of hard category filtering.
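
To make the contrast with hard category filtering concrete, here is a rough sketch of semantic routing: the query is compared against a short description of each category and routed to the most similar one. Everything here is an assumption for illustration. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and the route names, descriptions, and threshold are made up.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical routes: one short description per category.
ROUTES = {
    "family_allowances": "child benefit family allowance children education",
    "old_age_insurance": "pension retirement age contribution",
}


def route(query, routes=ROUTES, threshold=0.1):
    """Return the best-matching category, or None if nothing is similar enough."""
    query_vec = embed(query)
    scores = {name: cosine(query_vec, embed(desc)) for name, desc in routes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


print(route("when do I receive child benefit"))
print(route("how is the weather"))
```

The first query is routed to `family_allowances` and the second matches no route, so a caller could fall back to searching all categories. With a real embedding model, only `embed` would need to change.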