CdC-SI / eak-copilot

The official repository of the EAK-Copilot project as part of the Innovation Fellowship 2024.
https://cdc-si.github.io/eak-copilot/
GNU General Public License v3.0

Feature/251 quick indexing pipeline #257

Closed K-Schubert closed 3 days ago

K-Schubert commented 1 week ago

Added scraping/indexing of PDFs from the memento section of https://ahv-iv.ch.

Added scraping/indexing of HTML webpages listed in the sitemaps of https://www.eak.admin.ch and https://www.zas.admin.ch.
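
For readers unfamiliar with the sitemap step, here is a minimal sketch of how page URLs could be pulled out of a sitemap before scraping. It is not the code from this PR: it assumes the sitemaps follow the standard sitemaps.org XML schema, the function name `extract_sitemap_urls` is hypothetical, and the example.com URLs are placeholders. Fetching the sitemap over HTTP is left out so the parsing logic stays self-contained.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to be used by the target sites).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc> elements
    ]


# Placeholder sitemap content; the URLs are illustrative only.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page-a.html</loc></url>
  <url><loc>https://example.com/page-b.html</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_sitemap_urls(EXAMPLE_SITEMAP):
        print(url)
```

Each returned URL would then be handed to the HTML scraper and indexer.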

NOTE: This PR is based on the refactored branch feature/236-integrate-os-llm. In that branch, the utils dir has been renamed to components, and interfaces for embedding/LLM models as well as factories have been added.
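
To illustrate the interface-plus-factory idea mentioned in the note, here is a rough sketch of the general pattern. It does not reproduce the code in feature/236-integrate-os-llm: every name below (`EmbeddingClient`, `DummyEmbeddingClient`, `EmbeddingFactory`, `get_client`) is hypothetical, and the dummy model only stands in for a real embedding backend.

```python
from abc import ABC, abstractmethod


class EmbeddingClient(ABC):
    """Hypothetical interface that every embedding backend would implement."""

    @abstractmethod
    def embed(self, text: str) -> list[float]:
        """Return an embedding vector for the given text."""


class DummyEmbeddingClient(EmbeddingClient):
    """Stand-in backend for illustration: the 'embedding' is just the text length."""

    def embed(self, text: str) -> list[float]:
        return [float(len(text))]


class EmbeddingFactory:
    """Hypothetical factory mapping a configured model name to a client class."""

    _registry: dict[str, type[EmbeddingClient]] = {
        "dummy": DummyEmbeddingClient,
    }

    @classmethod
    def get_client(cls, name: str) -> EmbeddingClient:
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Unsupported embedding model: {name}") from None


if __name__ == "__main__":
    client = EmbeddingFactory.get_client("dummy")
    print(client.embed("memento"))
```

With this kind of layout, the indexing code depends only on the interface, so a different embedding backend can be selected by name without touching the pipeline.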

K-Schubert commented 5 days ago

@Shi-Ho Please check the latest commits.