K-Schubert closed 3 days ago
Added scraping/indexing of PDFs from the memento section of https://ahv-iv.ch.
Added scraping/indexing of HTML webpages listed in the sitemaps of https://www.eak.admin.ch and https://www.zas.admin.ch.
NOTE: this PR is based on the refactored branch feature/236-integrate-os-llm. The utils directory has been renamed to components, and interfaces for the embedding/LLM models, as well as factories, have been added.
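For reviewers unfamiliar with sitemap-based scraping, here is a minimal sketch of the URL-discovery step, assuming the sites publish a standard sitemaps.org `urlset` document. The function name `extract_sitemap_urls` and the sample XML are hypothetical and are not taken from this PR's code. Fetching the sitemap and indexing the pages are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(sitemap_xml):
    """Return the page URLs listed in a sitemap <urlset> document.

    Hypothetical helper for illustration only. Accepts the sitemap
    content as str or bytes and skips empty <loc> entries.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        text = (loc.text or "").strip()
        if text:
            urls.append(text)
    return urls


# Illustrative sample; the paths below are made up.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.eak.admin.ch/page-a.html</loc></url>"
    "<url><loc> https://www.zas.admin.ch/page-b.html </loc></url>"
    "<url><loc></loc></url>"
    "</urlset>"
)
urls = extract_sitemap_urls(sample)
```

Each returned URL would then be fetched and passed to the HTML indexing step. A sitemap index file (`sitemapindex`) would need an extra level of traversal that this sketch does not cover.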
@Shi-Ho Please check the latest commits.