Rag integration - Githubissues

AntoniaJost commented 2 months ago

I added a simple RAG implementation that indexes all text files from https://plaintextipcc.com/, which were downloaded in July/August 2024 and uploaded to Swiftbrowser. Tests are included, the environment is updated, and the system role has been adjusted so that it processes the additional retrieved information. Accuracy increased when running the evaluation.

Potential further optimization (I will open an issue for this): right now, when one or more files in data/ipcc_text_reports change, chunking and embedding run again for all files, and the resulting chunks are added on top of the existing ones, even if identical chunks are already stored. Ideally, only the files that have changed or been added would be chunked and added to the ChromaDB. For now this should not be much of an issue, as long as we keep using only the IPCC text files and don't change them.
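A minimal sketch of how the changed-file detection could look, not what this PR implements. It assumes the reports are plain `.txt` files and that a small JSON manifest of content hashes is kept next to the database; `find_changed_files`, `save_manifest`, and the manifest file itself are hypothetical names used only for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_changed_files(data_dir: Path, manifest_path: Path) -> tuple[list[Path], dict[str, str]]:
    """Compare the .txt files in data_dir against a stored manifest.

    Returns the files that are new or whose content changed, plus the
    up-to-date manifest (file name -> digest). The manifest is NOT written
    here, so it can be saved only after embedding has succeeded.
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {p.name: file_digest(p) for p in sorted(data_dir.glob("*.txt"))}
    changed = [data_dir / name for name, digest in new.items() if old.get(name) != digest]
    return changed, new


def save_manifest(manifest: dict[str, str], manifest_path: Path) -> None:
    """Persist the manifest once the changed files have been re-indexed."""
    manifest_path.write_text(json.dumps(manifest, indent=2))
```

Only the files returned in `changed` would then go through chunking and embedding. For a file that was modified rather than newly added, its old chunks would also have to be removed from the collection first (for example by filtering on a source-file metadata field, assuming one is stored with each chunk); otherwise the duplicates described above would remain.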

AntoniaJost commented 2 months ago

closes #104 and #24

AntoniaJost commented 1 month ago

I still need to update the test_rag.py file, which is why the tests are currently still failing.

AntoniaJost commented 1 month ago

The tests have been updated. Code is now ready for another review.

CliDyn / climsight

Rag integration #115