I added a simple RAG version that indexes all text files from https://plaintextipcc.com/, which were downloaded in July/August 2024 and uploaded to Swiftbrowser.
Tests are included, the environment is updated, and the system role has been updated so that the additional information is processed.
Running the evaluation showed an increase in accuracy.
Potential further optimization (I will open an issue for this): right now, when one or more files in data/ipcc_text_reports change, all files are chunked and embedded again, and the resulting chunks are added on top of the existing ones, even if identical chunks already exist. Ideally, only the files that were changed or added would be chunked and added to chromadb.
However, this should not be much of an issue as long as we keep using it only with the IPCC text files and don't change them.
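
One possible direction for that follow-up issue, as a minimal sketch and not what this PR implements: keep a small manifest of content hashes and only re-chunk files whose hash differs. The function names, the manifest file, and the `*.txt` pattern are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_changed_files(data_dir: Path, manifest_path: Path) -> tuple[list[Path], dict[str, str]]:
    """Compare current file hashes against a stored manifest (hypothetical helper).

    Returns the files that are new or modified, plus the up-to-date manifest.
    Deleted files are not handled in this sketch.
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {p.name: file_digest(p) for p in sorted(data_dir.glob("*.txt"))}
    changed = [data_dir / name for name, digest in new.items() if old.get(name) != digest]
    return changed, new


def save_manifest(manifest: dict[str, str], manifest_path: Path) -> None:
    """Persist the manifest once the changed files have been indexed."""
    manifest_path.write_text(json.dumps(manifest, indent=2))
```

Only the files returned by `find_changed_files` would then need to be chunked and embedded. For a modified file, its old chunks would have to be removed first (for example by filtering on a source-file metadata field, assuming one is stored with each chunk); otherwise the duplicates described above would remain.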