Datawheel / template-chatbot

Template repository for a chatbot instance
MIT License

5th of July Updates #6

Open alexandersimoes opened 2 months ago

alebjanes commented 2 months ago

Updates on my side:

  1. Evaluation set: the evaluation set we'll use across all approaches is ready: 100 questions, along with the correct answers and the correct values that should appear in each answer.
  2. New corpus content: for the RAG to work, I added the content needed to answer these questions to the corpus (for all available years). This includes content for broad product categories (dairy, salmon, chicken, etc.) that are composed of multiple HS codes.
  3. RAG evaluation: I'm now running the RAG evaluation, which takes each of these 100 questions, fetches the top-k results using similarity search, and passes them as context to an LLM. I'll try a few combinations, varying the top-k (5 or 10), the embedding model, and the final LLM (to compare the cost of using gpt-4 vs. gpt-3.5 here). As an initial result, the first evaluation answered 81/100 questions correctly.
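The retrieve-then-answer loop in step 3 can be sketched as below. This is a toy stand-in, not the actual pipeline: it uses a bag-of-words "embedding" and cosine similarity in place of a real embedding model, and all corpus strings, function names, and the prompt template are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; the real pipeline would call an
    # embedding model instead (this is only a stand-in).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(question, corpus, k=5):
    # Rank corpus documents by similarity to the question, keep the top k.
    q = embed(question)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(question, contexts):
    # Assemble the retrieved passages into the context handed to the LLM.
    context_block = "\n---\n".join(contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Illustrative corpus entries (made up for the sketch).
corpus = [
    "In 2021 Chile exported $5.2B of salmon products.",
    "Dairy exports cover multiple HS codes including 0401 and 0406.",
    "Chicken exports in 2020 totalled $310M.",
]
question = "How much salmon did Chile export in 2021?"
docs = top_k(question, corpus, k=2)
prompt = build_prompt(question, docs)
```

In the actual evaluation, `build_prompt`'s output would go to gpt-4 or gpt-3.5 and the answer would be scored against the gold values; here the sketch just shows the top-k retrieval and prompt assembly.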
pippo-sci commented 2 months ago
Fine-tuning results with both the random sample and Ale's test set (RAG-only questions):

| Model | Accuracy |
|---|---|
| TinyLlama, 1 epoch | 0% |
| TinyLlama, 10 epochs | 0% |
| TinyLlama, 50 epochs | 0% |
| Llama2, 1 epoch | 0% |

The main issue is that the model learns the text around the numbers but gets the numbers themselves wrong; in fact, it returns a different number every time it is queried. As a side effect, the TinyLlama models lost their ability to answer other inputs.
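A scoring check that catches this failure mode (fluent text, wrong figures) has to compare the numbers in the answer against the gold values, not the surrounding prose. A minimal sketch, with hypothetical function names and a simple regex-based number extractor:

```python
import re

def extract_numbers(text):
    # Pull numeric tokens out of an answer, e.g. "5.2", "5,200", "81".
    return [float(n.replace(",", ""))
            for n in re.findall(r"\d[\d,]*\.?\d*", text)]

def numbers_match(answer, gold_values, tol=0.0):
    # The answer passes only if every expected value appears in it,
    # so a fluent answer with invented figures scores as wrong.
    found = extract_numbers(answer)
    return all(any(abs(f - g) <= tol for f in found) for g in gold_values)
```

Because a fine-tuned model that changes the number on every query would only intermittently (if ever) emit the gold value, repeated runs of a check like this would surface the instability directly.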

Next steps: