Datawheel / template-chatbot

Template repository for a chatbot instance
MIT License

5th of July Updates #6

Open alexandersimoes opened 2 months ago

alebjanes commented 2 months ago

Updates on my side:

  1. Evaluation set: the evaluation set we'll use across all approaches is ready: 100 questions, along with the correct answers and the correct values that should appear in each answer.
  2. New corpus content: for the RAG to work, I added the content needed to answer these questions to the corpus (for all available years). This includes content for broad product categories (dairy, salmon, chicken, etc.) that are composed of multiple HS codes.
  3. RAG evaluation: I'm now running the RAG evaluation, which takes each of these 100 questions, fetches the top-k results using similarity search, and passes them as context to an LLM. I'll try a few combinations, varying the top-k (5 or 10), the embedding model, and the final LLM (to compare the cost of using gpt-4 vs. gpt-3.5 here). As an initial result, the first evaluation answered 81/100 questions correctly.
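The retrieve-then-answer loop in step 3 can be sketched as below. This is a toy stand-in, not the actual pipeline: it uses a bag-of-words "embedding" and cosine similarity in place of a real embedding model, and all corpus strings, function names, and the prompt template are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; the real pipeline would call an
    # embedding model instead (this is only a stand-in).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(question, corpus, k=5):
    # Rank corpus documents by similarity to the question, keep the top k.
    q = embed(question)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(question, contexts):
    # Assemble the retrieved passages into the context handed to the LLM.
    context_block = "\n---\n".join(contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Illustrative corpus entries (made up for the sketch).
corpus = [
    "In 2021 Chile exported $5.2B of salmon products.",
    "Dairy exports cover multiple HS codes including 0401 and 0406.",
    "Chicken exports in 2020 totalled $310M.",
]
question = "How much salmon did Chile export in 2021?"
docs = top_k(question, corpus, k=2)
prompt = build_prompt(question, docs)
```

In the actual evaluation, `build_prompt`'s output would go to gpt-4 or gpt-3.5 and the answer would be scored against the gold values; here the sketch just shows the top-k retrieval and prompt assembly.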
pippo-sci commented 2 months ago
Fine-tuning results with both the random sample and Ale's test set (RAG-only questions):

| Model | Accuracy |
|---|---|
| TinyLlama, 1 epoch | 0% |
| TinyLlama, 10 epochs | 0% |
| TinyLlama, 50 epochs | 0% |
| Llama2, 1 epoch | 0% |

The main issue is that the model learns the text around the numbers but gets the numbers themselves wrong; in fact, it returns a different number every time it is queried. As a side effect, the TinyLlama models lost their ability to answer other inputs.
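A scoring check that catches this failure mode (fluent text, wrong figures) has to compare the numbers in the answer against the gold values, not the surrounding prose. A minimal sketch, with hypothetical function names and a simple regex-based number extractor:

```python
import re

def extract_numbers(text):
    # Pull numeric tokens out of an answer, e.g. "5.2", "5,200", "81".
    return [float(n.replace(",", ""))
            for n in re.findall(r"\d[\d,]*\.?\d*", text)]

def numbers_match(answer, gold_values, tol=0.0):
    # The answer passes only if every expected value appears in it,
    # so a fluent answer with invented figures scores as wrong.
    found = extract_numbers(answer)
    return all(any(abs(f - g) <= tol for f in found) for g in gold_values)
```

Because a fine-tuned model that changes the number on every query would only intermittently (if ever) emit the gold value, repeated runs of a check like this would surface the instability directly.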

Next steps: