Datawheel / template-chatbot

Template repository for a chatbot instance
MIT License

12th of July Updates #7

Open · alebjanes opened this issue 4 months ago

alebjanes commented 4 months ago

RAG Evaluation

  1. 100 questions. Types of questions:

    • 60 on general trade
    • 12 on growth/variation
    • 28 on rankings
  2. RAG evaluation results

Best combination tested so far: multi-qa-mpnet-base-cos-v1 (embeddings) + gpt-3.5-turbo (LLM)
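
For context, here is a minimal sketch of the retrieval half of that combination, assuming the `sentence-transformers` library; the documents and question below are placeholders, not the actual evaluation set:

```python
# Retrieval sketch: embed documents and questions with
# multi-qa-mpnet-base-cos-v1, then rank documents by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")

# Placeholder corpus; in practice these would be the trade-data chunks.
docs = [
    "Chile exported $1.2B of copper ore in 2020.",
    "Germany's top export destination in 2021 was the United States.",
]
questions = ["What did Chile export in 2020?"]

doc_emb = model.encode(docs, convert_to_tensor=True, normalize_embeddings=True)
q_emb = model.encode(questions, convert_to_tensor=True, normalize_embeddings=True)

scores = util.cos_sim(q_emb, doc_emb)  # shape: (n_questions, n_docs)
best = scores.argmax(dim=1)            # top document index per question
for q, idx in zip(questions, best):
    print(q, "->", docs[int(idx)])
```

The top-ranked chunks are then passed as context to gpt-3.5-turbo to generate the final answer.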

Out of the wrong answers:

We're preparing a presentation gathering the results of all approaches with more detail. Next week I'll be improving the RAG + LLM and evaluating the previous multi-layer approach with Pippo.

pippo-sci commented 4 months ago

Fine-tuning MAPE results:

Measuring the median absolute % error on value and quantity, and taking the 1.1B-parameter model trained for one epoch as the baseline (ft_tiny0), the error decreased for the model trained for 50 epochs (ft_tiny2) and increased when the model size was increased to 7B parameters (ft_llama2). That makes some sense: larger models are harder to fine-tune.
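
For reference, a small sketch of how a median absolute % error like this can be computed; the values below are made up, not our evaluation data:

```python
import numpy as np

def median_ape(y_true, y_pred):
    """Median absolute percentage error, in %; using the median makes
    the metric robust to a few wildly wrong predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.median(np.abs((y_pred - y_true) / y_true)) * 100)

# Hypothetical predictions from two fine-tuned variants on the same eval set.
y_true = [120.0, 85.0, 430.0]
print(median_ape(y_true, [110.0, 90.0, 400.0]))  # e.g. ft_tiny0 (baseline)
print(median_ape(y_true, [118.0, 86.0, 425.0]))  # e.g. ft_tiny2 (50 epochs)
```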

Fine-tuning TinyLlama to produce API calls instead, overall accuracy is 12%, mostly because it struggles with HS code numbers. Other than that, query accuracy is 89%. On the HS numbers the mean % error is about 12%, but again, every time the model is queried with the same question it returns a slightly different number.
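
A sketch of how each generated call could be scored, assuming the calls are URLs with query parameters; the `hs_code` parameter name and URL format are hypothetical, not the actual API schema:

```python
from urllib.parse import urlparse, parse_qs

def score_call(pred_url: str, gold_url: str):
    """Compare a generated API call against the reference call.

    Returns (params_match, hs_pct_error): exact match on all non-HS
    parameters, plus the % error on the numeric HS code if present.
    """
    pred = {k: v[0] for k, v in parse_qs(urlparse(pred_url).query).items()}
    gold = {k: v[0] for k, v in parse_qs(urlparse(gold_url).query).items()}

    # Query accuracy: every non-HS parameter must match exactly.
    params_match = all(
        pred.get(k) == v for k, v in gold.items() if k != "hs_code"
    )

    # HS code error: percentage difference between the numeric codes.
    hs_pct_error = None
    if "hs_code" in gold and "hs_code" in pred:
        g, p = float(gold["hs_code"]), float(pred["hs_code"])
        hs_pct_error = abs(p - g) / g * 100

    return params_match, hs_pct_error

pred = "https://api.example.com/trade?hs_code=2603&year=2020"
gold = "https://api.example.com/trade?hs_code=2601&year=2020"
print(score_call(pred, gold))  # (True, 0.0769...)
```

Averaging `hs_pct_error` over the evaluation set and taking the share of calls where `params_match` holds would give figures comparable to the ~12% HS error and 89% query accuracy reported above.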