Open svenseeberg opened 2 months ago
Possible training data with questions about Integreat content: https://huggingface.co/datasets/digitalfabrik/integreat-qa The questions are relatively simple and well phrased, so they only cover a subset of the cases mentioned above.
Tests are based on commit 9f57f80f68222daf3ac1ce088f727d8b00d92797 (llama3.1:8b, skip questions with no matching documents, chunking at h2 tags).
I need to know the German language for a job. What do I need to do?
This does not always yield a result: in roughly 1 of 4 cases the message is not classified as a question that requires an answer.
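The 1-in-4 inconsistency could be quantified by re-running the classification step repeatedly on the same message. A minimal sketch (the classifier here is a hypothetical stand-in for the actual LLM call, flipping randomly to mimic the observed behavior):

```python
import random

def question_rate(classify, message, runs=20):
    """Fraction of runs in which the classifier treats the message as a
    question that requires an answer."""
    return sum(1 for _ in range(runs) if classify(message)) / runs

# Hypothetical stand-in for the LLM classification call; it fails roughly
# 25% of the time to mimic the observed rate (not the real model).
_rng = random.Random(0)
def flaky_classifier(message):
    return _rng.random() > 0.25

rate = question_rate(
    flaky_classifier,
    "I need to know the German language for a job. What do I need to do?",
)
```

Running this against the real classification endpoint would give a concrete consistency number per benchmark question.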
Another interesting prompt:
Is there a cinema in Munich that shows English movies?
```json
{
  "answer": "I don't know. The provided context does not mention cinemas or movie showings in Munich.",
  "sources": [
    "/muenchen/en/culture-leisure-sport/general-information/",
    "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
    "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/"
  ],
  "details": [
    {
      "source": "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/",
      "score": 0.7928134202957153
    },
    {
      "source": "/muenchen/en/culture-leisure-sport/general-information/",
      "score": 0.855070948600769
    },
    {
      "source": "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/",
      "score": 1.0023198127746582
    }
  ],
  "status": "success"
}
```
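For interpreting the `details` above: assuming the scores are Milvus L2 distances (lower means a closer match, which matches how `sources` appear to be ordered here), the retrieved chunks can be ranked like this:

```python
# The "details" list from the response above (assumption: scores are
# Milvus L2 distances, so a lower score means a closer match).
details = [
    {"source": "/muenchen/en/culture-leisure-sport/be-creative/youth-theatre-workshop-in-the-bellevue-di-monaco/", "score": 0.7928134202957153},
    {"source": "/muenchen/en/culture-leisure-sport/general-information/", "score": 0.855070948600769},
    {"source": "/muenchen/en/culture-leisure-sport/meet-people/meetings-in-the-neighbourhood/", "score": 1.0023198127746582},
]

def rank_sources(details):
    """Order retrieved chunks from closest to farthest match."""
    return [d["source"] for d in sorted(details, key=lambda d: d["score"])]

ranked = rank_sources(details)
```

Note that even the best-scoring chunk is about a theatre workshop, not cinemas, which is consistent with the "I don't know" answer.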
Another test question with frequent bad results:
Hi I'm from Afghanistan and 17 years old. How can I learn German?
We tried to get more consistent document results from Milvus (see #60) with flat indexes but still got varying results. The only remaining explanation is that the embedding model produces different vectors for the same query.
*edit: see https://github.com/digitalfabrik/integreat-chat/issues/61#issuecomment-2431861775
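One way to confirm or rule out embedding non-determinism is to embed the identical query several times and compare the vectors directly, before Milvus is involved. A minimal sketch (`toy_embed` is a placeholder so the snippet is runnable; in practice this would be the production embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_deterministic(embed, query, runs=5, tol=1e-6):
    """Embed the same query several times and check the vectors stay put."""
    ref = embed(query)
    return all(cosine(ref, embed(query)) > 1 - tol for _ in range(runs))

# Toy deterministic embedding, used only to make the sketch runnable.
def toy_embed(text):
    return [float(ord(c)) for c in text] or [1.0]
```

If this check fails for the real model, the varying retrieval results are explained upstream of the index.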
Another observation: the chunking (and chunk encoding) might be problematic as well.
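For reference, splitting a page at h2 tags can be sketched roughly like this (a naive regex split for illustration; the actual pipeline may well use a proper HTML parser, and the example page content is made up):

```python
import re

def chunk_at_h2(html):
    """Split page HTML into chunks at <h2> boundaries.

    The lookahead keeps each <h2> tag at the start of its own chunk; any
    content before the first <h2> becomes a leading chunk of its own.
    """
    parts = re.split(r"(?=<h2[ >])", html)
    return [p.strip() for p in parts if p.strip()]

page = "<h1>Learning German</h1><p>Intro</p><h2>Courses</h2><p>A</p><h2>Exams</h2><p>B</p>"
chunks = chunk_at_h2(page)
```

A potential problem with this scheme is that a chunk loses the context of its parent headings, which may hurt both embedding quality and answer generation.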
We want to do performance testing on our different modules: three of the above-mentioned components should be held fixed while we vary one of them and test different approaches with our benchmark questions.
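A sketch of that one-factor-at-a-time setup (the component names and alternatives below are illustrative placeholders, not the actual configuration):

```python
# Illustrative baseline; component names and values are assumptions.
baseline = {
    "chunking": "h2",
    "embedding": "default",
    "index": "flat",
    "llm": "llama3.1:8b",
}

# Hypothetical alternatives to try for some components.
alternatives = {
    "chunking": ["h3", "fixed-512"],
    "llm": ["llama3.1:70b"],
}

def ofat_configs(baseline, alternatives):
    """Yield configs that change exactly one component from the baseline."""
    for component, options in alternatives.items():
        for option in options:
            cfg = dict(baseline)
            cfg[component] = option
            yield cfg

configs = list(ofat_configs(baseline, alternatives))
```

Each generated config would then be run against the full benchmark question set, so results are attributable to the single changed component.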
Benchmark questions in order of their priority and based on our user stories:
Extended Benchmark questions based on Persona "Iryna"
Extended benchmark questions not based on personas: