deepset-ai / haystack

LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

German Model? #872

Closed: MathiasKrill closed this issue 3 years ago.

MathiasKrill commented 3 years ago

Question: Is there a German model for question answering you can recommend?

Additional context: I already asked via the form on the website, but it showed an error message / invalid message when sending. Can you recommend a model for the German language?

I tried deepset/xlm-roberta-large-squad2 and Sahajtomar/GBERTQnA.
Both work sometimes, but sometimes I get the following error right at the start:

{"detail":"There was an error parsing the body"}

The request for the question "Nenne mir Beispiele für einen" ("Give me examples of a") was:

curl --request POST --url 'http://127.0.0.1:8000/models/1/doc-qa' --data '{"questions": ["Nenne mir Beispiele für einen"]}'
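In the first curl call above, the body is passed with `--data` but without a `Content-Type` header; curl then labels it `application/x-www-form-urlencoded`, and the body bytes are whatever the shell hands over. As a minimal sketch, assuming the `/models/1/doc-qa` endpoint and the body fields used in this thread, the same request can be built with Python's standard library using an explicit JSON content type and a UTF-8 encoded body. `build_doc_qa_request` is a hypothetical helper, not part of Haystack; it only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Endpoint copied from the curl calls in this thread (a locally running REST API).
DOC_QA_URL = "http://127.0.0.1:8000/models/1/doc-qa"


def build_doc_qa_request(
    questions: List[str],
    top_k_reader: Optional[int] = None,
    top_k_retriever: Optional[int] = None,
    url: str = DOC_QA_URL,
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST request for doc-qa."""
    body: Dict[str, Any] = {"questions": questions}
    # Add the optional top_k fields only when a value is given.
    if top_k_reader is not None:
        body["top_k_reader"] = top_k_reader
    if top_k_retriever is not None:
        body["top_k_retriever"] = top_k_retriever
    # Serialize to JSON and encode explicitly as UTF-8, so the bytes sent do not
    # depend on the code page of the shell that builds the request.
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Accept": "application/json",
        },
        method="POST",
    )


req = build_doc_qa_request(["Nenne mir Beispiele für einen"])
print(req.data)
# b'{"questions": ["Nenne mir Beispiele f\xc3\xbcr einen"]}'

# Sending requires a running server, e.g.:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The later curl calls in this thread do add `-H "Content-Type: application/json"`, so the header alone may not explain the error; the body encoding is the other variable this sketch pins down.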

Best wishes Mathias

Timoeller commented 3 years ago

Hey, thanks for reaching out about German QA. You have already found the two best models currently available for German QA.

We are working on a German QA dataset that significantly improves the performance of these models and will open-source it by April 15th at the latest. For what use case do you need the model, and where were xlm-r and GBERTQnA failing?


About the API call: where is the "text" field that you want to extract an answer from? Normally the input should look something like:

QA_input = [
            {
                "questions": ["Who counted the game among the best ever made?"],
                "text":  "Twilight Princess was released to universal critical acclaim and commercial success. It received perfect scores from major publications such as 1UP.com, Computer and Video Games, Electronic Gaming Monthly, Game Informer, GamesRadar, and GameSpy. On the review aggregators GameRankings and Metacritic, Twilight Princess has average scores of 95% and 95 for the Wii version and scores of 95% and 96 for the GameCube version. GameTrailers in their review called it one of the greatest games ever created."
            }]

Strange that it sometimes works. Do you supply text there?


About the website: we could not reproduce your issue. Could you give us some more information (e.g. which browser you use) and the error message that you get?

MathiasKrill commented 3 years ago

Hello, and thanks for the quick response.

First, the problem I mentioned with the website: the error message it shows is just "undefined" (a screenshot of the error message was attached).

"For what use case do you need the model and where were the xlm-r and GBERTQnA failing?"

I want to extract a passage from a larger text, such as a 20-page PDF or an even bigger document. I use the model to find the passage within it (or within other documents) that most probably matches the question asked.
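Finding the best-matching passage in a 20-page PDF typically means splitting the long text into smaller passages first, because extractive QA models like the ones mentioned above only read a limited number of tokens at a time. A minimal sketch of one common approach, a sliding window over words with some overlap so that an answer on a boundary is not cut in half, could look like this. `split_into_passages` is a hypothetical helper, not Haystack's own preprocessing API, and the window and overlap sizes are arbitrary assumptions.

```python
from typing import List


def split_into_passages(text: str, words_per_passage: int = 100, overlap: int = 20) -> List[str]:
    """Hypothetical helper: split text into overlapping windows of whitespace-separated words."""
    if words_per_passage <= 0 or not 0 <= overlap < words_per_passage:
        raise ValueError("need words_per_passage > 0 and 0 <= overlap < words_per_passage")
    words = text.split()
    step = words_per_passage - overlap  # how far each window moves forward
    passages = []
    for start in range(0, len(words), step):
        window = words[start:start + words_per_passage]
        passages.append(" ".join(window))
        # Stop once this window already reaches the end of the text.
        if start + words_per_passage >= len(words):
            break
    return passages


sample = " ".join(f"w{i}" for i in range(250))
passages = split_into_passages(sample, words_per_passage=100, overlap=20)
print(len(passages))  # 3
```

Each passage would then be indexed as its own document, so a retriever can narrow the candidates before the reader model extracts the answer span.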

It seems to fail somewhere at the very beginning. I run:

curl -X POST "http://127.0.0.1:8000/models/1/doc-qa" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"questions\":[\"Nenne mir Beispiele für einen\"],\"top_k_reader\":5,\"top_k_retriever\":5}"

And then I see this in the terminal:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    96  100    48  100    48  48000  48000 --:--:-- --:--:-- --:--:-- 96000{"detail":"There was an error parsing the body"}

When I ask this question instead, it works fine:

curl -X POST "http://127.0.0.1:8000/models/1/doc-qa" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"questions\":[\"Was ist ein Unfall?\"],\"top_k_reader\":5,\"top_k_retriever\":5}"
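The two curl calls differ only in the question: "Was ist ein Unfall?" is pure ASCII, while "Nenne mir Beispiele für einen" contains "ü", and English questions never failed. Haystack's REST API is built on FastAPI (the Swagger page with `additionalProp1` placeholders is FastAPI's generated docs), and FastAPI versions from that time likely returned `{"detail":"There was an error parsing the body"}` when the body could not be read as JSON, which would explain why the message is not in Haystack's own code. One possible cause, an assumption not confirmed in this issue, is that the shell (the `\"` escaping suggests Windows `cmd`) passed the body to curl in a legacy code page such as cp850 or cp1252 instead of UTF-8. The standard-library sketch below shows that ASCII text yields identical bytes in all three encodings, "ü" does not, and `json.loads` rejects the non-UTF-8 bytes.

```python
import json

ascii_question = "Was ist ein Unfall?"
umlaut_question = "Nenne mir Beispiele für einen"

# Pure ASCII text encodes to the same bytes in UTF-8 and in the legacy
# Windows code pages cp850 (console) and cp1252 (ANSI).
print(all(ascii_question.encode(enc) == ascii_question.encode("ascii")
          for enc in ("utf-8", "cp850", "cp1252")))  # True

# "ü" does not: UTF-8 uses two bytes, each code page a different single byte.
print("ü".encode("utf-8"), "ü".encode("cp850"), "ü".encode("cp1252"))
# b'\xc3\xbc' b'\x81' b'\xfc'


def parses_as_json(body: bytes) -> bool:
    """Return True if a raw request body decodes and parses as JSON."""
    try:
        json.loads(body)  # for bytes, json.loads detects UTF-8/16/32
        return True
    except ValueError:  # covers both JSONDecodeError and UnicodeDecodeError
        return False


body_text = json.dumps({"questions": [umlaut_question]}, ensure_ascii=False)
print(parses_as_json(body_text.encode("utf-8")))   # True
print(parses_as_json(body_text.encode("cp1252")))  # False
print(parses_as_json(body_text.encode("cp850")))   # False
```

If this is the cause, it would also fit Postman working later in the thread, presumably because it sends the body as UTF-8. With curl, escaping the character in the JSON as `\u00fc`, or sending a UTF-8 file with `--data-binary @body.json` (a hypothetical file name), would avoid the problem.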

I took these requests from the Swagger page and removed this part:

  "filters": {
    "additionalProp1": "string",
    "additionalProp2": "string",
    "additionalProp3": "string"
  }

I am currently adding more logging to the code to track down the "There was an error parsing the body" error; I haven't found that error message anywhere in the project's code.

Timoeller commented 3 years ago

Hey, about the "text" field: sorry, I mistook your request for another API. Of course, you index documents into Haystack's document store and can then ask questions against these documents, so there should be no "text" field in the body.


About the website: do you use a browser other than Chrome?


"I am currently adding more logging to the code to track down the "There was an error parsing the body" error; I haven't found that error message anywhere in the project's code."

This is a good idea. Maybe you have empty questions in your data?

MathiasKrill commented 3 years ago

"We are working on a German QA dataset that improves performance of these methods significantly and will open source it latest by April 15th."

"This is a good idea. Maybe you have empty questions in your data?"

Actually, no. When I build the request with Postman instead of curl, everything works. For some unknown reason, the problem never happened once with curl for English questions. Never mind, I'll use Postman now.

"About the website, do you use another browser than Chrome?"

Only Chrome

Thank you again ✌️

Timoeller commented 3 years ago

Hey, the issue seems fixed; using Postman resolved the problems.

I will update you about the German model once our paper is out.

Closing now, feel free to reopen any time.