alainloisel closed this issue 1 year ago
I overlooked this issue, but it looks like an interesting finding. We now have a few more variants of these multilingual models, so I wonder whether this still holds for the new ones.
For the German models, we are aware of their poor quality. It may be because the German QA dataset has fewer training instances than the datasets for other languages.
While testing the French models, both on Hugging Face and on my own machine, I found that many of them consistently cut the output off mid-sentence at around 70 to 80 characters. I have seen this in at least these models: lmqg/mt5-small-frquad-qg-ae, lmqg/mt5-small-frquad-qg, and lmqg/mt5-small-frquad-qag. The same happens for German with mt5-small-dequad-qg. I also ran into the same problem with the mT0 models.
I wonder whether this is the reason the reported performance of these models is low...
As an example:
generate question :Le dessus des ailes a une couleur de fond noir opaque. Les ailes antérieures et postérieures sont traversées par une large bande médiane bleu turquoise semi-hyaline qui va de la zone tornale de l'aile postérieure à la zone apicale de l'aile antérieure . Cette bande est plus large en son milieu, plus ou moins verdâtre et maculaire à l'aile antérieure, et la partie de la bande qui traverse les espaces 6, 7 et 8 de l'aile postérieure est blanchâtre. L'aile postérieure comporte par ailleurs une série de minces lunules submarginales bleues
Output: Quelle est la couleur de la bande des ailes antérieures et postérieures (not completed; cut off at around 80 characters)
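A minimal sketch for checking this systematically, under stated assumptions. Nothing in this thread confirms the cause, but one possibility worth ruling out is the generation length limit: Hugging Face `generate()` falls back to a short default `max_length` when neither the call nor the model's generation config sets one. The helper names `looks_truncated` and `generate_question` are hypothetical, the `max_new_tokens=64` value is an arbitrary choice, and the input is passed exactly as typed above, which may not match the prompt format these models expect.

```python
def looks_truncated(question: str, terminators: str = "?") -> bool:
    """Heuristic: a generated question that does not end with one of the
    terminator characters was probably cut off mid-sentence."""
    return not question.strip().endswith(tuple(terminators))


def generate_question(text: str,
                      model_name: str = "lmqg/mt5-small-frquad-qg",
                      max_new_tokens: int = 64) -> str:
    """Generate with an explicit token budget (assumption: the truncation
    comes from the generation length limit, which is not confirmed here)."""
    # Imported lazily so looks_truncated works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    # max_new_tokens bounds only the generated part, independent of input length.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    cut = "Quelle est la couleur de la bande des ailes antérieures et postérieures"
    print(len(cut), looks_truncated(cut))
```

If the output of `generate_question` ends with a question mark once a larger budget is given, the length limit is the likely culprit. If it is still cut off, the cause lies elsewhere.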