adbar / German-NLP

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

Dataset like CoLA #13

Closed by dragonnikkirocks 3 years ago

dragonnikkirocks commented 3 years ago

Is there a dataset like https://arxiv.org/pdf/1901.03438.pdf for German? I want to use it to build a grammar checker with BERT, but I couldn't find one. Do you have any suggestions? Thanks in advance.

adbar commented 3 years ago

Hi, not to my knowledge, but I'm not sure.

dragonnikkirocks commented 3 years ago

Thanks for the reply. I am trying to build a spell checker for German, using a transformer fine-tuned on this as a downstream task. Do you have any suggestions on how I could approach this? Thanks in advance.

zesch commented 3 years ago

What others have done in this situation is to train on artificial errors: take correct text and introduce errors into it. Of course, that won't reflect real errors in every respect, but this is usually more than offset by being able to train on much more data.
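
As a rough illustration of the artificial-error idea, here is a minimal sketch that turns a correct sentence into a (noisy, clean) training pair using random character-level edits. The names `corrupt_word` and `make_pair`, the `ALPHABET` string, and the default `error_rate` are all assumptions made for this example, not taken from any particular paper or library, and real typo distributions (keyboard neighbours, umlaut confusions, casing) would need a more careful error model.

```python
import random

# Assumption: lowercase German letters are used for insertions and substitutions.
ALPHABET = "abcdefghijklmnopqrstuvwxyzäöüß"


def corrupt_word(word, rng):
    """Apply one random character-level edit to a word.

    Words shorter than two characters are returned unchanged. The edit can
    occasionally be a no-op (e.g. swapping two identical letters).
    """
    if len(word) < 2:
        return word
    op = rng.choice(["swap", "delete", "insert", "replace"])
    i = rng.randrange(len(word) - 1)
    if op == "swap":
        # Transpose the characters at positions i and i + 1.
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "delete":
        # Drop the character at position i.
        return word[:i] + word[i + 1:]
    if op == "insert":
        # Insert a random letter before position i.
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    # Replace the character at position i with a random letter.
    return word[:i] + rng.choice(ALPHABET) + word[i + 1:]


def make_pair(sentence, error_rate=0.15, seed=None):
    """Return (noisy_sentence, clean_sentence) for supervised training.

    Each whitespace-separated word is corrupted with probability error_rate.
    """
    rng = random.Random(seed)
    noisy_words = [
        corrupt_word(w, rng) if rng.random() < error_rate else w
        for w in sentence.split()
    ]
    return " ".join(noisy_words), sentence


if __name__ == "__main__":
    noisy, clean = make_pair("Das ist ein einfacher Satz", error_rate=0.5, seed=0)
    print(noisy, "->", clean)
```

Pairs produced this way could serve as input/target examples for a correction model, or the noisy and clean sentences could be labelled as unacceptable/acceptable for a CoLA-style classifier. Which setup works better for German is an open question here.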