google-research-datasets / clang8

cLang-8 is a dataset for grammatical error correction.
Dataset languages #10

Closed by Bachstelze 1 year ago

Bachstelze commented 1 year ago

There are many languages described in the paper. Is this the dataset for all of them?

ekQ commented 1 year ago

This repo contains the relabeled targets for English, German and Russian. For pre-training, we used a Common Crawl dataset with 101 languages.