Bachstelze closed this issue 1 year ago.
There are many languages described in the paper. Is this the dataset for all of them?
This repo contains the relabeled targets for English, German and Russian. For pre-training, we used a Common Crawl dataset with 101 languages.