facebookresearch / flores

Facebook Low Resource (FLoRes) MT Benchmark

Is monolingual data used in the paper available for downloading? #5

Closed xiamengzhou closed 5 years ago

xiamengzhou commented 5 years ago

Hi, I can't find a way to access the monolingual data used in the paper. Is there any way I can download it?

vishrav commented 5 years ago

Hi. You can download the data from the shared task webpage http://www.statmt.org/wmt19/parallel-corpus-filtering.html

xiamengzhou commented 5 years ago

Thanks! But it looks like the Common Crawl monolingual data for sin and nep is not provided on the shared task webpage?

vishrav commented 5 years ago

The commoncrawl data links have now been updated in the shared task webpage http://www.statmt.org/wmt19/parallel-corpus-filtering.html

zdou0830 commented 5 years ago

Hi, I'm having trouble decompressing "commoncrawl.deduped.en.xz".

unxz: commoncrawl.deduped.en.xz: Unexpected end of input

I can decompress other files. Is there anything wrong with the file?
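
An "Unexpected end of input" error from unxz usually means the .xz file is truncated, which often points to an incomplete download rather than a problem with the file on the server. This is not confirmed anywhere in the thread, so treat it as an assumption. A minimal sketch of a local check using Python's standard lzma module is below; the helper name xz_is_complete is hypothetical and only for illustration.

```python
import lzma


def xz_is_complete(path, chunk_size=1 << 20):
    """Return True if the .xz file at `path` decompresses through to its end.

    Hypothetical helper: streams the file in chunks so that a large archive
    is never held in memory all at once.
    """
    try:
        with lzma.open(path, "rb") as f:
            # Read and discard the decompressed data until the stream ends.
            while f.read(chunk_size):
                pass
    except (EOFError, lzma.LZMAError):
        # EOFError: the compressed data stopped before the end-of-stream marker.
        # LZMAError: the compressed data is corrupt.
        return False
    return True


# Example call, assuming the file sits in the current directory:
# xz_is_complete("commoncrawl.deduped.en.xz")
```

If the check returns False, downloading the file again and comparing its size with the size reported by the server is a reasonable next step. The xz command-line tool offers a similar integrity test through its -t (--test) option.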