thfrkielikone closed 1 month ago
There also seem to be other issues with the integration with the latest OpusTools when using moses preprocessing; for example, setting output_directory makes the process fail completely. I'll look into this, but I think the problems are on the OpusTools side (ping @miau1).
I suggest using the raw or xml options for preprocessing until we get this fixed.
Fixed in 3.2.0. It is now recommended to download corpora using the moses preprocessing.
Running this:
Results in files opensubtitles.fi.gz and opensubtitles.en.gz that are in fact plain text.
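
For anyone who wants to confirm the symptom on their own output, here is a minimal sketch that checks whether a file with a .gz name is really gzip-compressed by looking at its first two bytes (the gzip magic number 0x1f 0x8b). The helper names is_gzip and open_text are hypothetical and not part of this project or OpusTools; they use only the Python standard library.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def open_text(path, encoding="utf-8"):
    """Open `path` as text, decompressing only if it is really gzip.

    Hypothetical workaround helper: it ignores the file extension and
    trusts the file contents instead.
    """
    if is_gzip(path):
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

Calling is_gzip on the files described above should return False if they are in fact plain text, and open_text can be used to read them either way without relying on the extension.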