Helsinki-NLP / OpusTools


Format of downloaded files does not match the format expected by opus_read #24

Closed keith555 closed 2 years ago

keith555 commented 3 years ago

I downloaded the files with the following command:

python opus_get -s th -t ru -d Opensubtitles

After the files were downloaded, I ran the following command:

python opus_read -d Opensubtitles -s th -t ru -wm tmx -w th-ru.tmx

Error messages of the following form were displayed repeatedly:

There is no item named 'ru/2004/304141/158903.xml.gz' in the archive '.\Opensubtitles_latest_xml_ru.zip'
Continuing from next sentence file pair.

The paths of the files inside the downloaded zip have this form instead:

OpenSubtitles\xml\ru\1191\3276470\5646552.xml
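
One way to confirm a mismatch like this is to list the member names actually stored in the archive and compare them with the name from the error message. The following is a minimal diagnostic sketch using only Python's standard `zipfile` module; it is not part of OpusTools, and the helper names `normalize` and `find_candidates` are hypothetical. It builds a small in-memory archive that mimics the layout reported above, so it runs without the real download.

```python
import io
import zipfile


def normalize(name):
    """Use forward slashes so Windows-style and zip-style paths compare equal."""
    return name.replace("\\", "/")


def find_candidates(archive, wanted):
    """Return members whose path ends with `wanted`, with or without '.gz'.

    Hypothetical helper for diagnosis only; OpusTools may resolve names differently.
    """
    wanted = normalize(wanted)
    stems = {wanted}
    if wanted.endswith(".gz"):
        stems.add(wanted[:-3])  # also try the name without the '.gz' suffix
    matches = []
    for member in archive.namelist():
        path = normalize(member)
        if any(path == stem or path.endswith("/" + stem) for stem in stems):
            matches.append(member)
    return matches


# Build a small in-memory archive that mimics the layout reported above.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("OpenSubtitles/xml/ru/1191/3276470/5646552.xml", "<document/>")

with zipfile.ZipFile(buffer) as zf:
    # Member names as stored in the archive.
    print(zf.namelist())
    # A name that exists under a different prefix and without '.gz'.
    print(find_candidates(zf, "ru/1191/3276470/5646552.xml.gz"))
    # The name from the error message, which this sample archive does not contain.
    print(find_candidates(zf, "ru/2004/304141/158903.xml.gz"))
```

To inspect the real download, open the zip named in the error message with `zipfile.ZipFile(path)` in place of the in-memory buffer. An empty result for every requested name would suggest that the archive layout and the alignment file come from different corpus releases, though that is only one possible explanation.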

miau1 commented 2 years ago

I ran these commands and didn't have any issues. Also, you don't have to run opus_get before opus_read: opus_read will automatically offer to download the necessary files. I'll close this issue now, but feel free to reopen it if there are problems.