Closed thammegowda closed 1 year ago
Another issue: https://github.com/Helsinki-NLP/OPUS-MT-train/blob/ce038869e152e8a08bcb11454399b0dc8f01adb2/scripts/preprocess-txt.sh#L43
$ echo -e "line1\nline2\nline3"
line1
line2
line3
$ echo -e "line1\nline2\nline3" | perl -C -pe 's/\p{C}/ /g;'
line1 line2 line3
I think we need to preserve line breaks in pre and post processing.
Good points. Fixed the hostname issue above (but the script will still appear in many of the released model packages, unfortunately). The second issue was already fixed in preprocess-spm.sh
but now I also pushed a fix to the preprocess-txt.sh script.
https://github.com/Helsinki-NLP/OPUS-MT-train/blob/3cd0bd3f75e4d2b4b0483f7b53be4722066b7d33/scripts/preprocess-bpe.sh#L9-L17
Some nodes don't have domain name set, and as a result we get empty string from
hostname -d
. This leads to[: ==: unary operator expected"
[1].Recommend fix: use quotes i.e.,
"$(hostname -d)"
to avoid this error.If anyone reached here while trying to run pretrained models, here is a quickfix via sed
And the diff
[1] https://stackoverflow.com/questions/13617843/unary-operator-expected-error-in-bash-if-condition