Helsinki-NLP / OPUS-CAT

OPUS-CAT is a collection of software which makes it possible to use OPUS-MT neural machine translation models in professional translation. OPUS-CAT includes a local offline MT engine and a collection of CAT tool plugins.
MIT License

Garbage in translation (http & Trados & OmegaT) #50

Open claude-ws01 opened 2 years ago

claude-ws01 commented 2 years ago

Hello, I'm getting "@ @ " garbage in the translations.

ENVIRONMENT:

In OpusCatMTEngine's window, when I do "translate with model", I get no garbage in the result. But after hooking Opus into OmegaT and Trados, I get lots of "@ @ "... Tested with a browser: same bad result.

I tested with different versions of the plugin, and engine... always the same.

Am I missing something? Any ideas, anyone?

Thank you.

[Screenshots: 2022-1020_11;06;06__Mozilla Firefox · 2022-1020_11;10;12__OPUS-CAT MT Engine v1 2 0 0 · 2022-1020_11;28;36__OmegaT 5 7 1 __ omega_skoda]

claude-ws01 commented 2 years ago

Just couldn't stop my brain from juggling with that issue... (since it did work once with OmegaT)... so I came back to the "workbench" and here's what I did, step by step.

** My current conclusion: deleting the "c_users_admin_local...opus folders" and restarting resolved the issue... for now. (time will tell)

** Note:

** Suggested improvements:

Note: I know a lot of it is in the documentation... spread here and there (you have to admit). Still, these are details that would be time savers if included as I suggest.

Note 2: I would have benefited from reading your debug procedure/tools when it comes to plugins (i.e., I was getting good results in the translation tab of OPUS, but not in Trados... how do you debug that?).

That's it! Relieved that it now works... because when it works, hell it works nice.

Opus is an awesome tool... be proud. :)

claude-ws01 commented 2 years ago

Post note: my issue is resolved. "Thank god", as my Mexican friends would say. Kind regards, Claude.

claude-ws01 commented 2 years ago

I thought my issue was solved, but no.

Finally, pinpointed the issue: the 2019 model files.

The issue can be reproduced by adding a 2019 model.

(At first I downloaded what I could find on OPUS, then went to Tatoeba; that's why I had 2019 & 2020 model files.)

Furthermore, I got an InvalidOperationException every time I attempted to "delete selected model": Opus crashed, then I re-ran it and deleted the same model again without error. There were no issues at all deleting non-2019 models. Log file included.

opuscat_log_DELETING-MODEL.txt

I leave the ticket open for you to see. cheers.

TommiNieminen commented 2 years ago

Thanks for your thorough testing, I'll keep this open as an enhancement issue, since the UI fixes you mention should be fairly simple to implement.

The root cause of the garbage output seems to be a fix I made in v1.1.0.8 to get rid of batch file post-processing (this was causing problems for some users). Unfortunately this broke some older models, which use BPE subword segmentation. All the newer models use SentencePiece subword segmentation, and since I've only been testing on those recently, this bug went unnoticed. I will either fix it or remove the BPE models from the model download list. A workaround is to use only the newest models, since I think all language pairs have SentencePiece models available for them now.
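For readers unfamiliar with the two segmentation schemes, here is a minimal sketch of what the missing post-processing step does. It is not OPUS-CAT's actual code. It assumes the common conventions: subword-nmt style BPE marks a word continuation with "@@ " (the "@ @ " reported above is presumably a rendering of that marker), and SentencePiece marks a word start with "▁" (U+2581). The function names are made up for illustration.

```python
import re

# Assumed markers (common conventions, not taken from OPUS-CAT's source):
SPM_SPACE = "\u2581"  # SentencePiece word-boundary marker

def undo_bpe(text: str) -> str:
    """Join BPE subwords: 'trans@@ lation' -> 'translation'."""
    # Drop each '@@' together with the space after it, or a trailing '@@'.
    return re.sub(r"@@( |$)", "", text)

def undo_sentencepiece(text: str) -> str:
    """Join SentencePiece pieces: the spaces between pieces are removed,
    then each word-boundary marker becomes a real space."""
    return text.replace(" ", "").replace(SPM_SPACE, " ").strip()

def has_leftover_bpe(text: str) -> bool:
    """Rough check for output where BPE post-processing was skipped."""
    return "@@ " in text or text.endswith("@@")

raw_bpe = "Gar@@ bage in trans@@ lation"
raw_spm = "\u2581Gar bage \u2581in \u2581trans lation"

print(undo_bpe(raw_bpe))
print(undo_sentencepiece(raw_spm))
print(has_leftover_bpe(raw_bpe))
```

If the engine applies only the SentencePiece step to a BPE model's output, the "@@ " markers pass through untouched, which matches the symptom described in this issue.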

claude-ws01 commented 2 years ago

Thx for your reply, genuinely appreciated.

Regarding downloading models...

Some thoughts on enhancements:

(Where I am, in a small village in the "3rd world", I'm limited to 3 Mbps. Given the download size, I may decide on a more appropriate time to download.)

BTW, really happy with Opus, it works really great, with great results. A jewel. :) Take care. Claude.