Open SafeTex opened 1 year ago
These are proper nouns that have probably never occurred in the training material, so the NMT system has no clear examples of how to handle them. Ideally the system still learns to identify unseen proper nouns (probably based on features such as capitalization and certain trigger words) and also learns to copy them into the translation in the same form. But the process is fuzzy (by necessity, since proper noun translation is itself pretty fuzzy: consider, e.g., organization names that ARE translated, like the UN). Here the model has learnt a weird mixed behavior, where it corrupts the proper noun while still keeping it in Swedish.
Some kind of named entity recognition, combined with an option where you could specify whether entities need to be translated or copied into the translation, might be a good idea. I'll mark this as a potential improvement (it also has some synergies with the terminology support).
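As a rough illustration of the "copy into the translation" option, here is a minimal Python sketch of placeholder masking: protected terms are swapped for opaque tokens before translation and restored afterwards. It is not how the plugin or any OPUS model currently works. The function names, the `__NT0__` placeholder format, and the `translate` callable are all hypothetical, and the sketch assumes the MT engine passes the placeholder tokens through unchanged, which a real NMT model may not do without being trained for it.

```python
import re


def mask_terms(text, terms):
    """Replace each protected term found in text with a placeholder token.

    Returns the masked text and a placeholder-to-term mapping.
    The placeholder format (__NT0__, __NT1__, ...) is an assumption.
    """
    mapping = {}
    # Longest terms first, so a short term never splits a longer one.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        placeholder = f"__NT{i}__"
        pattern = re.compile(re.escape(term))
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = term
    return text, mapping


def unmask_terms(text, mapping):
    """Put the original terms back in place of their placeholders."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


def translate_with_protection(text, terms, translate):
    """Translate text while copying protected terms through unchanged.

    translate is any callable str -> str standing in for the MT engine.
    """
    masked, mapping = mask_terms(text, terms)
    return unmask_terms(translate(masked), mapping)
```

With the name from this issue as the protected term, `mask_terms` would hand the engine `__NT0__ är en förening.` instead of the full name, so the engine never gets the chance to corrupt it. Whether entities are found by a fixed list, as here, or by named entity recognition is a separate design choice.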
Hello Tommi and all
Just in case you don't know, memoQ also has a "non translatable" feature that is separate from its TB (termbase). I'm going to send you a non-translatable file so you can see its structure. Ideally, it would be great if Opus could handle such files, rather than translators adding "non translatable" terms to Opus one by one.
I know that's asking a lot (again) but if I don't mention it and send you such a file, then there's even less chance of Opus being able to handle such a file.
But as it's a text file, I guess that translators could remove the header and tags, if that is what it takes to load such a file into Opus in one go.
Regards Dave
Hello Tommi and all
In my present job, which involves a lot of Swedish proper nouns for organizations, associations etc., Fiskmö changes words that it can't understand but does not actually translate them, as in:
I can understand that if the MT engine could translate, say, 90% of such proper nouns, it might be programmed or tempted to do so. But it's much more debatable here, as Fiskmö has not translated any part of the word(s).
Would it not be better for Fiskmö to leave the word unchanged, then? On what basis does it change a word without ever translating it? It seems strange, especially in the second example, "Guldsmedsbranschens Leverantörsförening" > "Goldsmedsbrakensförening", for reasons evident to you as a Swedish speaker.
What do you make of this, Tommi and others?
Thanks