Closed HMueller007 closed 1 year ago
It needs at least 1000 TUs.
Hi, fine-tuning needs a certain amount of data to work with, so there is a minimum requirement of 1000 translation units (pairs of source and target language segments). The number is arbitrary, and you will probably need more than 1000 to see a noticeable effect. If you still want to try it with 600 translation units, you can lower the FinetuningSetMinSize setting in the OpusCatMTEngine.exe.config file.
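In case it helps, here is a rough sketch of changing that value with a script instead of a text editor. It assumes the config file uses the usual .NET settings layout (a `<setting name="...">` element with a `<value>` child); I have not checked the real file, so the sample XML and its section name below are hypothetical. Only the setting name FinetuningSetMinSize comes from this thread.

```python
import xml.etree.ElementTree as ET


def set_min_finetuning_size(config_xml: str, new_size: int) -> str:
    """Return the config text with FinetuningSetMinSize set to new_size.

    Assumes the standard .NET layout:
    <setting name="FinetuningSetMinSize"><value>1000</value></setting>
    """
    root = ET.fromstring(config_xml)
    for setting in root.iter("setting"):
        if setting.get("name") == "FinetuningSetMinSize":
            value = setting.find("value")
            if value is None:
                raise ValueError("setting has no <value> child")
            value.text = str(new_size)
            return ET.tostring(root, encoding="unicode")
    raise KeyError("FinetuningSetMinSize not found in config")


# Hypothetical sample; the real OpusCatMTEngine.exe.config may be laid out differently.
SAMPLE = """<configuration>
  <userSettings>
    <OpusCatMTEngine.Properties.Settings>
      <setting name="FinetuningSetMinSize" serializeAs="String">
        <value>1000</value>
      </setting>
    </OpusCatMTEngine.Properties.Settings>
  </userSettings>
</configuration>"""

updated = set_min_finetuning_size(SAMPLE, 600)
print(updated)
```

If you try this on the real file, keep a backup copy first; editing the value by hand in a text editor achieves the same thing.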
Hi, thanks for the answers, @all. Instead, I tried the function for uploading a source file and a corresponding target file, both derived from the same TM, and it worked: it improved the translations even with this small data set. I might also try the other setting, thank you.
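For anyone wanting to produce such a source/target file pair from a TMX, here is a minimal sketch. It assumes a standard TMX structure (`<tu>` elements holding `<tuv xml:lang="...">` entries with a `<seg>` each) and writes nothing to disk by itself; the function name and the sample data are my own, not part of OPUS-CAT.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_parallel(tmx_text: str, src_lang: str, tgt_lang: str):
    """Return two line-aligned lists (source segments, target segments)."""
    root = ET.fromstring(tmx_text)
    sources, targets = [], []
    for tu in root.iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Collapse whitespace so each segment stays on one line.
                by_lang[lang] = " ".join("".join(seg.itertext()).split())
        src = next((t for l, t in by_lang.items() if l.startswith(src_lang.lower())), "")
        tgt = next((t for l, t in by_lang.items() if l.startswith(tgt_lang.lower())), "")
        if src and tgt:  # skip units with a missing or empty side
            sources.append(src)
            targets.append(tgt)
    return sources, targets


SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-US"><seg>Hello world</seg></tuv>
    <tuv xml:lang="de-DE"><seg>Hallo Welt</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>Untranslated</seg></tuv>
    <tuv xml:lang="de-DE"><seg></seg></tuv></tu>
</body></tmx>"""

sources, targets = tmx_to_parallel(SAMPLE_TMX, "en", "de")
print(sources, targets)
```

Writing each list to its own UTF-8 text file, one segment per line, gives two files whose lines correspond to each other.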
Hello HMueller007 and all
What I sometimes do to get around this is to import a simple two-column TB (glossary) into memoQ for the same job as the translation I'm doing, and then export all of that to the TB for the same job. The segments are short, of course, but they are very relevant to the job, and since Opus does not currently have a TB function for instructing the MT engine, this feels like an intuitive way to proceed. This often gets the TB past the minimum segment count.
@SafeTex That's a good tip, will try this, thanks.
Hi,
when I try to fine-tune the model with a TMX from a (Wordfast) project, it says: "not enough parallel segments in the TMX".
It has more than 600 bilingual segments (so about 1300 segments in total if you count source and target language segments separately) from a finished project. Is this really not enough? How many do you need?
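For reference, the minimum is counted in translation units (source/target pairs), so about 600 pairs count as 600, not 1300. A quick way to check how many usable units a TMX contains is sketched below; it assumes a standard TMX layout, and the function name and sample are illustrative only.

```python
import xml.etree.ElementTree as ET


def count_translation_units(tmx_text: str) -> int:
    """Count <tu> elements that contain at least two non-empty <seg> entries."""
    root = ET.fromstring(tmx_text)
    count = 0
    for tu in root.iter("tu"):
        filled = [
            seg for seg in tu.iter("seg")
            if "".join(seg.itertext()).strip()
        ]
        if len(filled) >= 2:  # one source plus at least one target
            count += 1
    return count


SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>First</seg></tuv>
    <tuv xml:lang="de"><seg>Erstens</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Second</seg></tuv>
    <tuv xml:lang="de"><seg>Zweitens</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>No target yet</seg></tuv>
    <tuv xml:lang="de"><seg> </seg></tuv></tu>
</body></tmx>"""

unit_count = count_translation_units(SAMPLE_TMX)
print(unit_count)  # 2: the third unit has an empty target
```

For a file on disk, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.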