I was able to fine-tune the base Korean model on TMX data by editing the finetune recipe, but now I am having issues with the resulting model. When I changed the finetune recipe, I found the filter-korean.sh script and substituted the steps:
with the following:
That seemed to do the trick to kick off the tuning; however, the new tuned model has a punctuation issue. Suppose you send something like this (it would be in Korean, but for the sake of explanation I wrote it in English):
hello my name is heather.
-heather is here to say hello,
*how are you today?
When the last character before a newline is punctuation and the next line starts with punctuation followed directly by a character, the translation comes out wrong and the punctuation and newlines end up misplaced. I did notice that if you send each line individually, the translations come out correctly and there are no punctuation issues. It seems as though the spacing/punctuation is causing the text to be interpreted as a single sentence, which affects the translation. I looked through the backlog and noticed there were some early issues with Korean, so I figured I would ask whether you had any insight into what the issue might be.
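Since sending lines one at a time avoids the problem, a possible stopgap (not a fix for the model itself) is to split the input on newlines before translating. Below is a minimal sketch, assuming you can call the fine-tuned model on a single segment through some function; `translate`, `has_trigger`, and `translate_per_line` are hypothetical names, and the regex is just one reading of the pattern described above (punctuation, newline, then punctuation immediately followed by a character).

```python
import re
from typing import Callable

# Hypothetical pattern for the problem case: a line ending in punctuation,
# a newline, then a line starting with punctuation directly followed by a
# non-space character (e.g. "heather.\n-heather"). \w is Unicode-aware in
# Python 3, so Hangul letters count as word characters, not punctuation.
_PUNCT = r"[^\w\s]"
TRIGGER = re.compile(rf"{_PUNCT}[ \t]*\n[ \t]*{_PUNCT}\S")


def has_trigger(text: str) -> bool:
    """Return True if the text contains the punctuation/newline pattern."""
    return TRIGGER.search(text) is not None


def translate_per_line(text: str, translate: Callable[[str], str]) -> str:
    """Translate each line as its own segment and rejoin with newlines.

    `translate` is a placeholder (assumption) for whatever call runs the
    fine-tuned model on one segment. Blank lines are passed through as-is.
    """
    out = []
    for line in text.split("\n"):
        out.append(translate(line) if line.strip() else line)
    return "\n".join(out)
```

For example, `translate_per_line(text, my_model_call)` could be used only when `has_trigger(text)` is true, so ordinary multi-sentence input still goes to the model unchanged. This mirrors the observation that individual lines translate correctly; it does not address why the model merges the lines in the first place.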
Thank you!