Open KindAdvisor opened 1 year ago
Hey! You may need to clean the training data using Bleualign (https://github.com/rsennrich/Bleualign). Hope it helps!
It seems we need a machine translation of the source language into the target language?
Hi, thanks for your fine work and the new large-scale document-level data. However, when I do sentence-level alignment using the " \ " symbols as delimiters, the sentences do not align well.
I checked a file in the training data, for example the first line of 282.enu/chs:
Chapter 1 - Birthday Gift <sep> 001- Birthday gift Lu Mingshu would always remember her seventh birthday. <sep> **** 陆明舒一直记得七岁生辰那天。 <sep> ****
The translation of "Chapter 1 - Birthday Gift" does not appear in the Chinese text. Looking forward to your reply. Thanks!
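For anyone who wants to reproduce the mismatch, here is a minimal sketch that splits one line from each side on a delimiter and compares the segment counts. It assumes `<sep>` (as it appears in the sample line above) is the delimiter, and that the sample divides into an English part and a Chinese part as shown in the code. The function names `split_segments` and `segment_mismatch` are hypothetical helpers, not part of the dataset tooling.

```python
def split_segments(line, sep="<sep>"):
    """Split one document line into segments and trim surrounding whitespace."""
    return [seg.strip() for seg in line.split(sep)]


def segment_mismatch(src_line, tgt_line, sep="<sep>"):
    """Return (source count, target count, True if the counts differ)."""
    src = split_segments(src_line, sep)
    tgt = split_segments(tgt_line, sep)
    return len(src), len(tgt), len(src) != len(tgt)


# Assumed split of the sample line into its English and Chinese sides.
en = ("Chapter 1 - Birthday Gift <sep> 001- Birthday gift Lu Mingshu would "
      "always remember her seventh birthday. <sep> ****")
zh = "陆明舒一直记得七岁生辰那天。 <sep> ****"

print(segment_mismatch(en, zh))  # (3, 2, True)
```

Lines where the counts differ cannot be paired one-to-one by position, so they are the ones that would need a sentence aligner or manual cleaning.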