EleanorJiang / BlonDe

Official implementations for (1) BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation and (2) Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus
MIT License
68 stars 7 forks

Questions about sentence-level alignment of BWB #1

Open · KindAdvisor opened this issue 1 year ago

KindAdvisor commented 1 year ago

Hi, thanks for your fine work and the new large-scale document-level data. However, when I do sentence-level alignment by splitting on the "<sep>" delimiters, the segments don't align well. For example, take the first line of 282.enu / 282.chs in the training data:

English: Chapter 1 - Birthday Gift <sep> 001- Birthday gift Lu Mingshu would always remember her seventh birthday. <sep>
Chinese: 陆明舒一直记得七岁生辰那天。 <sep> (Lu Mingshu always remembered the day of her seventh birthday.)

The translation of "Chapter 1 - Birthday Gift" does not appear in the Chinese file, so splitting both lines on "<sep>" does not give one-to-one sentence pairs.

Looking forward to your reply. Thanks!
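A quick way to spot lines like this one is to split both sides on "<sep>" and compare the segment counts. The sketch below is a hypothetical helper, not part of the BlonDe repository. It assumes that each line of an .enu/.chs file pair holds one parallel chunk and that both files have the same number of lines. The demo uses the example line from the question.

```python
# Hypothetical helper (not part of BlonDe): split parallel lines on "<sep>"
# and report lines whose segment counts differ between the two sides.
SEP = "<sep>"


def split_segments(line: str) -> list[str]:
    """Split one line on SEP, dropping surrounding whitespace and empty segments."""
    return [seg.strip() for seg in line.split(SEP) if seg.strip()]


def find_mismatches(src_lines: list[str], tgt_lines: list[str]) -> list[tuple[int, int, int]]:
    """Return (1-based line number, #source segments, #target segments) for mismatched lines."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("files have different numbers of lines")
    mismatches = []
    for no, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        n_src, n_tgt = len(split_segments(src)), len(split_segments(tgt))
        if n_src != n_tgt:
            mismatches.append((no, n_src, n_tgt))
    return mismatches


if __name__ == "__main__":
    en = ["Chapter 1 - Birthday Gift <sep> 001- Birthday gift Lu Mingshu "
          "would always remember her seventh birthday. <sep>"]
    zh = ["陆明舒一直记得七岁生辰那天。 <sep>"]
    print(find_mismatches(en, zh))  # [(1, 2, 1)]
```

To scan real files, one could read each side with open(path, encoding="utf-8").read().splitlines() and pass the two lists in (assuming the files are named 282.enu and 282.chs). Equal counts do not prove the alignment is correct. The check only catches obvious shifts like the missing chapter title.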

EleanorJiang commented 1 year ago

Hey! You may need to clean the training data with Bleualign (https://github.com/rsennrich/Bleualign). Hope it helps!

fengshi-cherish commented 1 year ago

It seems we would first need a machine translation of the source text into the target language?
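The question points at how Bleualign-style alignment works. One side is machine-translated so that both texts are in the same language, and then segments can be scored against each other. The sketch below is a heavily simplified, hypothetical illustration of that idea and not Bleualign's actual algorithm. It replaces BLEU with a unigram-overlap F1 score and runs a monotone dynamic program that can leave segments unaligned, such as an untranslated chapter title. The function names and the 0.3 threshold are assumptions. The demo translation is made up: it pretends the Chinese segment was machine-translated into English.

```python
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lower-cased word tokens as a multiset (a stand-in for BLEU n-grams)."""
    return Counter(re.findall(r"\w+", text.lower()))


def overlap_f1(a: str, b: str) -> float:
    """Unigram-overlap F1 between two segments (simplified similarity, not BLEU)."""
    ta, tb = tokens(a), tokens(b)
    common = sum((ta & tb).values())
    if common == 0:
        return 0.0
    return 2 * common / (sum(ta.values()) + sum(tb.values()))


def monotone_align(mt_segments, other_segments, threshold=0.3):
    """Return (i, j) pairs of a monotone alignment maximizing total similarity.

    mt_segments[i] is assumed to be a machine translation of segment i of one
    side into the language of other_segments. Segments may stay unaligned.
    """
    n, m = len(mt_segments), len(other_segments)
    sim = [[overlap_f1(a, b) for b in other_segments] for a in mt_segments]
    # best[i][j] = best total score using the first i and first j segments
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sim[i - 1][j - 1]
            match = best[i - 1][j - 1] + s if s >= threshold else float("-inf")
            best[i][j] = max(match, best[i - 1][j], best[i][j - 1])
    # Backtrack from the bottom-right cell to recover the 1-1 pairs
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = sim[i - 1][j - 1]
        if s >= threshold and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


if __name__ == "__main__":
    # Made-up English MT of the Chinese segment, compared with the English side
    zh_mt = ["Lu Mingshu always remembered the day of her seventh birthday."]
    en = ["Chapter 1 - Birthday Gift",
          "001- Birthday gift Lu Mingshu would always remember her seventh birthday."]
    print(monotone_align(zh_mt, en))  # [(0, 1)]: the title stays unaligned
```

The real tool scores candidates with BLEU and applies further gap-filling steps, so this sketch only shows why a translation is needed. If the comparison were done on the Chinese side instead, the regex tokenizer would also have to be replaced with character- or word-level segmentation, because \w+ treats a whole run of Chinese characters as a single token.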