BlonDe and BWB are developed for document-level machine translation. BlonDe is an automatic evaluation metric that explicitly tracks discourse phenomena, and BWB is a large-scale bilingual parallel corpus of web novels.
We hope that they will serve as a guide and inspiration for more work on document-level machine translation.
📐 BlonDe
📙 BWB: Bilingual Web Book Dataset
Please see release logs for older updates.
If you use the BlonDe package or the BWB dataset for your research, please cite:
title="{BlonDe}: An Automatic Evaluation Metric for Document-level Machine Translation",
author="Yuchen Eleanor Jiang and Tianyu Liu and Shuming Ma and Dongdong Zhang and Jian Yang and Haoyang Huang and Rico Sennrich and Ryan Cotterell and Mrinmaya Sachan and Ming Zhou",
booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
url = "",
doi = "10.18653/v1/2022.naacl-main.111",
pages = "1550--1565",
title="Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus",
author="Yuchen Eleanor Jiang and Tianyu Liu and Shuming Ma and Dongdong Zhang and Ryan Cotterell and Mrinmaya Sachan",
booktitle = "Proceedings of the 2023 Conference of the Association for Computational Linguistics: Human Language Technologies",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "",
doi = "10.18653/v1/2023.main.111",
pages = "1550--1565",
Standard automatic metrics, e.g. BLEU, are not reliable for document-level MT evaluation. They can neither distinguish document-level improvements in translation quality from sentence-level ones, nor identify the discourse phenomena that cause context-agnostic translations.
BlonDe widens the scope of automatic MT evaluation from the sentence level to the document level. It takes discourse coherence into account by categorizing discourse-related spans and computing a similarity-based F1 measure over the categorized spans.
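To make the F1 idea concrete, here is a minimal sketch that scores one document pair. It assumes the discourse-related spans have already been extracted and grouped by category. The helpers `span_f1` and `categorized_f1` are hypothetical and not part of the blonde package; BlonDe's actual span detection, similarity and weighting are more involved.

```python
from collections import Counter
from typing import Dict, List


def span_f1(sys_spans: List[str], ref_spans: List[str]) -> float:
    """F1 between two multisets of spans, using clipped (min) counts for matches.

    Hypothetical helper for illustration; not the blonde implementation.
    """
    sys_counts, ref_counts = Counter(sys_spans), Counter(ref_spans)
    matched = sum((sys_counts & ref_counts).values())  # Counter & keeps min counts
    if matched == 0:
        return 0.0
    precision = matched / sum(sys_counts.values())
    recall = matched / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def categorized_f1(
    sys_cats: Dict[str, List[str]], ref_cats: Dict[str, List[str]]
) -> Dict[str, float]:
    """Per-category F1; a category missing on one side is treated as empty."""
    categories = sorted(set(sys_cats) | set(ref_cats))
    return {c: span_f1(sys_cats.get(c, []), ref_cats.get(c, [])) for c in categories}


if __name__ == "__main__":
    system = {"pronoun": ["he", "his"], "entity": ["Joe"]}
    reference = {"pronoun": ["she", "his"], "entity": ["Qiao Lian"]}
    print(categorized_f1(system, reference))  # {'entity': 0.0, 'pronoun': 0.5}
```

Scoring per category, rather than over one pooled bag of tokens, is what lets a metric of this shape report which discourse phenomenon (here, the entity) a translation gets wrong.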
As shown in the figure, BlonDe is much more selective than BLEU for document-level MT and shows a larger quality gap between human and machine translations.
The BlonDe package provides:
BlonDe: the main metric, which combines dBlonDe with sentence-level measurement;
dBlonDe: measures discourse phenomena (entity, tense, pronoun, discourse markers);
BlonDe+: takes human annotation (annotated ambiguous/omitted phrases and manually annotated NER) into account.
Python>=3.6 only.
Before you install blonde, make sure that your pip, setuptools and spacy are up to date, and that en_core_web_sm is downloaded:
pip install -U pip setuptools wheel
pip install -U spacy
python -m spacy download en_core_web_sm
Install the official Python module from PyPI:
pip install blonde
Install the latest unstable version from the master branch on GitHub:
pip install git+
Install from the source:
git clone
cd BlonDe
pip install .
You can then test your installation with:
python -m unittest discover
We provide a command line interface (CLI) of BlonDe as well as a Python API.
Example inputs are provided under ./example.
For the simplest usage, run:
blonde -r example/ref.txt -s sys.txt
To use human-annotated spans for BlonDe+, add -p and provide the annotation file path with -an, as in:
blonde -r example/ref.txt -s sys.txt -p -an example/an.txt
To use human-annotated named entities (instead of automatically detected ones), add -p and provide the named entity file path with -ner, as in:
blonde -r example/ref.txt -s sys.txt -p -ner example/ner.txt
General arguments:
-h, --help show this help message and exit
reference file path(s), each line is a sentence
-s SYSTEM, --system SYSTEM
system file path, each line is a sentence
--version, -V show program's version number and exit
BlonDe-related arguments:
The categories of BLONDE.
Default: ('tense', 'pronoun', 'entity', 'dm', 'n-gram')
--average-method {geometric,arithmetic}, -aver {geometric,arithmetic}
The average method to use, geometric or arithmetic.
Defaults: geometric
--smooth-method {none,floor,add-k,exp}, -sm {none,floor,add-k,exp}
Smoothing method: exponential decay, floor (increment zero counts), add-k (increment num/denom by k for n>1), or none.
Default: exp
--smooth-value SMOOTH_VALUE, -sv SMOOTH_VALUE
The smoothing value. Only valid for floor and add-k.
Defaults: floor: 0.1, add-k: 1
--lowercase LOWERCASE, -lc LOWERCASE
If True, enables case-insensitivity. Default: True
Weight-related arguments:
--override-weights, -w
Whether to customize the weights of BLONDE
--reweight, -rw Whether to reweight the weights of BLONDE to 1
The weights of TENSE (verb types), should be a tuple of length 7, corresponds to ('VBD', 'VBN', 'VBP',
'VBZ', 'VBG', 'VB', 'MD'). Defaults: (1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7). Only valid when `override_weights`
is used
The weights of PRONOUN, should be a tuple of length 4, corresponds to ('masculine', 'feminine', 'neuter',
'epicene'). Defaults: (0.5, 0.5, 0, 0). Only valid when `override_weights` is used
The weights of PERSON and NONPERSON entities, Defaults: (1/2, 1/2). Only valid when `override_weights` is used
The weights of DISCOURSE MARKER, should be a tuple of length 5, corresponds to ('comparison', 'cause',
'conjunction', 'asynchronous', 'synchronous'). Defaults: (0.5, 0.5, 0, 0). Only valid when
`override_weights` is used
BlonDe+ related arguments, annotation required:
--plus, -p Whether to add BLONDE PLUS categories. If so, please provide annotation files that are in the required format
Annotation file path, each line is the annotation corresponding a sentence. See README for annotation format
--ner-refined NER_REFINED, -ner NER_REFINED
Named entity file path, each line is the named entities corresponding a sentence. If provided, the annotated
named entities instead of the automated recognized ones are used in BLONDE. See README for named entity
annotation format
The categories that your annotation files contain, Defaults: ('ambiguity', 'ellipsis'). Only valid when
`plus` is used
The corresponding weights of plus categories, should be in the same length as `plus_categories`. Defaults:
(1, 1). Only valid when `plus` is used
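The `--reweight` and `--average-method` options can be pictured with a small sketch. It rests on our own assumptions, not on the package source: weights are rescaled to sum to 1, per-category scores are combined by a weighted geometric or arithmetic mean, and zero-weight terms (as in the default pronoun weights (0.5, 0.5, 0, 0)) are ignored. `reweight` and `weighted_mean` are hypothetical names.

```python
import math
from typing import List, Sequence


def reweight(weights: Sequence[float]) -> List[float]:
    """Rescale non-negative weights so they sum to 1 (assumed meaning of --reweight)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    return [w / total for w in weights]


def weighted_mean(scores: Sequence[float], weights: Sequence[float],
                  method: str = "geometric") -> float:
    """Weighted geometric or arithmetic mean of per-category scores in [0, 1]."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    # Drop zero-weight categories so they cannot affect either mean.
    pairs = [(s, w) for s, w in zip(scores, reweight(weights)) if w > 0]
    if method == "arithmetic":
        return sum(w * s for s, w in pairs)
    if method == "geometric":
        # Any zero score with positive weight makes the geometric mean zero.
        if any(s <= 0 for s, _ in pairs):
            return 0.0
        return math.exp(sum(w * math.log(s) for s, w in pairs))
    raise ValueError(f"unknown average method: {method!r}")


if __name__ == "__main__":
    print(reweight([0.5, 0.5, 0, 0]))
    print(weighted_mean([0.25, 1.0], [1, 1], "arithmetic"))
    print(weighted_mean([0.25, 1.0], [1, 1], "geometric"))
```

Under these assumptions the geometric mean penalizes a weak category more than the arithmetic mean does (0.5 versus 0.625 for the scores above).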
Following SacreBLEU, we also recommend using the object-oriented API by creating an instance of the BLONDE object:
from blonde import BLONDE

blonde = BLONDE()
# sys_doc, ref_doc_1 and ref_doc_2 are List[str] (one string per sentence)
score = blonde.corpus_score([sys_doc], [[ref_doc_1], [ref_doc_2], ...])

# sys_corpus, ref_corpus_1 and ref_corpus_2 are List[List[str]] (a list of documents)
score = blonde.corpus_score(sys_corpus, [ref_corpus_1, ref_corpus_2, ...])

# pass the references at construction time for faster recomputation across systems
blonde = BLONDE(references=[ref_corpus])
score = blonde.corpus_score(sys_corpus)

# BlonDe+: also pass the BlonDe+ arguments, e.g. plus_categories and annotation
blonde_plus = BLONDE(references=[ref_corpus], ...)
score = blonde_plus.corpus_score(sys_corpus)
A detailed example is provided in
weight_normalize: bool = False,
average_method: str = 'geometric',
categories: dict = CATEGORIES,
plus_categories=None, # ("ambiguity", "ellipsis")
plus_weights=(1, 1),
lowercase: bool = False,
smooth_method: str = 'exp',
smooth_value: Optional[float] = None,
effective_order: bool = False,
references: Optional[Sequence[Sequence[Sequence[str]]]] = None,
annotation: Sequence[Sequence[str]] = None,
ner_refined: Sequence[Sequence[str]] = None)
Parameter | Description |
categories | A dict whose keys are chosen from ('tense', 'pronoun', 'entity', 'n-gram') and whose values are the names of the features in each category, Dict[str, Sequence[str]]. If None, ('tense', 'pronoun', 'entity', 'n-gram') are included. |
weights | The weights of the aforementioned features, Dict[str, Sequence[float]]. If None, uniform weights are adopted. |
plus_categories | The human-annotated categories, e.g. ('ambiguity', 'ellipsis') (default: None). |
plus_weights | The weights of the human-annotated categories (default: (1, 1)). |
weight_normalize | Whether to reweight the weights to sum to 1 (default: False). |
lowercase | If True, lowercased BLONDE is computed. |
average_method | The average method to use. Choose from ('geometric', 'arithmetic'). |
smooth_method | The smoothing method to use. Choose from ('floor', 'add-k', 'exp', 'none'). |
smooth_value | The smoothing value for the floor and add-k methods. None falls back to the default value. |
max_ngram_order | If given, it overrides the maximum n-gram order (default: 4). |
effective_order | If True, stop including n-gram orders for which the score is 0. This should be True if sentence-level BLONDE will be computed. |
references | A sequence of reference documents, where a document is a sequence of reference strings. If given, the reference n-grams and lengths are pre-computed and cached for faster BLONDE computation across many systems. |
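The API expects each document as a List[str] with one sentence per string, which matches the sentence-per-line files used by the CLI. Below is a small loader sketch. It assumes UTF-8 files and keeps every line (including blank ones) so sentence alignment is preserved. `load_document` and `load_corpus` are hypothetical helpers, not part of blonde.

```python
from pathlib import Path
from typing import Iterable, List, Union


def load_document(path: Union[str, Path]) -> List[str]:
    """Read a sentence-per-line file into one document (List[str]).

    Blank lines are kept so that line i still corresponds to sentence i.
    """
    with open(path, encoding="utf-8") as f:
        # Iterating a text file splits on newlines only, unlike str.splitlines(),
        # which also splits on characters such as U+2028.
        return [line.strip() for line in f]


def load_corpus(paths: Iterable[Union[str, Path]]) -> List[List[str]]:
    """Read one file per document into a corpus (List[List[str]])."""
    return [load_document(p) for p in paths]
```

With these helpers, the single-document call above could be written as `BLONDE().corpus_score([load_document("sys.txt")], [[load_document("example/ref.txt")]])`.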
The BWB dataset is a large-scale document-level Chinese-English parallel dataset. It consists of Chinese online novels across multiple genres (sci-fi, romance, action, fantasy, comedy, etc.) and their corresponding English translations crawled from the Internet. The novels are translated by professional native English speakers and corrected by editors.
To the best of our knowledge, this is the largest document-level translation dataset to date.
Statistic | Train | Test | Dev | Total |
#Docs | 196,304 | 80 | 79 | 196K |
#Sents | 9,576,566 | 2,632 | 2,618 | 9.58M |
#Words | 325.4M | 68.0K | 67.4K | 460.8M |
The test set of BWB is annotated. For each document, there are:
The original Chinese document. Each line is a sentence.
ref_re.txt: the reference English document. Each line is a sentence.
ner_re.txt: the named entities that appear in each sentence and their counts in that sentence.
The error annotation: the error type, along with the spans that may cause ambiguity or ellipsis.
Error types:
Error Type | #id | Description | With Span Annotation |
ambiguity | 1 | Some term(s) are ambiguous: correct in the stand-alone sentence but wrong in context. | ✓ |
ellipsis-pronoun | 2 | Error(s) caused by the omission of pronouns. | ✓ |
ellipsis-other | 3 | Error(s) caused by the omission of other phrases. | ✓ |
named entity | 4 | Error(s) due to the mistranslation of named entities. | |
tense | 5 | Error(s) due to tense. | |
sentence-level | 6 | Sentence-level error(s). | |
To help reproduce our results, we also provide:
The MT output we used in the discourse error analysis. Each line is a sentence.
pe_re.txt: the human post-editing of the provided MT output by professional translators.
chs_re.txt
【川流不息:乔恋,快看微博头条! 微博头条?
剧组发布会。 沈凉川应邀出场,导演立马恭敬地迎接过来,客气的跟他说这话,表达着自己对他能够到来的谢意。
剧组根本就没有邀请王文豪,可他却不知道从哪里拿到了邀请函,自己堂而皇之的进来了。 他当然要进来了。
沈凉川穿着一身深灰色西装,面色清冷,手里端着一个高脚香槟杯,站在桌子旁边,整个人显得格外俊逸,却也格外的清冷,让周围的人都不敢上前搭讪。 他一个人,就是一个世界。
“沈哥,您到底是要干什么啊? 能不能告诉我,好让我有个心理准备。 您这样突然跑过来参加这么一个小剧组的发布会,又什么都不说就这么杵着,我心里瘆的慌。”
宋城的心都提了起来,紧跟在他身后。 沈凉川一步一步往前,走到了前方。
“对啊,现在的狗仔就是惹人厌恶,我早就想动手教训他们了! “你这样,就不怕跟他们结仇啊?”
“我都这样了,我怕什么? 当初沈影帝以正当防卫为借口,将一名狗仔打了,告到了法庭上去不也不了了之吗?
王文豪说到这里,嘿嘿一笑。 还想说什么,忽然察觉到身后有人靠近。
Qiao Lian clenched her fists and lowered her head.
Actually, he was right.
She was indeed an idiot, as only an idiot would believe that they could find true love online.
She curled her lips and took a deep breath. Just when she was about to put down her cell phone, a barrage of posts bombarded her WeChat account.
She logged into her account and saw that a large number of fans in the Shen Liangchuan fan group had tagged her.
[Qiao Lian: What happened?]
[Chuan Forever: Qiao Lian, look at the headlines on Weibo, quickly!]
She froze momentarily, then picked up her cell phone and logged into Weibo. When she saw the headlines, her entire body immediately froze over again!
Shen Liangchuan arrived at the scene after accepting the invitation. The director immediately went to greet him in a respectful manner, politely welcoming him and expressing his gratitude for Shen Liangchuan’s presence today.
Shen Liangchuan did not speak. Instead he looked at Wang Wenhao, who was nearby.
After Wang Wenhao’s scandal broke, every film he starred in had been taken down. Only this show could still be broadcasted, as Wang Wenhao had a supporting role in it and was practically unnoticeable.
In fact, the cast and crew hadn’t even invited Wang Wenhao. However, he had obtained a copy of the invitation letter somehow, and strode imposingly into the venue anyway.
After all, this was his final chance.
After his scandals broke, practically every advertiser and filming crew wanted to break their contracts with him.
He would rather take a supporting role than fade out into obscurity.
That was because the scandals surrounding him would never disappear.
Thus, Wang Wenhao went around trying to curry favor with everybody at this press conference.
Shen Liangchuan was wearing a dark grey suit and he had a cold expression. He was holding a champagne glass and was currently standing beside a table. He looked exceptionally stylish, but also exceptionally icy. As a result, none of the people around him dared to approach him.
If anyone had paid attention to him, they would have noticed that his gaze kept drifting over to Wang Wenhao.
Song Cheng stood at his side. After noticing his behavior, he could not help but pinch his arm.
Shen Liangchuan turned around and looked at him casually, with a questioning face.
“Brother Shen, what are you planning to do? Can you tell me beforehand so that I can prepare myself mentally. You suddenly decide to come and attend such a small-scale press conference, yet you have been completely silent and are now just standing here and doing nothing? My heart is beating anxiously right now.”
After Shen Liangchuan heard him speak, he sipped a mouthful of champagne and put the glass down.
Then, he walked away in long strides.
Song Cheng was extremely nervous and followed him. Shen Liangchuan walked forward, one step at a time, until he reached the front of the room.
Wang Wenhao was currently ingratiating himself with a C-list celebrity. The celebrity asked, “Hey, I heard that you beat a paparazzi?”
“Yeah, the paparazzi nowadays are so disgusting. I have wanted to teach them a lesson myself for some time now!” "Are not you afraid of becoming an enemy of them?"
“I’ve already done it, so what should I be scared of? That time Best Actor Shen beat up a reporter, he claimed that it was in self-defence so that he would have an excuse if he got sued, right? At that time, nobody said anything”
As Wang Wenhao spoke, he laughed heartily. Just as he was about to continue speaking, he suddenly felt a presence approaching him from behind.
He turned around and saw Shen Liangchuan. His eyes narrowed and attempted to smile at him. However, Shen Liangchuan took a step forward, grabbed his tie and threw a punch at his face!
PERSON: (Qiao Lian: 1; ) NONPERSON: ()
PERSON: () NONPERSON: (WeChat: 1; )
PERSON: (Shen Liangchuan: 1; ) NONPERSON: ()
PERSON: (Qiao Lian: 1; ) NONPERSON: ()
PERSON: (Qiao Lian: 1; ) NONPERSON: (Weibo: 1; )
PERSON: () NONPERSON: (Weibo: 1; )
PERSON: (Shen Liangchuan: 1; ) NONPERSON: (Shen Liangchuan’s: 1; )
PERSON: (Shen Liangchuan: 1; Wang Wenhao: 1; ) NONPERSON: ()
PERSON: (Wang Wenhao: 1; ) NONPERSON: (Wang Wenhao’s: 1; )
PERSON: (Wang Wenhao: 1; ) NONPERSON: ()
PERSON: (Wang Wenhao: 1; ) NONPERSON: ()
PERSON: (Shen Liangchuan: 1; ) NONPERSON: ()
PERSON: (Wang Wenhao: 1; ) NONPERSON: ()
PERSON: (Song Cheng: 1; ) NONPERSON: ()
PERSON: (Shen Liangchuan: 1; ) NONPERSON: ()
PERSON: (Shen Liangchuan: 1; ) NONPERSON: ()
PERSON: (Song Cheng: 1; Shen Liangchuan: 1; ) NONPERSON: ()
PERSON: (Wang Wenhao: 1; ) NONPERSON: ()
PERSON: (Wang Wenhao: 1; ) NONPERSON: ()
PERSON: (Shen Liangchuan: 2; ) NONPERSON: ()
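Each line of the entity listing above gives the PERSON and NONPERSON entities of one sentence, with their counts. Below is a minimal parser sketch. It assumes exactly this layout (`LABEL: (name: count; ...)`, with no parentheses inside entity names). `parse_ner_line` is a hypothetical helper, not part of blonde.

```python
import re
from collections import Counter
from typing import Dict

# Matches "PERSON: (...)" or "NONPERSON: (...)"; \b stops "PERSON" from
# matching inside "NONPERSON".
_GROUP = re.compile(r"\b(NONPERSON|PERSON):\s*\(([^()]*)\)")


def parse_ner_line(line: str) -> Dict[str, Counter]:
    """Parse one entity line into {"PERSON": Counter, "NONPERSON": Counter}."""
    result: Dict[str, Counter] = {"PERSON": Counter(), "NONPERSON": Counter()}
    for label, body in _GROUP.findall(line):
        for item in body.split(";"):
            item = item.strip()
            if not item:  # the trailing "; " leaves an empty item
                continue
            # Split on the last colon so the name itself may contain colons.
            name, _, count = item.rpartition(":")
            result[label][name.strip()] += int(count)
    return result


if __name__ == "__main__":
    line = "PERSON: (Song Cheng: 1; Shen Liangchuan: 1; ) NONPERSON: ()"
    print(parse_ner_line(line))
```

Returning a Counter per label makes it easy to compare the annotated entities of a sentence against automatically detected ones with multiset operations.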
Joe clenched his fist and bowed his head. 4,Joe <pos/1,3> 3,her, his <pos/35,37>
In fact, he's right. 2 5
I am a fool, even will believe the love on the Internet. 2 5
She ticked her lips, took a deep breath, and was about to put her phone down, but weChat was blown open. 6 1,bombarded, blown open <pos/94,103>
She nodded in and found it was a cold powder group, and everyone was on her. 1,logged, nodded <pos/5,10> 4,cold powder group <pos/34,50>
Joe: What's the matter? 4,Joe <pos/1,3>
Chuan-flowing: Joe love, quickly look at the micro-blogging headlines! Weibo headlines? 4,Chuan-flowing <pos/1,13>;Joe love <pos/18,25>;micro-blogging <pos/50,63>
She took a slight look, picked up the phone, landed on the micro-blog, when she saw the headlines, the whole person suddenly choked! 1,logged, landed on <pos/46,54> 6
The show's release. Shen Liangchuan was invited to appear, the director immediately greeted him with respect, politely said this to him, expressed his gratitude for his arrival. 0
Shen Liangchuan did not speak, look not far from Wang Wenhao. 6
After Wang Wenhao's accident, all the works were off the shelves, and this play can also be broadcast, because Wang Wenhao in the friendship played by the male no. 3 play is very few, almost negligible. 1,taken, off the shelves <pos/50,64> 5 6
The crew did not invite Wang Wenhao, but he did not know where to get the invitation, his own entrance. Of course he's coming in. 5
This is his last chance. 5
The scandal broke, and almost all advertisers and crews broke his contract with him. 2 6
He would rather shoot the men's number three now than be silent about it. 1,fade, be silent about <pos/55,69> 6
Because of his affairs, there is no pressure. 5 6
So Wang Wenhao at the press conference, everywhere to please others. 6
Shen Liangchuan wearing a dark gray suit, cold-faced, hand with a high-footed champagne glass, standing next to the table, the whole person appears extra or less handsome, but also extraordinarily cold, so that people around are afraid to come forward. He is a man, is a world. 1,very, extra or less <pos/149,161>;icy, cold <pos/200,203> 5
But if you can notice him, you will find his sight, but always if there is nothing floating to Wang Wenhao body. 6 5 7
Songcheng stood by his side, aware of this, can not help but pull his arm. 5
Shen Liangchuan faint lying back, looked at him, blind inquiry. 6 7
"Shen brother, what the hell are you doing? Can you tell me so that I have a mental preparation. You suddenly ran over to attend the launch of such a small group, and said nothing so, I panicked. " 1,press, launch <pos/135,140>
Shen Said, took a sip of champagne, and then put the champagne glass down. 6
Immediately, he took a slender step. 1,long stride,slender step <pos/24,35>
Songcheng's heart was raised and followed immediately behind him. Shen Liangchuan step by step forward, walked forward. 1,nervous, raised <pos/23,28> 6
Wang Wenhao is with other third-rate star-studded sets, the man asked, "I heard you hit a paparazzi?" " 6 5
"Yeah, the paparazzi are disgusting now, I've wanted to teach them! "You're not afraid to feud with them, " he said. 0
"I'm all there, what am I afraid of? At the beginning Shen Shadow Emperor to self-defense as an excuse, a paparazzi hit, to the court to go to the court is not it"? 4,Shen Shadow Emperor <pos/56,74> 5 7
Wang Wenhao said here, hey hey smile. Want to say something, suddenly realized that someone behind him is close. 3,He, Want to <pos/39,45> 6 1,approaching, close <pos/109,113> 5 7
He looked back, he saw Shen Liangchuan, eyes shrink, licking his face and smiling, but saw Shen Liangchuan a step forward, a holding his collar, and then a fist to his face hit! 3,his, eyes <pos/41,44> 6 7
Download the BWB dataset from this Google Drive link.
Please note that use of this dataset constitutes your binding acceptance of the Terms of Use.
You can find the full, legally binding document here: Terms of Use.