In the BE corpus there are 723 paragraphs (segments) marked with @xml:lang="en" that, at least among the ones I've checked, are not in fact in English; they are typically lists of names.
This is currently causing problems for building the MTed corpus: the source for MT was CoNLL-U files split into fr and nl, which now need to be joined to make the complete MTed corpus. However, the so-called "en" paragraphs are in neither the fr nor the nl CoNLL-U files, so they were not translated and are missing from the MT output. I will find a workaround for this, but the errors should be fixed for 3.1.
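For anyone who wants to reproduce the count, here is a minimal sketch of how such segments could be listed. It assumes the segments are TEI `seg` elements in the `http://www.tei-c.org/ns/1.0` namespace and that the language is set explicitly on the `seg` itself (inherited `xml:lang` values are not considered). The function name and the sample document are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: TEI for the elements, the standard XML namespace for xml:lang / xml:id.
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def find_segments_by_lang(xml_text, lang="en"):
    """Return (xml:id, text) pairs for <seg> elements whose own xml:lang equals `lang`."""
    root = ET.fromstring(xml_text)
    hits = []
    for seg in root.iter(f"{{{TEI_NS}}}seg"):
        # Only the attribute on the <seg> itself is checked, not inherited values.
        if seg.get(f"{{{XML_NS}}}lang") == lang:
            # Collapse whitespace so multi-line segments print on one line.
            text = " ".join("".join(seg.itertext()).split())
            hits.append((seg.get(f"{{{XML_NS}}}id"), text))
    return hits


# Hypothetical sample, not taken from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div>
    <u xml:id="u1" xml:lang="fr">
      <seg xml:id="u1.s1" xml:lang="fr">Bonjour à tous.</seg>
      <seg xml:id="u1.s2" xml:lang="en">Jan Peeters, Marie Dubois</seg>
    </u>
  </div></body></text>
</TEI>"""

for seg_id, text in find_segments_by_lang(SAMPLE):
    print(seg_id, text)
```

Running the same function over each corpus file and comparing the returned IDs against the fr and nl CoNLL-U files should show which segments are missing from the MT input.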
Below is the list of segments marked as "English":