Closed TomazErjavec closed 1 year ago
@nuriabel, and also make sure that you are using the correct model for linguistic annotation
Hi Tomaz and Matyas, Yes, I see that we need to redo a number of things for version 3.1. Thanks for all your analysis!! N.
On Thu, 22 Jun 2023 at 11:44, Matyáš Kopp wrote:
> @nuriabel, and also make sure that you are using the correct model for linguistic annotation
@nuriabel, pls. heads up on this issue, we need to finalize the corpora.
Dear Tomaz, Rodolfo has solved most of the problems you found. He is now working on the wrong join="right" cases. I'll make sure that the wrong lang=en values are also corrected for the next delivery. Tomorrow I'm officially back from holidays, and this is on my priority list.
Great, thank you! Looking forward to your polished corpus! :)
This, as far as I can see, has been fixed, thanks. Closing the issue.
In the ES-CT corpus there are 18 paragraphs that have `@xml:lang="en"`, even though at least the ones I've checked are not in fact in English. This is currently causing problems for making the MTed corpus, as the source for MT were CoNLL-U files split into ca and es, which now need to be joined to make the complete MTed corpus. However, the so-called "en" paragraphs are in neither the ca nor the es CoNLL-U files, so they were not translated and are missing from the MT output. I will find a workaround for this, but the errors should be fixed for 3.1. Below is the list of "en" paragraphs:
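A list of this kind can be regenerated with a short script. The following is only a minimal sketch, not the check actually used for ParlaMint: it assumes that paragraphs are encoded as TEI `<seg>` elements carrying their own `xml:lang` and `xml:id` attributes, and the function name `paragraphs_with_lang`, the `SAMPLE` document, and its `demo.*` identifiers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for the xml: prefix and for TEI.
XML_NS = "http://www.w3.org/XML/1998/namespace"
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical miniature document; real corpus files are much larger.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="ca">
  <text><body><div>
    <u xml:id="demo.u1" xml:lang="ca">
      <seg xml:id="demo.seg1" xml:lang="ca">Bon dia a tothom.</seg>
      <seg xml:id="demo.seg2" xml:lang="en">Gracias, presidenta.</seg>
      <seg xml:id="demo.seg3" xml:lang="es">Muchas gracias.</seg>
    </u>
  </div></body></text>
</TEI>"""


def paragraphs_with_lang(xml_text, lang="en"):
    """Return the xml:id of each <seg> whose own xml:lang equals `lang`.

    Only the attribute set directly on the element is checked;
    values inherited from ancestor elements are ignored.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for seg in root.iter(f"{{{TEI_NS}}}seg"):
        if seg.get(f"{{{XML_NS}}}lang") == lang:
            hits.append(seg.get(f"{{{XML_NS}}}id"))
    return hits


if __name__ == "__main__":
    # Lists the paragraphs labelled "en" in the sample document.
    print(paragraphs_with_lang(SAMPLE))  # → ['demo.seg2']
```

Because only the explicit attribute is inspected, a segment that inherits its language from the enclosing utterance would not be reported; whether that matters depends on how the corpus sets `xml:lang`.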