Closed matyaskopp closed 1 year ago
Dear Matyáš,
I think the issues are fixed now: the href was corrected and the root file for the annotated version (".ana.xml") was created.
As for the last issue, regarding UDPipe: we have not run the annotation on the whole corpus, so let me know how we should proceed to annotate it.
Best, m
> As for the last issue, regarding UDPipe: we have not run the annotation on the whole corpus, so let me know how we should proceed to annotate it.
OK, closing the issue. I will add the .ana suffix option to ufal/ParCzech/src/udpipe2/udpipe2.pl.
The Basque corpus did not pass the validation, so please do not send full data.
The easy-to-see issues are:
[x] wrong href content when including component files: https://github.com/miruskieta/ParlaMint/blob/5d5b063ec9c37a57b4134afea5cd150109a8d950/Data/ParlaMint-ES-PV/ParlaMint-ES-PV.xml#L3204 should be
[x] missing root file for an annotated version
[ ] ufal/ParCzech/src/udpipe2/udpipe2.pl does not change ids and filenames (I do that in postprocessing once the NER annotation is done), so you have to add .ana to the ids. Or, if you haven't run the whole corpus annotation yet, I can add this option to the script; it is easy.