Closed matyaskopp closed 1 year ago
Now I see: https://github.com/clarin-eric/ParlaMint/blob/819add4ccdecad8faac712b22c618002ac76b6e7/Scripts/parlamint2distro.pl#L131 https://github.com/clarin-eric/ParlaMint/blob/819add4ccdecad8faac712b22c618002ac76b6e7/Scripts/parlamint2final.xsl#L21
parlamint2final is not calculating tagUsage. The tagUsage calculation is implemented in https://github.com/clarin-eric/ParlaMint/blob/819add4ccdecad8faac712b22c618002ac76b6e7/Scripts/parlamint-add-common-content.xsl#L12 which is not used in the finalization.
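For readers unfamiliar with what a tagUsage calculation involves, here is a minimal Python sketch. It is not the XSLT from parlamint-add-common-content.xsl: it only counts the TEI elements under `text` in an invented sample and prints them as `tagUsage` elements with the standard TEI `gi` and `occurs` attributes. The sample document, the function names, and the choice to count the `text` element itself are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Tiny made-up component file; real ParlaMint files are far larger.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div>
    <u><seg><s><w>a</w><w>b</w><pc>.</pc></s></seg></u>
    <u><seg><s><w>c</w></s></seg></u>
  </div></body></text>
</TEI>"""


def count_tags(root):
    """Count TEI elements inside <text>, including <text> itself (assumption)."""
    text = root.find(f"{{{TEI_NS}}}text")
    counts = Counter()
    if text is None:
        return counts
    prefix = f"{{{TEI_NS}}}"
    for el in text.iter():
        # Skip anything that is not an element in the TEI namespace.
        if isinstance(el.tag, str) and el.tag.startswith(prefix):
            counts[el.tag[len(prefix):]] += 1
    return counts


def tag_usage_elements(counts):
    """Serialize the counts as tagUsage elements, sorted by element name."""
    return [f'<tagUsage gi="{gi}" occurs="{n}"/>' for gi, n in sorted(counts.items())]


counts = count_tags(ET.fromstring(SAMPLE))
for line in tag_usage_elements(counts):
    print(line)
```

Comparing such recomputed counts against the `tagUsage` values stored in a file's header is one way to spot files where the numbers were never updated.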
I thought everybody computes their tagUsages, but noticed AT a couple of days ago myself. I have now inserted your calculation into finalize, but it is a doomed effort, because I change the countable markup for ES-GA and now also IS (names without words, a bug which floated to the top only in the MTed corpus), hm. I guess we should do my fixings first, and then just use add-common (although my version of add-common does things yours doesn't :). Would you dare try it, or is that too much to hope for? I'm afraid of introducing even more confusion! Or maybe we live with the fact that tagUsages will be slightly off for 3.0, and hope to do better in 3.1?
Discussion on this continues in #675, closing this one.
AT corpus has wrong numbers in tagUsage in the /project/corpora/Parla/ParlaMint/ParlaMint-full/Data/Corpora folder:
All corpus files look like this: https://github.com/clarin-eric/ParlaMint/blob/392e2ee930e764d09045ea0e827de6c57d2afe2c/Data/ParlaMint-AT/ParlaMint-AT.xml#L112-L128
And component files: https://github.com/clarin-eric/ParlaMint/blob/392e2ee930e764d09045ea0e827de6c57d2afe2c/Data/ParlaMint-AT/ParlaMint-AT_2005-03-31-022-XXII-NRSITZ-00100.xml#L110-L126
I guess that the finalization script does not calculate these numbers and only AT set1 into component files