clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

different number of notes in joint and single corpora #717

Closed matyaskopp closed 11 months ago

matyaskopp commented 1 year ago

I have discovered an inconsistency in notes between ParlaMint-XX and ParlaMint-ES-GA:

Many notes are missing from the "joint corpora" (ParlaMint-XX).
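A quick way to check such a discrepancy is to count the note elements on both sides. The following is only a minimal sketch: it assumes both versions are available as well-formed XML in which notes appear as `note` elements (with or without the TEI namespace). The two inline samples are hypothetical stand-ins, not real ParlaMint data.

```python
import xml.etree.ElementTree as ET


def count_notes(xml_text: str) -> int:
    """Count elements whose local name is 'note', ignoring any namespace."""
    root = ET.fromstring(xml_text)
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "note")


# Hypothetical sample standing in for a single corpus (e.g. ParlaMint-ES-GA).
single = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div>
<note>Applause</note>
<u><seg>Hello</seg><note>Interruption</note></u>
</div></body></text></TEI>"""

# Hypothetical sample standing in for the same text in the joint corpus.
joint = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div>
<u><seg>Hello</seg><note>Interruption</note></u>
</div></body></text></TEI>"""

print("single:", count_notes(single), "joint:", count_notes(joint))
```

A difference between the two counts for the same text would point to notes being dropped somewhere in the conversion.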

TomazErjavec commented 1 year ago

The missing notes etc. are ok in TEI, but it seems I lose them here: https://github.com/clarin-eric/ParlaMint/blob/a726ea51017fe5d00d797fe82099a3da88c88e07/Scripts/parlamint2xmlvert.xsl#L127-L133. Will look into it once ParlaMint-en 3.0 is released.

TomazErjavec commented 11 months ago

Hm, parlamint2xmlvert.xsl has changed a lot in the meantime, and I think I also addressed this issue. However, note (sic!) that it doesn't really matter, as the concordancer loses a lot of notes anyway: they are empty elements, and if more than one follows a token, only one is retained.

So I will close this, but @matyaskopp, if you feel it is still an issue, please reopen it in the future.
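The note loss in the concordancer described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the concordancer's actual logic: the token stream, the `type` attribute values, and the choice to keep the first note of each run are all hypothetical (the thread only says that one note is retained, not which one).

```python
def is_empty_note(item: str) -> bool:
    """True for an empty <note .../> element in the simulated stream."""
    return item.startswith("<note") and item.endswith("/>")


def collapse_empty_notes(stream: list[str]) -> list[str]:
    """Keep only one empty note from each run of consecutive empty notes.

    Assumption: the first note of a run is the one that survives.
    """
    out: list[str] = []
    for item in stream:
        if is_empty_note(item) and out and is_empty_note(out[-1]):
            continue  # a note was already kept at this position; drop this one
        out.append(item)
    return out


# Hypothetical stream: tokens interleaved with empty note elements.
stream = [
    "Thank",
    "you",
    '<note type="applause"/>',
    '<note type="interruption"/>',
    ".",
    '<note type="time"/>',
]

kept = collapse_empty_notes(stream)
print(sum(map(is_empty_note, stream)), "notes in,",
      sum(map(is_empty_note, kept)), "notes retained")
```

In this sample, three notes go in and two survive, so note counts taken from the concordancer will be lower than those in the TEI source regardless of how the conversion script behaves.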