clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

Too many xml:lang in factorised files #686

TomazErjavec closed this issue 1 year ago

TomazErjavec commented 1 year ago

It seems that the factorise script puts xml:lang on all elements in factorised files. I vaguely remember we discussed this as a feature, and it might be in the taxonomies, but isn't in the listOrg or listPerson as:

Is it possible not to do this, or will it cause problems? If yes, maybe add-common content can remove them again...
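
As an illustration of what "removing them again" could mean, here is a minimal Python sketch that drops every xml:lang that only repeats the language already inherited from an ancestor. It is not the ParlaMint XSLT: the function name `strip_redundant_lang` and the sample fragment are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined "xml:" prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def strip_redundant_lang(elem, inherited=None):
    """Drop xml:lang where it only repeats the language inherited from an ancestor."""
    lang = elem.get(XML_LANG)
    if lang is not None and lang == inherited:
        del elem.attrib[XML_LANG]
    # The language in effect for the children is unchanged by the deletion.
    effective = lang if lang is not None else inherited
    for child in elem:
        strip_redundant_lang(child, effective)


# Hypothetical, simplified fragment (not taken from a ParlaMint corpus).
FACTORISED = """
<listPerson xml:lang="cs">
  <person xml:id="p1" xml:lang="cs">
    <persName xml:lang="cs">Jan Novak</persName>
    <persName xml:lang="en">John Novak</persName>
  </person>
</listPerson>
"""

tree_root = ET.fromstring(FACTORISED)
strip_redundant_lang(tree_root)
remaining = [(el.tag, el.get(XML_LANG)) for el in tree_root.iter()]
print(remaining)
```

In this sketch only the root and the English persName keep their xml:lang; the attribute is kept wherever the language actually changes, so no information is lost.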

matyaskopp commented 1 year ago

@TomazErjavec Can you give me a sample from the corpus where it makes a mess? I believe I only change the root language this way:

https://github.com/clarin-eric/ParlaMint/blob/53c4c1974152e7ff8e512af24f73460fa5fa055d/Scripts/parlamint-factorize-teiHeader.xsl#L102-L107

matyaskopp commented 1 year ago

Now I see the issue: the problem is in copying already factorized files.

matyaskopp commented 1 year ago

I don't know the reason for copying lang in XInclude mode - I use this template/mode for copying xml files...

https://github.com/clarin-eric/ParlaMint/blob/8ca9f886ee701f8770ef1036e4fec2b29ea98b62/Scripts/parlamint-lib.xsl#L130-L153

@TomazErjavec, should I implement my own mode copyXInclude which does not add xml:lang, or can this template be fixed? My own mode would probably be safer, but I prefer less code duplication if possible...

TomazErjavec commented 1 year ago

@matyaskopp thanks for the fix.

> don't know the reason for copying lang in XInclude mode

XInclude mode is meant only for fully expanding the corpus header, so that the variable containing it can then be used by templates that need corpus header metadata. The reason for having lang everywhere there is to make it simpler to know which language a certain element contains (without the clumsy ancestor-or-self::tei:*[@xml:lang][1]/@xml:lang idiom).
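
The effect described above can be sketched in Python, assuming a simplified fragment: once the effective language is written onto every element, any consumer can read it straight from the element instead of searching its ancestors. This is only an illustration and not the code in parlamint-lib.xsl; the function name `propagate_lang` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined "xml:" prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def propagate_lang(elem, inherited=None):
    """Write the effective xml:lang onto elem and all of its descendants."""
    # An element's own xml:lang wins; otherwise it inherits from its parent.
    lang = elem.get(XML_LANG, inherited)
    if lang is not None:
        elem.set(XML_LANG, lang)
    for child in elem:
        propagate_lang(child, lang)


# Hypothetical, simplified fragment (not taken from a ParlaMint corpus).
SAMPLE = """
<listOrg xml:lang="sl">
  <org xml:id="org1">
    <orgName full="yes">Državni zbor</orgName>
    <orgName full="yes" xml:lang="en">National Assembly</orgName>
  </org>
</listOrg>
"""

root = ET.fromstring(SAMPLE)
propagate_lang(root)

# After propagation the language is a plain attribute lookup on each element.
langs = [(el.tag, el.get(XML_LANG)) for el in root.iter()]
print(langs)
```

The trade-off is the one raised in this issue: the lookup becomes trivial, but the attribute is now repeated on every element, which is acceptable for an in-memory expanded header and unwanted in files written back to disk.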

Of course, all this was only in my head, but nowhere explained, so, sorry about that!

All ok now, closing.