Closed by matyaskopp 1 year ago
@TomazErjavec ParlaMint/Scripts/parlamint-init-taxonomy.xsl can be used to normalize a taxonomy (language order: en first, then the other languages alphabetically) and to make sure that everything in the common taxonomy is translated.
The following creates normalized common taxonomies in Data/ParlaMint-TESTTAXONOMY:
mkdir Data/ParlaMint-TESTTAXONOMY
make initTaxonomies-TESTTAXONOMY \
PARLIAMENTS="TESTTAXONOMY" \
LANG-CODE-LIST="bg bs ca cs da de el en es es es et eu fi fr fr gl hr hu is it lt lv nl nl no pl pt ro ru sl sr sv tr uk"
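The language ordering mentioned above (en first, then the remaining codes alphabetically) can be sketched in Python as follows. This is only an illustration, not the XSLT implementation: the function name `order_lang_codes` is hypothetical, and keeping duplicate codes is an assumption based on the code list passed to make, which repeats es, fr, and nl.

```python
def order_lang_codes(codes):
    """Return the codes with "en" first and the rest in alphabetical order.

    Duplicates are kept as they are (assumption, see the note above).
    """
    # The key puts "en" (False) before every other code (True),
    # then orders the remaining codes alphabetically.
    return sorted(codes, key=lambda code: (code != "en", code))


# The same list that is passed as LANG-CODE-LIST in the make call.
lang_code_list = ("bg bs ca cs da de el en es es es et eu fi fr fr gl hr hu "
                  "is it lt lv nl nl no pl pt ro ru sl sr sv tr uk").split()
ordered = order_lang_codes(lang_code_list)
print(" ".join(ordered[:5]))  # en bg bs ca cs
```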
If a translation is missing, the resulting taxonomy is invalid, because the untranslated terms contain no text (only a comment holding the English term):
<?xml version="1.0" encoding="UTF-8"?>
<taxonomy xmlns="http://www.tei-c.org/ns/1.0"
xml:id="ParlaMint-taxonomy-parla.legislature"
xml:lang="mul">
<desc xml:lang="en">
<term>Legislature</term>
</desc>
<desc xml:lang="bg">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="bs">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="ca">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="cs">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="da">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="de">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="el">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="es">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="es">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="es">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="et">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="eu">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="fi">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="fr">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="fr">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="gl">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="hr">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="hu">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="is">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="it">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="lt">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="lv">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="nl">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="nl">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="no">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="pl">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="pt">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="ro">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="ru">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="sl">
<term>Zakonodajna oblast</term>
</desc>
<desc xml:lang="sr">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="sv">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="tr">
<term><!--Legislature--></term>
</desc>
<desc xml:lang="uk">
<term><!--Legislature--></term>
</desc>
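One quick way to see which languages still lack a translation is to look for `term` elements without text. The sketch below does this with Python's standard library. It is not part of the ParlaMint scripts: the function name `untranslated_langs` is hypothetical, and the embedded sample is a shortened copy of the output above with a closing tag added so that it parses.

```python
import xml.etree.ElementTree as ET
from typing import List

TEI = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def untranslated_langs(taxonomy_xml: str) -> List[str]:
    """Return the xml:lang of every <desc> whose <term> has no text."""
    # The default parser drops comments, so a term that holds only
    # <!--Legislature--> ends up with no text at all.
    root = ET.fromstring(taxonomy_xml)
    missing = []
    for desc in root.iter(f"{{{TEI}}}desc"):
        term = desc.find(f"{{{TEI}}}term")
        text = "".join(term.itertext()).strip() if term is not None else ""
        if not text:
            missing.append(desc.get(XML_LANG))
    return missing


# Shortened sample (assumption: three languages are enough to illustrate).
SAMPLE = """<taxonomy xmlns="http://www.tei-c.org/ns/1.0"
          xml:id="ParlaMint-taxonomy-parla.legislature"
          xml:lang="mul">
  <desc xml:lang="en"><term>Legislature</term></desc>
  <desc xml:lang="bg"><term><!--Legislature--></term></desc>
  <desc xml:lang="sl"><term>Zakonodajna oblast</term></desc>
</taxonomy>"""

print(untranslated_langs(SAMPLE))  # ['bg']
```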
Thanks for the explanation. I'm not sure if you are aware (I certainly forgot) that we already have something similar, namely parlamint-merge-taxonomy.
So, I am not quite sure about the usage scenario of one versus the other. However, this can be resolved post 3.0.
Scenario:
The following errors can then arise:
An invalid sample is a better motivation to fix the problem than an error in the log. The merge-init process is repeated until the taxonomy is OK.
Improve script for initializing taxonomies: https://github.com/clarin-eric/ParlaMint/blob/ce00fda7c8210f3cd7a709d8fa77998aac6708b4/Scripts/parlamint-init-taxonomy.xsl#L1-L9
If a translation for a particular term, desc, or catDesc exists, it is included in the initialized taxonomy.
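The rule above can be illustrated with a small Python sketch. It is not the XSLT code: `init_term_descs` and its parameters are hypothetical, and the placeholder format (the English term inside an XML comment) is taken from the sample output earlier in this thread. Only `term` is handled here; `desc` and `catDesc` would presumably follow the same pattern.

```python
from xml.sax.saxutils import escape


def init_term_descs(english, translations, langs):
    """Return one <desc> string per language code, in the given order.

    english      -- the English term
    translations -- dict mapping language code to an existing translation
    langs        -- language codes, already in the desired order
    """
    out = []
    for lang in langs:
        text = english if lang == "en" else translations.get(lang)
        if text:
            # An existing translation is carried over into the new taxonomy.
            body = escape(text)
        else:
            # No translation yet: leave the English term as a comment, which
            # keeps <term> empty (assumes the term contains no "--").
            body = f"<!--{english}-->"
        out.append(f'<desc xml:lang="{lang}"><term>{body}</term></desc>')
    return out


for line in init_term_descs("Legislature",
                            {"sl": "Zakonodajna oblast"},
                            ["en", "bg", "sl"]):
    print(line)
```

The empty `term` elements produced for missing translations are what make the initialized taxonomy invalid until a translation is supplied.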