Closed by TomazErjavec 1 year ago
The parlamint-merge-taxonomy.xsl script is ready; here is how it is currently run: https://github.com/clarin-eric/ParlaMint/blob/2cdddce10e563167f329b8275253625a2860b86f/Corpora/Makefile#L1-L20
We now need to decide what to do with corpus-specific categories:
ParlaMint-taxonomy-parla.legislature:
ParlaMint-taxonomy-speaker_types:
ERROR: ParlaMint-IS contains non-standard category parla.sittinig for taxonomy ParlaMint-taxonomy-parla.legislature
ERROR: ParlaMint-IT contains non-standard category parla.meetining.public for taxonomy ParlaMint-taxonomy-parla.legislature
sittinig and meetining are typos, and parla.sittinig and parla.meetining.public are in fact not used in IS and IT at all. So it is enough to correct the source taxonomies and re-run the taxonomy-merge script, and the problem will go away. For the next round, IS, IT (and everybody else) should take the merged/split taxonomies as their input anyway.
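For anyone who wants to check their own taxonomy before the next merge, here is a minimal Python sketch of the kind of comparison behind the ERROR lines above. It is not the logic of parlamint-merge-taxonomy.xsl itself. The sample taxonomies, the category ids parla.upper and parla.lower, and the function names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, heavily trimmed taxonomies; real files carry catDesc children
# and the TEI namespace, which the local-name match below also tolerates.
COMMON = """<taxonomy xml:id="ParlaMint-taxonomy-parla.legislature">
  <category xml:id="parla.upper"/>
  <category xml:id="parla.lower"/>
  <category xml:id="parla.sitting"/>
</taxonomy>"""

CORPUS = """<taxonomy xml:id="ParlaMint-taxonomy-parla.legislature">
  <category xml:id="parla.upper"/>
  <category xml:id="parla.lower"/>
  <category xml:id="parla.unif"/>
  <category xml:id="parla.sittinig"/>
</taxonomy>"""


def category_ids(taxonomy_xml):
    """Return the set of xml:id values of all <category> elements."""
    root = ET.fromstring(taxonomy_xml)
    ids = set()
    for el in root.iter():
        # Compare on the local name so namespaced and plain XML both work.
        if el.tag.rsplit("}", 1)[-1] == "category" and el.get(XML_ID):
            ids.add(el.get(XML_ID))
    return ids


def non_standard(common_xml, corpus_xml):
    """Category ids defined in the corpus taxonomy but not in the common one."""
    return sorted(category_ids(corpus_xml) - category_ids(common_xml))


def report(corpus_name, taxonomy_name, common_xml, corpus_xml):
    """Format one message per non-standard category, mimicking the log above."""
    return [
        f"ERROR: {corpus_name} contains non-standard category {cat} "
        f"for taxonomy {taxonomy_name}"
        for cat in non_standard(common_xml, corpus_xml)
    ]


for line in report("ParlaMint-IS", "ParlaMint-taxonomy-parla.legislature",
                   COMMON, CORPUS):
    print(line)
```

A typo such as parla.sittinig shows up here simply because its id has no counterpart in the common taxonomy, which is why fixing the source and re-running the merge is enough.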
ERROR: ParlaMint-IS contains non-standard category parla.unif for taxonomy ParlaMint-taxonomy-parla.legislature
This one is defined, on the same level as upper and lower house, as:
<category xml:id="parla.unif">
  <catDesc xml:lang="is"><term>Sameinað þing</term></catDesc>
  <catDesc xml:lang="en"><term>Unified Chamber</term></catDesc>
</category>
However, it is never used in the IS corpus, so I suggest we simply delete it. @starkadur, is this OK with you?
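A claim like "never used" can be double-checked with a rough sketch along these lines. It assumes that categories are referenced through whitespace-separated `#id` pointers in `ana` attributes; the sample document, its element names and the function name are invented for illustration, and pointers that use a prefix instead of `#` are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus fragment; a real check would loop over all corpus files.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text ana="#parla.lower #reference">
    <body>
      <div><u ana="#regular"/></div>
    </body>
  </text>
</TEI>"""


def referenced_ids(xml_text):
    """Collect every id pointed to by a '#id' token in any ana attribute."""
    root = ET.fromstring(xml_text)
    refs = set()
    for el in root.iter():
        # ana may hold several whitespace-separated pointers.
        for token in el.get("ana", "").split():
            if token.startswith("#"):
                refs.add(token[1:])
    return refs


used = referenced_ids(SAMPLE)
print("parla.unif" in used)   # whether the category is referenced in this sample
print("parla.lower" in used)
```

If the union of `referenced_ids` over all files of a corpus does not contain a category id, deleting that category from the taxonomy should not leave any dangling pointers.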
This is now operational, closing.
We want to have common taxonomies (in Corpora/Taxonomies) with all the translations included. The particular corpora should then get their taxonomies from the common ones. The planned workflow is: