Closed by matyaskopp 1 year ago
The corpus validates against the original taxonomies, so the LV one is just a subset.
Thanks, it is much better. Your data are almost ready to merge.
There is only one thing I have spotted:
The NER taxonomy file does not contain translations, and the English version lacks the attribute xml:lang="en":
https://github.com/Skriptotajs/ParlaMint/blob/763338f4f2419ec3d4f070e95700e7e2fd57aa27/Data/ParlaMint-LV/ParlaMint-taxonomy-NER.ana.xml
It should look like this: https://github.com/clarin-eric/ParlaMint/blob/af4155773fcd05f1b85ffa0443330dfdd36533f9/Data/ParlaMint-UA/ParlaMint-taxonomy-NER.ana.xml#L1-L18
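A quick way to catch this kind of omission before pushing is to scan the taxonomy file for description elements that carry no xml:lang of their own. The sketch below is only an illustration: the helper name missing_lang and the inline sample are hypothetical, and it assumes that descriptions live in TEI desc and catDesc elements. It checks only the attribute on the element itself, not a value inherited from an ancestor.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def missing_lang(xml_text, tags=("desc", "catDesc")):
    """Return (tag, text) pairs for TEI description elements without their own xml:lang."""
    root = ET.fromstring(xml_text)
    wanted = {f"{{{TEI_NS}}}{t}" for t in tags}
    problems = []
    for el in root.iter():
        if el.tag in wanted and XML_LANG not in el.attrib:
            local_name = el.tag.split("}", 1)[1]
            # Collapse whitespace so the report stays on one line.
            text = " ".join("".join(el.itertext()).split())
            problems.append((local_name, text))
    return problems


# Hypothetical sample, not copied from any ParlaMint file.
SAMPLE = """<taxonomy xmlns="http://www.tei-c.org/ns/1.0" xml:id="NER.ana">
  <category xml:id="PER">
    <catDesc xml:lang="en"><term>PER</term>: person</catDesc>
    <catDesc><term>PER</term>: persona</catDesc>
  </category>
</taxonomy>"""

print(missing_lang(SAMPLE))
```

For a real file, read its contents and pass the string to missing_lang; an empty list means every description element found carries its own xml:lang.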
Translated NER taxonomy
Thanks.
taxonomies
You have changed some taxonomies.
It will cause trouble in ParlaMint v3.1, where we want to merge all translations of taxonomies into one.
You can extract taxonomies from the root file with:
That way you only make changes in one place (most of the taxonomies are shared between the TEI and TEI.ana versions).
idno type
https://clarin-eric.github.io/ParlaMint/#TEI.idno https://github.com/Skriptotajs/ParlaMint/blob/d2895e5fc0926974293c14973c7d5285e4e17b6b/Data/ParlaMint-LV/ParlaMint-LV.xml#L293
should be
term in component file
Different term number in title and meeting:
https://github.com/Skriptotajs/ParlaMint/blob/d2895e5fc0926974293c14973c7d5285e4e17b6b/Data/ParlaMint-LV/ParlaMint-LV_2014-11-11-PT12-270.xml#L9-L12
meeting - sitting
The sitting is stored in one file and the title stores its date, but the <meeting> element, which should somehow record the information from the title, doesn't, so I am suggesting adding: