Closed matyaskopp closed 1 year ago
@TomazErjavec is it better this way? It does not break my script, and you can use a new parameter `noAna="..."` in your finalization script.
Yes, this is definitely much better, thanks. A few minor complaints:
But none of these is a deal breaker; I can also survive without these tweaks, just let me know.
I believe that your request is now implemented:
java -jar /usr/share/java/saxon.jar \
> outDir=Data/ParlaMint-GR/factorize-teiHeader \
> noAna="ParlaMint-taxonomy-parla.legislature.xml ParlaMint-taxonomy-speaker_types.xml ParlaMint-taxonomy-subcorpus.xml ParlaMint-listOrg.xml ParlaMint-listPerson.xml" \
> -xsl:Scripts/parlamint-factorize-teiHeader.xsl \
> Data/ParlaMint-GR/ParlaMint-GR.ana.xml
INFO: Starting to process ParlaMint-GR.ana
INFO: processing root
INFO: Saving taxonomy to Data/ParlaMint-GR/factorize-teiHeader/ParlaMint-taxonomy-parla.legislature.xml
INFO: replacing xml:id parla.legislature with ParlaMint-taxonomy-parla.legislature
INFO: Saving taxonomy to Data/ParlaMint-GR/factorize-teiHeader/ParlaMint-taxonomy-speaker_types.xml
INFO: replacing xml:id speaker_types with ParlaMint-taxonomy-speaker_types
INFO: Saving taxonomy to Data/ParlaMint-GR/factorize-teiHeader/ParlaMint-taxonomy-subcorpus.xml
INFO: replacing xml:id subcorpus with ParlaMint-taxonomy-subcorpus
INFO: Saving taxonomy to Data/ParlaMint-GR/factorize-teiHeader/ParlaMint-taxonomy-NER.ana.xml
INFO: replacing xml:id NER with ParlaMint-taxonomy-NER.ana
INFO: Saving taxonomy to Data/ParlaMint-GR/factorize-teiHeader/ParlaMint-taxonomy-UD-SYN.ana.xml
INFO: replacing xml:id UD-SYN with ParlaMint-taxonomy-UD-SYN.ana
INFO: Saving listOrg to Data/ParlaMint-GR/factorize-teiHeader/ParlaMint-GR-listOrg.xml
INFO: replacing xml:id with ParlaMint-GR-listOrg
INFO: Saving listPerson to Data/ParlaMint-GR/factorize-teiHeader/ParlaMint-GR-listPerson.xml
INFO: replacing xml:id with ParlaMint-GR-listPerson
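The taxonomy lines in the log above suggest a simple naming rule: a file listed in `noAna` keeps its plain name, while any other file produced from an `.ana` corpus gets the `.ana` interfix. The Python sketch below only illustrates that rule. It is not the XSLT in `parlamint-factorize-teiHeader.xsl`; the function name and the `stem` argument are hypothetical, and it does not cover how the `listOrg`/`listPerson` entries in `noAna` are matched against the country-specific output names.

```python
def factorized_filename(stem: str, no_ana: str, is_ana_input: bool = True) -> str:
    """Return the output file name for one factorized taxonomy.

    stem         -- hypothetical base name, e.g. "ParlaMint-taxonomy-NER"
    no_ana       -- space-separated file names, as passed in noAna="..."
    is_ana_input -- True when the input corpus is the .ana variant
    """
    plain = f"{stem}.xml"
    # Files listed in noAna keep the plain name even for .ana input.
    if not is_ana_input or plain in no_ana.split():
        return plain
    # Everything else from an .ana corpus gets the ".ana" interfix.
    return f"{stem}.ana.xml"


no_ana = ("ParlaMint-taxonomy-parla.legislature.xml "
          "ParlaMint-taxonomy-speaker_types.xml "
          "ParlaMint-taxonomy-subcorpus.xml")

print(factorized_filename("ParlaMint-taxonomy-subcorpus", no_ana))  # ParlaMint-taxonomy-subcorpus.xml
print(factorized_filename("ParlaMint-taxonomy-NER", no_ana))        # ParlaMint-taxonomy-NER.ana.xml
```

With the `noAna` value from the sample run, this reproduces the taxonomy file names shown in the log.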
If `prefix` is not defined, then it is derived from `/teiCorpus/@xml:id`.
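As a rough illustration of that fallback, here is a minimal Python sketch that reads `xml:id` from the corpus root when no prefix is given. The function name is hypothetical, and stripping a trailing `.ana` from the id is an assumption based on the `ParlaMint-GR-listOrg` name in the log; the actual stylesheet may derive the prefix differently.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the fixed XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def derive_prefix(corpus_xml: str, prefix: str = "") -> str:
    """Return the explicit prefix, or fall back to the root element's xml:id."""
    if prefix:
        return prefix
    root = ET.fromstring(corpus_xml)
    corpus_id = root.get(XML_ID, "")
    # Assumption: the ".ana" suffix of the corpus id is not part of the prefix.
    if corpus_id.endswith(".ana"):
        corpus_id = corpus_id[: -len(".ana")]
    return corpus_id


sample = '<teiCorpus xmlns="http://www.tei-c.org/ns/1.0" xml:id="ParlaMint-GR.ana"/>'
print(derive_prefix(sample))            # ParlaMint-GR
print(derive_prefix(sample, "Custom"))  # Custom
```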
Great, just what I wanted and seems to work just fine!
This is rather horrible, as it is contrary to how I otherwise do things: I first do .ana and then .TEI, because I need to insert the number of words from .ana into .TEI. Here I would have to do it the other way around, which is rather a mess...
Would it be possible for you to change the script so that the "skip" files are not actually skipped, but that you generate them as usual, except that you give them names as they are in the skip list? Or does that destroy some of your assumptions?
TODO: a parameter `noAna` that contains a list of taxonomies/files where the `ana` interfix will not be included (because it was seen in the TEI version)
Sample run:
and the output: