Closed: matyaskopp closed this issue 1 year ago.
@TomazErjavec
I have added the target test-factorize to the Makefile:
https://github.com/clarin-eric/ParlaMint/blob/1424c3e21f1da972587fae3bfdb74499ba14ab39/Distro/Makefile#L51-L62
Running
make test-factorize CORPUS=BA
make test-factorize CORPUS=LV
you get this result:
du -h Test/Factorized/*/*
12K Test/Factorized/ParlaMint-BA.TEI.ana/ParlaMint-BA.ana.xml
16K Test/Factorized/ParlaMint-BA.TEI.ana/ParlaMint-BA-listOrg.ana.xml
288K Test/Factorized/ParlaMint-BA.TEI.ana/ParlaMint-BA-listPerson.ana.xml
4,0K Test/Factorized/ParlaMint-BA.TEI.ana/ParlaMint-taxonomy-NER.ana.xml
12K Test/Factorized/ParlaMint-BA.TEI.ana/ParlaMint-taxonomy-parla.legislature.xml
4,0K Test/Factorized/ParlaMint-BA.TEI.ana/ParlaMint-taxonomy-speaker_types.xml
4,0K Test/Factorized/ParlaMint-BA.TEI.ana/ParlaMint-taxonomy-subcorpus.xml
8,0K Test/Factorized/ParlaMint-BA.TEI.ana/ParlaMint-taxonomy-UD-SYN.ana.xml
16K Test/Factorized/ParlaMint-BA.TEI/ParlaMint-BA-listOrg.xml
288K Test/Factorized/ParlaMint-BA.TEI/ParlaMint-BA-listPerson.xml
12K Test/Factorized/ParlaMint-BA.TEI/ParlaMint-BA.xml
12K Test/Factorized/ParlaMint-BA.TEI/ParlaMint-taxonomy-parla.legislature.xml
4,0K Test/Factorized/ParlaMint-BA.TEI/ParlaMint-taxonomy-speaker_types.xml
4,0K Test/Factorized/ParlaMint-BA.TEI/ParlaMint-taxonomy-subcorpus.xml
8,0K Test/Factorized/ParlaMint-LV.TEI.ana/ParlaMint-LV.ana.xml
8,0K Test/Factorized/ParlaMint-LV.TEI.ana/ParlaMint-LV-listOrg.xml
144K Test/Factorized/ParlaMint-LV.TEI.ana/ParlaMint-LV-listPerson.xml
4,0K Test/Factorized/ParlaMint-LV.TEI.ana/ParlaMint-taxonomy-NER.ana.xml
8,0K Test/Factorized/ParlaMint-LV.TEI.ana/ParlaMint-taxonomy-parla.legislature.xml
4,0K Test/Factorized/ParlaMint-LV.TEI.ana/ParlaMint-taxonomy-speaker_types.xml
4,0K Test/Factorized/ParlaMint-LV.TEI.ana/ParlaMint-taxonomy-subcorpus.xml
72K Test/Factorized/ParlaMint-LV.TEI.ana/ParlaMint-taxonomy-UD-SYN.ana.xml
8,0K Test/Factorized/ParlaMint-LV.TEI/ParlaMint-LV-listOrg.xml
144K Test/Factorized/ParlaMint-LV.TEI/ParlaMint-LV-listPerson.xml
8,0K Test/Factorized/ParlaMint-LV.TEI/ParlaMint-LV.xml
8,0K Test/Factorized/ParlaMint-LV.TEI/ParlaMint-taxonomy-parla.legislature.xml
4,0K Test/Factorized/ParlaMint-LV.TEI/ParlaMint-taxonomy-speaker_types.xml
4,0K Test/Factorized/ParlaMint-LV.TEI/ParlaMint-taxonomy-subcorpus.xml
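Comparing the two BA output folders in the listing shows which factorized files only the .ana run produces. Here is a minimal Python sketch of that comparison. It uses file names copied from the listing above. The helper names normalize and ana_only are hypothetical and are not part of the ParlaMint repository:

```python
import re

# Base names copied from the BA part of the listing above.
TEI_FILES = [
    "ParlaMint-BA-listOrg.xml",
    "ParlaMint-BA-listPerson.xml",
    "ParlaMint-BA.xml",
    "ParlaMint-taxonomy-parla.legislature.xml",
    "ParlaMint-taxonomy-speaker_types.xml",
    "ParlaMint-taxonomy-subcorpus.xml",
]
TEI_ANA_FILES = [
    "ParlaMint-BA.ana.xml",
    "ParlaMint-BA-listOrg.ana.xml",
    "ParlaMint-BA-listPerson.ana.xml",
    "ParlaMint-taxonomy-NER.ana.xml",
    "ParlaMint-taxonomy-parla.legislature.xml",
    "ParlaMint-taxonomy-speaker_types.xml",
    "ParlaMint-taxonomy-subcorpus.xml",
    "ParlaMint-taxonomy-UD-SYN.ana.xml",
]


def normalize(name: str) -> str:
    """Drop the '.ana' infix so plain and .ana file names can be matched."""
    return re.sub(r"\.ana\.xml$", ".xml", name)


def ana_only(tei: list[str], tei_ana: list[str]) -> list[str]:
    """Return the factorized files that exist only in the .TEI.ana output."""
    plain = {normalize(n) for n in tei}
    return sorted(n for n in tei_ana if normalize(n) not in plain)


print(ana_only(TEI_FILES, TEI_ANA_FILES))
```

Names without the .ana infix pass through normalize unchanged, such as ParlaMint-LV-listOrg.xml in the LV .ana folder. For that reason, the same comparison on the LV listing also isolates the NER and UD-SYN taxonomies.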
The component files are not copied: the output folders contain only the root files and the factorized list and taxonomy files. I am not sure how to handle this. I don't want to copy all the files if only the root files are changed, but the distro script expects everything in one folder.
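One way to get a single folder without duplicating the unchanged component files is to link them rather than copy them. The sketch below is an illustration only. It assumes that the distro script follows symbolic links, and that the factorized files use the same relative paths as the originals they replace. The function assemble_distro_dir and its parameters are hypothetical names, not part of the ParlaMint tooling:

```python
from pathlib import Path


def assemble_distro_dir(source_dir: Path, factorized_dir: Path, out_dir: Path) -> dict[str, Path]:
    """Mirror source_dir into out_dir using symlinks instead of copies.

    A file that also exists (same relative path) in factorized_dir is taken
    from there, so the new root, list and taxonomy files replace the originals.
    All other files (e.g. the component files) point back to source_dir.
    Returns a mapping from relative path to link target.
    """
    plan: dict[str, Path] = {}
    # factorized_dir is walked second, so its files override same-named originals.
    for base in (source_dir, factorized_dir):
        for path in base.rglob("*"):
            if path.is_file():
                plan[path.relative_to(base).as_posix()] = path.resolve()
    for rel, target in sorted(plan.items()):
        link = out_dir / rel
        link.parent.mkdir(parents=True, exist_ok=True)
        # Replace any stale file or link left over from a previous run.
        if link.is_symlink() or link.exists():
            link.unlink()
        link.symlink_to(target)
    return plan
```

Only links are created in out_dir, so rerunning after the root files change does not duplicate the component files.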
Thank you @matyaskopp. Building on this, in 794d629 I made parlamint-factorize-corpora.pl, which factorises all the submitted corpora. Its output can then serve as input to the distribution script. So, all done here, closing.
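The thread does not show the internals of parlamint-factorize-corpora.pl. As a rough illustration only, a driver for several corpora could reuse the test-factorize target from the first comment once per corpus code. This Python sketch is not the committed Perl script. The names factorize_commands and factorize_all, and the default makefile_dir="Distro", are assumptions:

```python
import subprocess


def factorize_commands(corpora: list[str]) -> list[list[str]]:
    """Build one 'make test-factorize CORPUS=<code>' command per corpus code."""
    return [["make", "test-factorize", f"CORPUS={code}"] for code in corpora]


def factorize_all(corpora: list[str], makefile_dir: str = "Distro", dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, run it in makefile_dir."""
    for cmd in factorize_commands(corpora):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing corpus.
            subprocess.run(cmd, check=True, cwd=makefile_dir)


if __name__ == "__main__":
    # Dry run: only prints the two make commands used in the first comment.
    factorize_all(["BA", "LV"])
```

Keeping command construction separate from execution lets the list of corpora be checked before any make target actually runs.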
Prepare a script for factorizing the data before finalization with the distro script