Support tools for the Latvian Treebank and Latvian morphologically annotated corpora. The native data format of the Latvian corpora is an adjusted PML. The corresponding PML schemas are available at TrEd extension/lv-treebank/resources.
For each script in PmlCorporaTools there is a .bat file showing a sample invocation on dummy data located in PmlCorporaTools/testdata. It is also possible to launch these scripts without parameters to get information about the expected parameter values and their meaning.
Some scripts have not been used for several years and might be obsolete, sorry.
The folder Docs containing datasplits was moved to the Treebank repo on 2017-12-11.
For PmlCorporaTools:
XML::Simple
For LVTB2UD
Files PmlCorporaTools/*_sample.bat contain general descriptions and commented-out Windows command samples for the main workflows. To follow through a workflow on a Windows machine, create a copy of the necessary sample.bat and update it according to your needs. For the convenience of Windows users, .gitignore blocks scripts named /PmlCorporaTools/* - Copy.bat :wink:
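For those working from a POSIX-style shell (for example Git Bash on Windows), the copy step might be sketched as below. The function name `copy_sample` is hypothetical and not part of this repository; the only thing taken from the text above is the `* - Copy.bat` naming pattern that .gitignore excludes.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): copy a sample batch
# file to "<name> - Copy.bat" in the same folder, i.e. the naming pattern
# that the repository's .gitignore is said to exclude.
copy_sample() {
    src=$1
    dest="${src%.bat} - Copy.bat"
    # Print the path of the new copy on success.
    cp -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

Calling `copy_sample` with the path to one of the `*_sample.bat` files would leave the original untouched and create an editable copy that Git does not pick up.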
To follow through a workflow on a Unix machine, you have to create a similar
shell script, but it should be relatively easy, as all the interesting data
processing is done in platform-independent (hopefully) Perl scripts.
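As a starting point for such a shell script, the commented-out lines of a sample .bat can be listed with their comment markers removed. This is only a sketch: the function name `list_bat_comments` is hypothetical, and it assumes the samples use the standard batch comment markers `REM` or `::`. The output will contain both the descriptions and the command samples, and the commands still have to be adapted by hand (paths, quoting, and so on).

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the commented-out
# lines of a Windows batch file with the comment marker stripped.
# Assumption: comments start with "REM" or "::" (standard batch syntax).
list_bat_comments() {
    # Drop Windows carriage returns, keep only comment lines,
    # then remove the leading marker and the whitespace after it.
    tr -d '\r' < "$1" |
        grep -iE '^[[:space:]]*(rem([[:space:]]|$)|::)' |
        sed -E 's/^[[:space:]]*([Rr][Ee][Mm]|::)[[:space:]]*//'
}
```

Redirecting the output into a new file gives a draft from which the relevant command lines can be picked and turned into a Unix workflow script.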
PmlCorporaTools/prepareForLvtbPublication_sample.bat - create a dataset for publishing LVTB in the native hybrid/PML format.
PmlCorporaTools/convertLvtbToUd_sample.bat - convert the treebank to UD & CoNLL-U. This also contains notes on what checkups and preparation steps should be done for a UD release.
PmlCorporaTools/postprocessConlluForSembank_sample.bat - create UD data for FullStack project Sembank.
PmlCorporaTools/checkNormalizeSembankIds_sample.bat - ID verification before including treebank files into SemBank; this is what is done in Treebank moratorium.
PmlCorporaTools/PmlCorporaTools/aTreeTransformator_sample.bat - convert the treebank to old dependency formats used before UD.
PmlCorporaTools/LVK2LVTB-PML.readme.md
PmlCorporaTools/prepareForLvtbInclusion_sample.bat - add a completely new, hand-annotated file to LVTB.
Unite (if needed), TreeTransformatorUI with step --ord TOKEN, apply lvpml2tiger.xsl
CheckLvPml with param M (currently not used).
./preparePOSTagData.sh is used for preparing morphological data for LVTagger
(c) Institute of Mathematics and Computer Science, University of Latvia, 2010-2017
This software is licensed under the GNU General Public License. Commercial licensing is available if necessary; contact us at lauma@ailab.lv.