dchaplinsky / LT2OpenCorpora

MIT License

LT2OpenCorpora

Python script to convert the Ukrainian morphological dictionary from the LanguageTool project to the OpenCorpora format. The script runs well under PyPy and also collects some stats/insights/anomalies in the input dictionary. Use at your own risk.

It solves these tasks:

It's all about grouping

Grouping wordforms under a particular lemma is cumbersome, mostly because of homonymy and the internal format of the LanguageTool dictionary. In a nutshell:
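To make the homonymy problem concrete, here is a minimal sketch of naive grouping. It is not the algorithm used by this script. It assumes each input line has the shape `wordform lemma tags`, with colon-separated tags whose first element is the part of speech; the sample lines and tag strings are invented for illustration, and `group_wordforms` is a hypothetical helper.

```python
from collections import defaultdict


def group_wordforms(lines):
    """Group (wordform, tags) pairs under a (lemma, part-of-speech) key.

    Assumes the hypothetical line shape "wordform lemma tags".
    """
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        wordform, lemma, tags = parts
        pos = tags.split(":")[0]  # first tag is taken to be the part of speech
        groups[(lemma, pos)].append((wordform, tags))
    return dict(groups)


# Illustrative input: "мати" is both a noun ("mother") and a verb ("to have").
sample = [
    "мати мати noun:f:v_naz",
    "матері мати noun:f:v_rod",
    "мати мати verb:inf",
    "маю мати verb:pres:s:1",
]

for key, forms in group_wordforms(sample).items():
    print(key, forms)
```

Keying on the lemma alone would merge the noun and the verb into one entry. Adding the part of speech separates them here, but homonyms that share a part of speech would still collide, which is one reason the grouping needs more care than this.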

Prerequisites

pip install -r requirements.txt

Batteries included

Visualised mapping between the tagsets, in great detail


Running

python bin/lt_convert.py 1000.txt out.xml --debug
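After a run, a quick sanity check of the output can be sketched as follows. This assumes the resulting XML follows the OpenCorpora dictionary layout, where each `lemma` element holds one `l` element (the lemma itself) and several `f` elements (its forms), each with the text in a `t` attribute. The exact output of this script may differ, and `summarize_lemmas` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_lemmas(xml_text):
    """Return a list of (lemma text, number of forms) pairs."""
    root = ET.fromstring(xml_text)
    summary = []
    for lemma in root.iter("lemma"):
        head = lemma.find("l")  # assumed: the lemma's own entry
        text = head.get("t") if head is not None else None
        summary.append((text, len(lemma.findall("f"))))  # assumed: one <f> per form
    return summary


# Invented sample in the assumed layout; real output will be much larger.
SAMPLE = """<dictionary>
  <lemmata>
    <lemma id="1">
      <l t="мати"><g v="noun"/></l>
      <f t="мати"><g v="v_naz"/></f>
      <f t="матері"><g v="v_rod"/></f>
    </lemma>
  </lemmata>
</dictionary>"""

print(summarize_lemmas(SAMPLE))
```

For a real file, the contents of `out.xml` can be read and passed in the same way, for example with `summarize_lemmas(open("out.xml", encoding="utf-8").read())`.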