OpenCorpora / opencorpora

A web-based engine for creating and annotating textual corpora
http://opencorpora.org
GNU General Public License v2.0

Added python script for parsing xml -> tsv or json. #898

Open alvadia opened 3 years ago

alvadia commented 3 years ago

Added a Python script that converts XML to TSV or JSON. It takes three arguments: the input file (for example, dict.xml), the output file (for example, dict.json), and the output mode (for example, json).
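To illustrate the three-argument interface, here is a minimal sketch, not the script from this PR. It only handles the links section, and it assumes the input contains `<link>` elements carrying `id`, `from`, `to` and `type` attributes. The function names `link_rows`, `render` and `convert` are hypothetical.

```python
import json
import sys
import xml.etree.ElementTree as ET


def link_rows(xml_text):
    """Return a header row plus one [id, from, to, type] row per <link> element.

    Assumption: links are stored as <link id=".." from=".." to=".." type=".."/>.
    """
    root = ET.fromstring(xml_text)
    rows = [["links", "from", "to", "type"]]
    for link in root.iter("link"):
        # Missing attributes become empty strings so every row has four columns.
        rows.append([link.get(key, "") for key in ("id", "from", "to", "type")])
    return rows


def render(rows, mode):
    """Serialize rows as tab-separated lines ("tsv") or as a JSON array ("json")."""
    if mode == "tsv":
        return "".join("\t".join(row) + "\n" for row in rows)
    if mode == "json":
        return json.dumps(rows, ensure_ascii=False)
    raise ValueError("mode must be 'tsv' or 'json', got %r" % mode)


def convert(src, dst, mode):
    """Read the XML file at src and write the converted output to dst."""
    with open(src, encoding="utf-8") as fh:
        rows = link_rows(fh.read())
    with open(dst, "w", encoding="utf-8") as fh:
        fh.write(render(rows, mode))


# Runs only when exactly three arguments are given: from, to, mode.
if __name__ == "__main__" and len(sys.argv) == 4:
    convert(sys.argv[1], sys.argv[2], sys.argv[3])
```

Saved under a hypothetical name such as `convert_sketch.py`, it would be called as `python convert_sketch.py dict.xml dict.json json`. The sketch loads the whole file into memory; for a large dictionary, a streaming parser such as `xml.etree.ElementTree.iterparse` would probably be a better fit.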

Sample format of tsv (' ' means space): id \t root \t data \t extra \n

header \t dictionary \t version \t revision \n

OpenCorpora \t dictionary \t \t \n

lemmas \t lemma \t variants \t empty \n

[ \t ' ' <';'.join(attributes)> \t ' ' <';'.join(attributes)> [, ' ' <';'.join(attributes)>] \t \n]
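The lemma row pattern above can be sketched as follows. This is an illustration under assumptions, not the script's actual code: each entry is taken to be a word form, a space, and its attributes joined with ';'; variants are assumed to be separated by a bare comma; the first column is assumed to be the lemma id. The names `format_entry` and `lemma_row` are hypothetical.

```python
def format_entry(text, attributes):
    """Render one entry as: text, a single space, then ';'-joined attributes."""
    return text + " " + ";".join(attributes)


def lemma_row(lemma_id, lemma, variants):
    """Build one TSV row: id, lemma entry, comma-separated variants, empty column.

    Assumptions: `lemma` is a (text, attributes) pair, `variants` is a list of
    such pairs, and variants are joined with "," without extra whitespace.
    """
    columns = [
        lemma_id,
        format_entry(*lemma),
        ",".join(format_entry(text, attrs) for text, attrs in variants),
        "",  # the fourth column is left empty for lemma rows
    ]
    return "\t".join(columns) + "\n"


row = lemma_row(
    "1",
    ("ёж", ["NOUN", "masc"]),
    [("ёж", ["sing", "nomn"]), ("ежа", ["sing", "gent"])],
)
print(repr(row))
```

Keeping the attributes in a single ';'-joined field means each lemma fits on one line regardless of how many forms it has.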

gramemes \t parent \t alias \t description \n

[ \t \t \t \n]*

links \t from \t to \t type \n

[ \t \t \t \n]*

The TSV output requires much less space. This script is only a sample; it still needs a .sh wrapper.

alvadia commented 3 years ago

5