Hyperparticle / udify

A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology tags, lemmas, and dependency trees.
https://arxiv.org/abs/1904.02099
MIT License

Prediction of multi-word expression #28

Open gifdog97 opened 2 years ago

gifdog97 commented 2 years ago

Is it possible to predict multi-word expressions (MWEs) from raw text? I ran predict.py with the --raw_text option and found that MWEs are not predicted.

For example, in Italian, "della" is a contraction of "di la", and UD annotates such a multiword token with a range line followed by one line per syntactic word:

31-32   della   _   _   _   _   _   _   _   _
31  di  di  ADP E   _   35  case    35:case _
32  la  il  DET RD  Definite=Def|Gender=Fem|Number=Sing|PronType=Art    35  det 35:det  _
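To make the annotation concrete, here is a minimal Python sketch (not part of UDify) that reads one CoNLL-U sentence and separates multiword-token range lines such as 31-32 from ordinary word lines. It assumes tab-separated columns, as the CoNLL-U format requires (the lines above are shown space-aligned). The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class MultiwordToken:
    first: int  # ID of the first syntactic word covered
    last: int   # ID of the last syntactic word covered
    form: str   # surface form, e.g. "della"


@dataclass
class Word:
    id: int
    form: str
    lemma: str
    upos: str
    head: int
    deprel: str


def parse_conllu_sentence(text):
    """Split one CoNLL-U sentence into multiword-token range lines and word lines."""
    mwts, words = [], []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tok_id = cols[0]
        if "-" in tok_id:
            # Range ID such as "31-32": one surface token spanning several words.
            first, last = (int(x) for x in tok_id.split("-"))
            mwts.append(MultiwordToken(first, last, cols[1]))
        elif "." in tok_id:
            continue  # empty nodes (e.g. "8.1") are ignored in this sketch
        else:
            words.append(Word(int(tok_id), cols[1], cols[2], cols[3],
                              int(cols[6]), cols[7]))
    return mwts, words


# The Italian example from above, with real tab separators.
sample = "\n".join([
    "31-32\tdella\t_\t_\t_\t_\t_\t_\t_\t_",
    "31\tdi\tdi\tADP\tE\t_\t35\tcase\t35:case\t_",
    "32\tla\til\tDET\tRD\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t35\tdet\t35:det\t_",
])
mwts, words = parse_conllu_sentence(sample)
```

Here mwts holds the single range 31-32 ("della"), and words holds "di" and "la", both attached to head 35.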

However, the output of UDify is something like this:

31  della   della   ADP _   _   3   case    _   _

I would like to obtain CoNLL-U output with proper MWE lines. Is there any way to achieve this?
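One possible workaround, sketched here under assumptions rather than as a UDify feature: split contractions before parsing, then restore the range lines in the output. The contraction table below is a hypothetical, hand-written sample (a real setup would need a complete list or a trained multiword-token splitter), and the sketch assumes the resulting words can be passed to the parser as pre-tokenized input. It only fills in the ID and FORM columns.

```python
# Hypothetical contraction table: surface form -> syntactic words.
# Illustrative entries only; not taken from UDify or any UD treebank.
CONTRACTIONS = {
    "della": ["di", "la"],
    "nel": ["in", "il"],
    "al": ["a", "il"],
}


def split_contractions(tokens, table=CONTRACTIONS):
    """Return (words, ranges); ranges are 1-based (first, last, form) spans.

    Lookup is case-insensitive, so the capitalization of split words is not preserved.
    """
    words, ranges = [], []
    for tok in tokens:
        parts = table.get(tok.lower())
        if parts:
            first = len(words) + 1
            words.extend(parts)
            ranges.append((first, len(words), tok))
        else:
            words.append(tok)
    return words, ranges


def to_conllu_skeleton(words, ranges):
    """Emit 10-column lines with ID and FORM; each range line precedes its first word."""
    starts = {first: (first, last, form) for first, last, form in ranges}
    lines = []
    for i, word in enumerate(words, start=1):
        if i in starts:
            first, last, form = starts[i]
            lines.append("\t".join([f"{first}-{last}", form] + ["_"] * 8))
        lines.append("\t".join([str(i), word] + ["_"] * 8))
    return "\n".join(lines)


words, ranges = split_contractions(["Il", "libro", "della", "ragazza"])
print(to_conllu_skeleton(words, ranges))
```

For "Il libro della ragazza" this yields words 1 to 5 with a 3-4 della range line placed before word 3 (di). The remaining columns would have to be filled from the parser's predictions for the split words.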