Orange-OpenSource / conllueditor

ConlluEditor is a tool to edit dependency syntax trees in CoNLL-U format.
BSD 3-Clause "New" or "Revised" License

Arabic - Multiple questions #14

Closed Kentoseth closed 2 years ago

Kentoseth commented 2 years ago

Hello/Bonjour.

I am experimenting with https://github.com/inception-project/inception/ at the moment, but there are some drawbacks/quirks due to the nature of the complex morphology and grammar in Arabic. I have a few questions:

Thanks.

jheinecke commented 2 years ago

Hi, ConlluEditor is primarily made for editing CoNLL-U files of the Universal Dependencies (UD) project. Multiple syntax layers are not possible in the CoNLL-U format. However, you can use a different POS tagset than the one provided by UD. But a POS is assigned to a token, not to a character.

So for your example bismi, you have to create a MultiTokenWord (ConlluEditor can do this), bi + smi, in order to assign different POS tags to the preposition bi and the noun smi. Your first example in CoNLL-U would then look like this:

1-2 bi'smi  _   _   _   _   _   _   _   _
1   bi  bi  ADP _   _   2   case    _   _   
2   'smi    'sm NOUN    _   Gender=Masc 0   root    _   _
3   l-lahi  l-lah   PROPN   _   Case=Gen    2   nmod    _   _
4   l-raḥmāni   l-raḥmān    ADJ _   _   3   amod    _   _
5   l-raḥīmi    l-raḥīm ADJ _   _   3   amod    _   _
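
To make the structure of the example concrete, here is a minimal Python sketch (not part of ConlluEditor) that reads the lines above and pairs the `1-2` range line with the two words it covers. The names `FIELDS`, `SAMPLE` and `parse_sentence` are hypothetical helpers made up for this illustration. Real CoNLL-U files separate the ten columns with tabs; the sketch splits on whitespace only because no field in this example contains a space.

```python
# The ten column names defined by the CoNLL-U format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# The example from this thread (whitespace-separated here; real files use tabs).
SAMPLE = """\
1-2 bi'smi _ _ _ _ _ _ _ _
1 bi bi ADP _ _ 2 case _ _
2 'smi 'sm NOUN _ Gender=Masc 0 root _ _
3 l-lahi l-lah PROPN _ Case=Gen 2 nmod _ _
4 l-raḥmāni l-raḥmān ADJ _ _ 3 amod _ _
5 l-raḥīmi l-raḥīm ADJ _ _ 3 amod _ _
"""


def parse_sentence(text):
    """Return (range_lines, words) for one sentence.

    Range lines (IDs like "1-2") only carry the surface form; the POS tag,
    head and relation live on the individual words. Empty nodes (IDs like
    "1.1") are not handled in this sketch.
    """
    ranges, words = [], []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        values = line.split()
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
        row = dict(zip(FIELDS, values))
        if "-" in row["id"]:
            ranges.append(row)
        else:
            words.append(row)
    return ranges, words


ranges, words = parse_sentence(SAMPLE)
for mwt in ranges:
    start, end = (int(n) for n in mwt["id"].split("-"))
    parts = [w for w in words if start <= int(w["id"]) <= end]
    print(mwt["form"], "=",
          " + ".join(f'{w["form"]}/{w["upos"]}' for w in parts))
# prints: bi'smi = bi/ADP + 'smi/NOUN
```

The point of the sketch is that the range line has no POS of its own: the tags ADP and NOUN are attached to the words bi and 'smi, which is why the split is needed.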

(Have a look at the Arabic Treebank for better examples of how Arabic can be dealt with within UD.)