Orange-OpenSource / conllueditor

ConlluEditor is a tool to edit dependency syntax trees in CoNLL-U format.
BSD 3-Clause "New" or "Revised" License

Morphology #4

Closed Stormur closed 3 years ago

Stormur commented 4 years ago

Hi!

I was wondering if and how it is possible to load a list of language-specific morphological features, so that they will be correctly validated during annotation (e.g. Number[psor]=Sing|Plur). Besides, is there a way to define a standard roster of such features for a given part of speech, so that they automatically appear in the morphological field of a token, and to ease their annotation by displaying suggestions for the possible values, as is done for UPOS and deprels?

Thank you, and congratulations on this tool!

jheinecke commented 4 years ago

Good idea, I'll implement at least the validation for the features too! Thanks! I still have to think about how to display the suggestions. For the time being, you can configure the validator (I use Dan's official validate.py). Hitting the validation button will run the validator on the current sentence. Just create a valid.conf file containing one line:

script: /path/to/UniversalDependencies/tools/validate.py --lang cy --max-err 0 --level 5 {FILE}

Set the correct path to validate.py and the correct option for --lang, then start the server with

conlluedit.sh -r --validator valid.conf ...

At least this will help you to spot invalid feature values until I have completed the implementation.

Stormur commented 4 years ago

Yes, I was already using that option!

The point is that the validator script, if correctly set up, recognizes all language-specific relations and features, but the program does not do so during annotation.

In the same vein as above, I'd suggest having the annotator interpret a blank feature like Feature= as if it were not there, so that it is shown only if filled in, but stays there as a possible reminder/shortcut.

jheinecke commented 4 years ago

I do not exactly understand what you mean by the blank feature Feature=. Who is going to ignore it? The internal validation?

Stormur commented 4 years ago

I was proposing/envisioning something like the following.

Let's suppose that I want every VERB to have the features Aspect, Tense and Person. I imagine opening the feature window of the node and finding, in the "features" box,

Aspect= Tense= Person=

and possibly something like

Aspect= Tense=Pres Person=

if the CoNLL-U already contains some annotations. This way I can rapidly annotate what's left or correct existing annotations. But it can also happen that I leave some fields empty, for example Aspect=; in such cases, Aspect should not appear as a feature either in the GUI or in the final CoNLL-U, as it does now!
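The proposal above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the editor's own code; the `TEMPLATES` table and all function names are hypothetical. Empty slots are added when a token is opened for editing and dropped again when the FEATS column is written back.

```python
# Hypothetical per-UPOS templates; not an existing ConlluEditor setting.
TEMPLATES = {"VERB": ["Aspect", "Tense", "Person"]}


def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into a dict; '_' means no features."""
    if feats in ("", "_"):
        return {}
    result = {}
    for item in feats.split("|"):
        name, _, value = item.partition("=")
        result[name] = value
    return result


def with_template(upos, feats):
    """Existing features plus an empty slot for each missing template feature."""
    current = parse_feats(feats)
    for name in TEMPLATES.get(upos, []):
        current.setdefault(name, "")
    return current


def serialize(features):
    """Drop slots that were left empty and rebuild the FEATS string."""
    kept = {name: value for name, value in features.items() if value}
    if not kept:
        return "_"
    # Feature names are ordered case-insensitively.
    return "|".join(f"{name}={kept[name]}" for name in sorted(kept, key=str.lower))
```

For a VERB annotated only with Tense=Pres, `with_template` yields the slots Aspect, Tense=Pres and Person; if Aspect is left empty, `serialize` omits it from the output.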

I hope my idea is clear, and that it does not sound too impractical! :-)

jheinecke commented 4 years ago

I added an option --features which marks invalid features in red and suggests features in the word edit popup (92cd116708). I have no clear idea yet how to add these "templates" you'd like to have. Can a feature have an empty value in a CoNLL-U file? (I did not find any, but invalid files exist and can be edited to correct them.) For the treebanks I'm editing, I use a script which adds things not to forget (like Aspect= for all VERBs) and a validation script which tells me where values are missing. If the editor adds empty features before editing and deletes empty features afterwards, it cannot distinguish them from valueless features which were (erroneously) in the file in the first place.
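As a rough illustration of this kind of check, here is a minimal Python sketch, not the editor's implementation. The `ALLOWED` table is a hypothetical stand-in: the format of the file read by --features is not described in this thread.

```python
# Hypothetical table of allowed values per feature (an assumption for this sketch).
ALLOWED = {
    "Number[psor]": {"Sing", "Plur"},
    "Tense": {"Past", "Pres", "Fut"},
}


def invalid_features(feats):
    """Return the items of a FEATS string that are unknown or have a bad value."""
    if feats in ("", "_"):
        return []
    bad = []
    for item in feats.split("|"):
        name, sep, value = item.partition("=")
        # An item is invalid if it has no '=', an unknown name, or an unlisted value.
        if not sep or name not in ALLOWED or value not in ALLOWED[name]:
            bad.append(item)
    return bad
```

A valueless feature such as Aspect= is reported as invalid by this sketch, which is one way a validation step can point out values that are still missing.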

Stormur commented 4 years ago

Another issue: the editor does not seem to handle features with multiple values well, for example PronType=Int,Rel. As of now this notation is accepted by the UD validator, so it should be kept, but what happens is that one gets PronType=Int and a new relation Rel is introduced.
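To make the expected behaviour concrete, here is a minimal Python sketch (function names are hypothetical, and this is not the editor's code). Splitting on "|" first and only then on "," inside each value keeps PronType=Int,Rel together as one feature with two values.

```python
def parse_feats(feats):
    """Parse FEATS into {name: [values]}; comma-separated values stay with their feature."""
    if feats in ("", "_"):
        return {}
    parsed = {}
    for item in feats.split("|"):
        name, _, value = item.partition("=")
        parsed[name] = value.split(",")
    return parsed


def format_feats(parsed):
    """Rebuild the FEATS string, joining multiple values with a comma."""
    if not parsed:
        return "_"
    return "|".join(f"{name}={','.join(values)}" for name, values in parsed.items())
```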

jheinecke commented 4 years ago

You are right, I missed this evolution. I'll try to correct this quickly.

jheinecke commented 4 years ago

Should be OK now (9c343f71)

Stormur commented 4 years ago

Wonderful! Thanks a lot!

I will make use of the editor quite intensively, so I will let you know if everything works fine! ;-)

Stormur commented 4 years ago

Hi!

I've noticed that the editor seems to order the morphological features in a case-sensitive way, e.g. it puts NumType before Number, even if I reorder them, so that the validator complains. So, feature ordering should be case-insensitive.
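The difference can be shown with a minimal Python sketch (the function name is hypothetical; this is not the editor's code). A plain string sort puts NumType first because uppercase "T" sorts before lowercase "b"; sorting on the lowercased feature name gives the order the validator expects.

```python
def sort_feats(feats):
    """Sort the items of a FEATS string by feature name, ignoring case."""
    if feats in ("", "_"):
        return "_"
    items = feats.split("|")
    # Compare only the part before '=', lowercased.
    items.sort(key=lambda item: item.split("=", 1)[0].lower())
    return "|".join(items)


case_sensitive = "|".join(sorted(["Number=Plur", "NumType=Card"]))
case_insensitive = sort_feats("NumType=Card|Number=Plur")
```

Here `case_sensitive` is "NumType=Card|Number=Plur", while `case_insensitive` is "Number=Plur|NumType=Card".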

jheinecke commented 4 years ago

Done :-). Check out a0351bcc47.

Stormur commented 4 years ago

Thanks!

Stormur commented 4 years ago

Sorry to bother you again, but even after pulling, the ordering stays case-sensitive: NumType comes before Number!

jheinecke commented 4 years ago

Strange though, the unit tests order it correctly (I had to change all the reference files). Did you recompile the Java stuff? It should say version 2.6.0.

Stormur commented 4 years ago

Right, my fault! It all works fine, thanks!

jheinecke commented 4 years ago

Great!