UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Suitable interface for labeling morphological features? #1049

Closed AngledLuffa closed 1 month ago

AngledLuffa commented 2 months ago

I have been wondering, what is a good interface for labeling morphological features? For example, something that keeps track of constraints such as "Aspect only applies to VERB" and either simply doesn't allow putting that feature on anything other than a VERB, or at least gives the annotator a friendly reminder when they try to do that. We used Datasaur for POS and dependencies, and that worked very well (although it would have been even better with some constraint checking on the dependency graphs), but they reported not being able to do this kind of constraint checking on morphological features.

Stormur commented 2 months ago

I allow myself to link @jheinecke , as the developer of conllueditor, to discuss if it is possible to introduce this as a feature (I suggested it some time ago, by the way).

At the moment conllueditor suggests all possible options for a given feature and warns of non-allowed ones, but it would be nice to be able to specify some "roster" of features tied to parts of speech.

dan-zeman commented 2 months ago

> I allow myself to link @jheinecke , as the developer of conllueditor...

BTW Conllueditor allows you to configure a validation script and have it run on each sentence when you are done with its annotation. It does not have to be (only) the official UD validator. So it should be possible to create your own language-specific script that will report unexpected feature combinations, missing features (e.g. you may want to say that every NOUN must have a non-empty Gender) etc.
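A language-specific check of the kind described above could look roughly like the following. This is only a minimal sketch: the `REQUIRED` table and the function names are hypothetical, and how the editor hands a sentence to the script depends on its configuration, which is not shown here.

```python
# Hypothetical language-specific rule (an assumption for illustration only):
# every NOUN must carry a Gender feature.
REQUIRED = {"NOUN": {"Gender"}}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Gender=Fem|Number=Sing' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def check_sentence(lines):
    """Return one message per token that lacks a required feature."""
    problems = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id, form, upos = cols[0], cols[1], cols[3]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword token ranges and empty nodes
        feats = parse_feats(cols[5])
        for name in sorted(REQUIRED.get(upos, ())):
            if name not in feats:
                problems.append(f"token {tok_id} ({form}): {upos} is missing {name}")
    return problems
```

Calling `check_sentence` on the lines of one sentence returns a list of messages, which a wrapper script could print and turn into a non-zero exit status.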

jheinecke commented 2 months ago

Thanks Dan, yes, that's right! There is the --features option, which reads the data/feats.json file (https://github.com/UniversalDependencies/tools.git) and uses this information to propose valid feat=value pairs. Currently, undefined features appear in red in the editor (the same goes for UPOS and deprels). However, features used with a UPOS that does not have them (like Person=1 on a NOUN) are not (yet) marked as an error. I think this would be easy to implement, though.
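To make the distinction concrete, here is a rough sketch of the two kinds of check: a feature or value that is not defined at all, versus a defined feature used with a UPOS that does not take it. The two tables are hypothetical and deliberately simplified; they do not reproduce the actual layout or contents of data/feats.json, and this is not conllueditor's code.

```python
# Hypothetical, simplified tables (assumptions for illustration only).
KNOWN_VALUES = {
    "Person": {"1", "2", "3"},
    "Aspect": {"Imp", "Perf"},
    "Gender": {"Fem", "Masc", "Neut"},
}
ALLOWED_UPOS = {
    "Person": {"VERB", "AUX", "PRON"},
    "Aspect": {"VERB", "AUX"},
    "Gender": {"NOUN", "ADJ", "PRON", "DET"},
}


def classify(upos, name, value):
    """Classify one feat=value pair on a token with the given UPOS."""
    if value not in KNOWN_VALUES.get(name, set()):
        return "undefined"  # unknown feature name or unknown value
    if upos not in ALLOWED_UPOS.get(name, set()):
        return "not allowed for UPOS"  # defined, but not for this part of speech
    return "ok"
```

With these tables, `classify("NOUN", "Person", "1")` falls into the second category, which is the Person=1 on a NOUN case mentioned above.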

AngledLuffa commented 2 months ago

Thanks for the suggestion. It looks like a very compelling feature set. So would the best configuration be, we ask the annotators to run it themselves, or we host a server and have them do work on our server? If each runs it themselves, we'd need to split up the work for them, right? It'd also need to be pretty simple for non-technically-savvy users. Is there support for handling race conditions if we run it locally and have them connect to our server?

jheinecke commented 2 months ago

conllueditor is not (yet) multiuser. So if two users use the same server instance, everything is fine until they try to edit the same sentence at the same time; then whoever saves last wins. But you can run several instances (using different ports) on your servers and give each annotator a specific URL, so they do not have to install it themselves. Installation is rather easy on Linux (git clone, .zip file, or docker), but probably more difficult on Windows (git clone or .zip file). I'll have to think about a way to avoid race conditions though.

AngledLuffa commented 2 months ago

> I'll have to think about a way to avoid race conditions though

Ajax or other JavaScript updates in the frontend? Honestly, I don't know what people use these days, but I know that Ajax was a thing a few years ago.

jheinecke commented 2 months ago

I use ajax in the frontend, but it's the backend that stores the conllu file, and I think it must be managed there. I'll have a look. I'll push the modification which marks invalid features, or features not allowed for a UPOS, today or tomorrow.
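One generic way to handle "last save wins" on the backend is a version check (optimistic locking): each sentence carries a version number, and a save is rejected if the sentence has changed since the client loaded it. The sketch below illustrates that general technique only; the class and method names are hypothetical and it is not how conllueditor is implemented.

```python
import threading


class StaleEditError(Exception):
    """Raised when a save is based on an outdated version of a sentence."""


class SentenceStore:
    """In-memory store that keeps a version counter per sentence."""

    def __init__(self, sentences):
        self._lock = threading.Lock()
        # sentence id -> (version, text); every sentence starts at version 0
        self._data = {sid: (0, text) for sid, text in sentences.items()}

    def load(self, sid):
        """Return (version, text) so the client can send the version back on save."""
        with self._lock:
            return self._data[sid]

    def save(self, sid, base_version, new_text):
        """Store new_text only if nobody else saved since base_version was loaded."""
        with self._lock:
            current_version, _ = self._data[sid]
            if base_version != current_version:
                raise StaleEditError(
                    f"sentence {sid} is at version {current_version}, "
                    f"edit was based on version {base_version}"
                )
            self._data[sid] = (current_version + 1, new_text)
            return current_version + 1
```

If two annotators load the same sentence and both save, the second save raises `StaleEditError` instead of silently overwriting the first, and the frontend can then ask that annotator to reload.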

AngledLuffa commented 2 months ago

Thank you! Looking forward to it.

I think that given there are only two annotators on this project, plus occasional edits from the principals, it'll be easier & better to do some manual "locking" with the system that has proper constraints as opposed to a system that has the concurrency already built in but doesn't actually support editing the features. We'll keep an open mind.

jheinecke commented 2 months ago

I pushed it this evening! (version 2.25.6)

AngledLuffa commented 1 month ago

Thank you!

I will take this conversation offline / to your github at this point. This looks like a very promising tool for our use case.

khansadaoudi commented 1 month ago

Hi everyone,

I wanted to take advantage of this discussion to announce that we have integrated the UD validator script into our collaborative tool ArboratorGrew. The script runs automatically when users save their trees, so that they get instant feedback while annotating. It can also be run on all the trees of a conll file. I highly encourage you to test this feature, and feel free to send us your feedback and suggestions on our GitHub issues page.