Closed AngledLuffa closed 1 month ago
I'm taking the liberty of tagging @jheinecke, the developer of conllueditor, to discuss whether this could be introduced as a feature (I suggested it some time ago, by the way).
At the moment conllueditor suggests all possible options for a given feature and warns about non-allowed ones, but it would be nice to be able to specify a "roster" of features tied to parts of speech.
BTW Conllueditor allows you to configure a validation script and have it run on each sentence when you are done with its annotation. It does not have to be (only) the official UD validator. So it should be possible to create your own language-specific script that will report unexpected feature combinations, missing features (e.g. you may want to say that every NOUN must have a non-empty Gender) etc.
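A minimal sketch of what such a language-specific check might look like, assuming the script receives the lines of one CoNLL-U sentence. The `REQUIRED_FEATS` table and the function names are made up for illustration, and how conllueditor actually invokes a validator and reads its output is not shown here, so check its documentation before wiring this in.

```python
# Hypothetical language-specific rules; adjust to your own annotation guidelines.
REQUIRED_FEATS = {"NOUN": {"Gender"}}


def parse_feats(feats_column):
    """Turn a CoNLL-U FEATS string such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats_column == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats_column.split("|"))


def check_sentence(lines):
    """Return one message per required feature that is missing on a token."""
    problems = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and sentence-level comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2), empty nodes (1.1), malformed lines
        token_id, form, upos = cols[0], cols[1], cols[3]
        feats = parse_feats(cols[5])
        missing = REQUIRED_FEATS.get(upos, set()) - set(feats)
        for name in sorted(missing):
            problems.append(f"token {token_id} ({form}): {upos} is missing {name}")
    return problems
```

Unexpected feature combinations could be reported the same way, by adding a second table and another loop over `feats`.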
Thanks Dan, yes, that's exactly right!
There is the --features option, which reads the data/feats.json file (https://github.com/UniversalDependencies/tools.git) and uses this information to propose valid feat=value pairs. Currently, undefined features appear in red in the editor (same with UPOS and deprels). However, features used on a UPOS which does not have them (like Person=1 on a NOUN) are not (yet) marked as an error. I think this would be easy to implement though.
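To illustrate the distinction between an undefined feature and a feature that exists but is not allowed on a given UPOS, here is a rough sketch. The `ALLOWED` table is a hypothetical stand-in and does not reproduce the actual layout of data/feats.json, and `classify` is not a conllueditor function.

```python
# Hypothetical per-UPOS table; this is NOT the actual schema of data/feats.json.
ALLOWED = {
    "NOUN": {"Gender": {"Fem", "Masc"}, "Number": {"Plur", "Sing"}},
    "PRON": {"Number": {"Plur", "Sing"}, "Person": {"1", "2", "3"}},
}


def classify(upos, name, value):
    """Classify a feat=value pair as 'ok', 'undefined', or 'not allowed for this UPOS'."""
    # A pair is "undefined" if no UPOS in the table lists it at all.
    known_anywhere = any(value in feats.get(name, ()) for feats in ALLOWED.values())
    if not known_anywhere:
        return "undefined"
    # The pair exists somewhere; check whether this particular UPOS permits it.
    if value in ALLOWED.get(upos, {}).get(name, ()):
        return "ok"
    return "not allowed for this UPOS"
```

With this table, `classify("NOUN", "Person", "1")` falls into the third case: the pair is defined (for PRON) but not permitted on a NOUN.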
Thanks for the suggestion. It looks like a very compelling feature set. Would the best configuration be to ask the annotators to run it themselves, or to host a server and have them do the work there? If each of them runs it locally, we'd need to split up the work for them, right? It would also need to be pretty simple for non-technically-savvy users. And if we host it and have them connect to our server, is there support for handling race conditions?
conllueditor is not (yet) multiuser. If two users share the same server instance, everything is fine until they try to edit the same sentence at the same time; then whoever saves last wins. But you can run several instances (using different ports) on your servers and give each annotator a specific URL, so they do not have to install it themselves. Installation is rather easy on Linux (git clone, .zip file, or docker) but probably more difficult on Windows (git clone or .zip file). I'll have to think about a way to avoid race conditions though.
ajax or other javascript updates in the frontend? honestly i don't know what people use these days, but i know that ajax was a thing a few years ago
I use ajax in the frontend, but it's the backend that stores the conllu file, and I think it must be managed there. I'll have a look. I'll push the modification which marks features that are invalid or not allowed for a UPOS today or tomorrow.
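One common way to handle this on the backend is optimistic locking: every sentence carries a version number, and a save is rejected if the sentence changed since the client loaded it. The sketch below only illustrates the idea in Python; `SentenceStore` and `StaleEditError` are hypothetical names, and this is not how conllueditor is implemented.

```python
import threading


class StaleEditError(Exception):
    """Raised when a save is based on an outdated version of a sentence."""


class SentenceStore:
    """In-memory store that keeps a version counter per sentence."""

    def __init__(self, sentences):
        self._lock = threading.Lock()
        # Map each sentence id to a (version, text) pair, starting at version 0.
        self._data = {sid: (0, text) for sid, text in sentences.items()}

    def load(self, sent_id):
        """Return (version, text); the client sends the version back when saving."""
        with self._lock:
            return self._data[sent_id]

    def save(self, sent_id, base_version, new_text):
        """Store new_text only if nobody else saved since base_version was loaded."""
        with self._lock:
            current_version, _ = self._data[sent_id]
            if base_version != current_version:
                raise StaleEditError(
                    f"{sent_id} is at version {current_version}, "
                    f"but the edit was based on version {base_version}"
                )
            self._data[sent_id] = (current_version + 1, new_text)
            return current_version + 1
```

With this scheme the second of two simultaneous saves fails with an error instead of silently overwriting the first, and the frontend can ask that annotator to reload the sentence.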
Thank you! Looking forward to it.
I think that given there are only two annotators on this project, plus occasional edits from the principals, it'll be easier & better to do some manual "locking" with the system that has proper constraints as opposed to a system that has the concurrency already built in but doesn't actually support editing the features. We'll keep an open mind.
I pushed it this evening! (version 2.25.6)
Thank you!
I will take this conversation offline / to your github at this point. This looks like a very promising tool for our use case.
Hi everyone,
I wanted to take advantage of this discussion to announce that we have integrated the UD validator script into our collaborative tool ArboratorGrew. The script runs automatically when users save their trees, so that they get instant feedback while annotating. It can also be run on all the trees of a conll file. I highly encourage you to test this feature, and feel free to send us your feedback and suggestions on our GitHub issues page.
I have been wondering, what is a good interface for labeling morphological features? For example, something that keeps track of constraints such as Aspect only applies to VERB, and either simply doesn't allow putting that feature on anything other than a VERB, or at least gives the annotator a friendly reminder when they try to do that. We used Datasaur for POS and dependencies, and that worked very well (although it would have been even better with some constraint checking on the dependency graphs), but they reported not being able to do this kind of constraint checking on morphological features.