GateNLP / gate-teamware

A web application for collaborative document annotation.
https://gatenlp.github.io/gate-teamware/
GNU Affero General Public License v3.0

Annotator training mode #163

Closed: twinkarma closed this issue 2 years ago

twinkarma commented 2 years ago

Overview

Allow additional pre-annotated datasets to be uploaded for training and testing annotators before they annotate real documents. This is a common process when recruiting annotators; see, for example, https://crowd.cochrane.org/.

Training mode

Testing mode

Labelling answers

Tasks

Labelling documents with answers (gold standard)

For test and training documents, answers and explanations should be included in the uploaded JSON/CSV file, following the format below:

{
  "id": 1,
  "text": "Document 1",
  "gold": {
    "[labelName]": { "value": str/array, "explanation": str },
    ...
  }
}

In CSV, the corresponding columns will be:

id | text | gold.[labelName].value | gold.[labelName].explanation

* Replace [labelName] with the name of the actual label specified in project configuration
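
To make the mapping between the two formats concrete, here is a minimal sketch of how a flat CSV row with dotted column names could be expanded into the nested JSON structure above. It is only an illustration, not Teamware's actual import code: the function name `row_to_document` and the label name `sentiment` are hypothetical, and the sketch assumes label names contain no dots.

```python
import csv
import io


def row_to_document(row):
    """Expand dotted column names (e.g. gold.sentiment.value) into nested dicts."""
    doc = {}
    for column, value in row.items():
        parts = column.split(".")
        target = doc
        # Walk (and create) the intermediate dictionaries, e.g. "gold" -> "sentiment".
        for key in parts[:-1]:
            target = target.setdefault(key, {})
        target[parts[-1]] = value
    return doc


# Hypothetical CSV upload with one label called "sentiment".
csv_text = (
    "id,text,gold.sentiment.value,gold.sentiment.explanation\n"
    "1,Document 1,positive,The text expresses approval.\n"
)

documents = [row_to_document(row) for row in csv.DictReader(io.StringIO(csv_text))]
print(documents[0])
```

Note that the `csv` module reads every cell as a string, so `id` comes out as `"1"` rather than `1`; a real importer would need to decide how to convert types and how to represent array values.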

twinkarma commented 2 years ago

Is there only ever one right answer?

davidwilby commented 2 years ago

@twinkarma, a few questions already:

  1. In the above we have
    • Add test_documents to Project model
    • Add train_document to Project model

Hadn't we planned to have separate tables for training and test documents? We may have decided against this and I've forgotten. So far, what I've done (in #169) is to create classes for training and test documents which inherit from a base document class. (Note that to do this, I've moved all the properties and methods from Document() to BaseDocument() and then recreated the former as Document(BaseDocument), since it turns out that you can't override properties in child classes, e.g. the project field.)

  2. I'm not sure what your intention is with the annotator_max_train_score and annotator_max_test_score fields?

  3. For num_annotations, would it be better to compute this on the fly? Or did we decide that this would be too slow and to update this property with each annotation?

twinkarma commented 2 years ago

  1. I can't remember what we decided before, but I've had a rethink and I'm no longer sure there's a need to increase the complexity of the app, as we'd then potentially also need 3 Annotation classes, one for each Document class.
  2. Now that I think about it, it's basically just a count of the documents in the training set and the testing set, so those functions will just be return self.test_documents.all().count() and the same for the training documents.
  3. I think we can just calculate this on the fly for now and change it if we start getting slowdowns. It's not something that will get called that often, only to check whether a user has completed a certain stage of annotation, I guess.

twinkarma commented 2 years ago

I've renamed the annotator_max_train_score and annotator_max_test_score fields to num_training_documents and num_test_documents respectively, which probably makes more sense.
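
The on-the-fly approach discussed in this thread could look roughly like the sketch below. It uses plain Python lists as a stand-in for the Django relations, so `len(...)` takes the place of the `.all().count()` query mentioned above. The `Project` dataclass and the `has_completed_training` helper are hypothetical names for illustration, not the project's actual model code.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Stand-in for the Project model; the lists mimic the document relations."""
    training_documents: list = field(default_factory=list)
    test_documents: list = field(default_factory=list)

    @property
    def num_training_documents(self):
        # Computed on each access rather than stored as a field.
        return len(self.training_documents)

    @property
    def num_test_documents(self):
        return len(self.test_documents)


def has_completed_training(project, num_annotated):
    """Hypothetical check: has the annotator annotated every training document?"""
    return num_annotated >= project.num_training_documents


project = Project(
    training_documents=["train 1", "train 2"],
    test_documents=["test 1", "test 2", "test 3"],
)
print(project.num_training_documents, project.num_test_documents)
print(has_completed_training(project, num_annotated=2))
```

Because the counts are derived from the document collections, they cannot drift out of sync with the data, at the cost of one count per check.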

davidwilby commented 2 years ago

Are we good to close this issue?