Princeton-CDH / ppa-nlp

Discovering patterns in poetry’s data with machine learning; software for use with Princeton Prosody Archive (PPA) full-text corpus

test exporting annotations from Prodigy #53

Closed mnaydan closed 2 months ago

mnaydan commented 2 months ago

This will help us decide what to do with pages whose OCR is unreadable gobbledygook: do we ignore or reject?

laurejt commented 2 months ago

From the looks of it, selecting any of the accept, reject, and ignore answer buttons is logged. The exported data has an answer field for each annotation with the corresponding label of the button pressed.

From a look at the Prodigy documentation, the reject and ignore buttons make less sense for our annotations. That said, we could use them to indicate different things about the underlying page/text images.

I want to stress the unfortunate behavior that ignored/skipped items are considered "completed", so once saved they cannot be returned to by the annotator (at least as far as I can tell).
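To make the logging behavior concrete, here is a minimal sketch of tallying the `answer` field in an export. It assumes the export is JSONL with one annotation record per line; the sample records and the `count_answers` helper are made up for illustration and are not taken from our data.

```python
import json
from collections import Counter


def count_answers(jsonl_lines):
    """Tally the "answer" value across exported annotation records."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        # records without an answer are grouped under "missing"
        counts[record.get("answer", "missing")] += 1
    return counts


# Hypothetical sample records, standing in for a real export file
sample_export = [
    '{"text": "page one", "answer": "accept"}',
    '{"text": "page two", "answer": "reject"}',
    '{"text": "page three", "answer": "ignore"}',
    '{"text": "page four", "answer": "accept"}',
]
answer_counts = count_answers(sample_export)
```

With a real export, the same function could be passed an open file handle instead of a list of strings.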

mnaydan commented 2 months ago

Thank you @laurejt, that's helpful -- and counter to what we thought the "ignore" button did. Under what circumstances would you want an annotator to ignore/skip items, if at all? What would you prefer an annotator do for OCR gobbledygook pages - ignore or reject?

It also looks like the documentation recommends using the accept button for examples with no entities. Would we prefer to change the instructions to follow that guidance, or is it ok to continue pressing reject for negative examples?

laurejt commented 2 months ago

> Thank you @laurejt, that's helpful -- and counter to what we thought the "ignore" button did. Under what circumstances would you want an annotator to ignore/skip items, if at all? What would you prefer an annotator do for OCR gobbledygook pages - ignore or reject?

Good question. My current thoughts are that if there's an unrecoverable issue with the image and/or text they should mark reject; there's something "bad" about the example that makes it impossible to annotate.

Personally, I'd just have them not use the "ignore" button.

We don't really need to use either the ignore or reject button in our setting, but it's not a part of the interface we can change. So, it's mostly a matter of what is useful for us since it's easy to separate rejected / ignored annotations.

laurejt commented 2 months ago

Also relevant is the flag. On the exported data side of things, annotations have a flag field which is set to true or false. So, it seems like a useful binary indicator when we're testing things out. It doesn't impact the annotations themselves, but can let us know which instances annotators had trouble with.
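As a rough sketch of how the flag could be used on the export side, the helper below pulls out flagged records. The key name `flagged` is an assumption on my part and should be checked against an actual export; the function name and sample records are hypothetical.

```python
def flagged_examples(records, flag_key="flagged"):
    """Return the records whose flag field is set to true.

    flag_key is an assumed field name; adjust it to match the export.
    """
    return [record for record in records if record.get(flag_key) is True]


# Hypothetical records, standing in for parsed export data
sample_records = [
    {"text": "page one", "answer": "accept", "flagged": True},
    {"text": "page two", "answer": "accept", "flagged": False},
    {"text": "page three", "answer": "reject"},
]
trouble_spots = flagged_examples(sample_records)
```

Records with no flag field at all are treated the same as unflagged ones.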

mnaydan commented 2 months ago

That makes sense. How do you want negative (no poetry) examples handled?

laurejt commented 2 months ago

> That makes sense. How do you want negative (no poetry) examples handled?

Just have them hit the accept button. On the database side this will correspond to an instance without any text spans (and presumably something similar for images; I haven't tested that part yet because of the tigerdata issues). So it is equivalent to a negative example.
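In code, that rule might look like the sketch below. It assumes text spans are stored in a `spans` list on each record; image annotations are not covered since that part is untested. The helper name and sample records are hypothetical.

```python
def is_negative_example(record):
    """True for an accepted record that has no annotated text spans.

    Assumes spans live under a "spans" key; a missing or empty list
    both count as "no spans".
    """
    return record.get("answer") == "accept" and not record.get("spans")


# Hypothetical records, standing in for parsed export data
with_poetry = {
    "text": "Shall I compare thee",
    "answer": "accept",
    "spans": [{"start": 0, "end": 20, "label": "POETRY"}],
}
no_poetry = {"text": "Table of contents", "answer": "accept", "spans": []}
unreadable = {"text": "~~#@!", "answer": "reject"}

results = [is_negative_example(r) for r in (with_poetry, no_poetry, unreadable)]
```

Rejected pages stay out of the negative set, which keeps unreadable OCR separate from pages that simply contain no poetry.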

rlskoeser commented 2 months ago

FYI, these buttons can be configured in the prodigy config; default set is

  "buttons": ["accept", "reject", "ignore", "undo"],

So we could turn off the ignore button if we don't want it to be an option.

laurejt commented 2 months ago

> FYI, these buttons can be configured in the prodigy config; default set is
>
>     "buttons": ["accept", "reject", "ignore", "undo"],
>
> So we could turn off the ignore button if we don't want it to be an option.

You're right! I must have been looking at an outdated support thread. It's reasonable, out of caution, to keep some alternative to accept, but I'm not convinced there's a good reason to have both reject and ignore.

mnaydan commented 2 months ago

@rlskoeser how hard would it be to turn it off? It sounds like getting rid of ignore would be a good idea given its strange behavior with the save. Keeping reject seems important for the case Laure outlined above.

rlskoeser commented 2 months ago

It's pretty easy, we just need to decide where it should happen - it sounds like it's something that will always make sense for this recipe (= annotation task) rather than how we're using it, so my vote is to set it in the python recipe code. But we could also put it in the ansible configuration.
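A minimal sketch of what the recipe-side option could look like, assuming the recipe returns a components dictionary with a `config` entry that overrides the `buttons` setting. The function name and arguments are hypothetical, and the Prodigy recipe decorator is left out so the snippet runs without Prodigy installed.

```python
def recipe_components(dataset, stream, view_id):
    """Build the dictionary a recipe function would return.

    In the real recipe this function would be registered with Prodigy;
    here it is a plain function for illustration only.
    """
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": view_id,
        # default button set minus "ignore"
        "config": {"buttons": ["accept", "reject", "undo"]},
    }


# Placeholder values, not our actual dataset or interface
components = recipe_components("example_dataset", iter([]), "blocks")
```

Setting it here means every run of the recipe gets the same buttons, independent of the deployment configuration.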

mnaydan commented 2 months ago

+1 for recipe -- that makes sense to me. I'll make a separate issue to track it.