Hi all!
I wanted to bring up the topic of adjective annotation, as unification day approaches. Sorry for getting this out barely before the call.
I did some scouting of the AMR data for re-annotation. The good news is that we have pretty good coverage over adjectives! That's the bad news too -- annotating everything we have a frame for will take a lot of work.
By my amateurish count, of all the concepts with :mod relations in the last LDC release, we have:
1562 adjective instances where the adjective alias has a single roleset and a single role
4527 adjective instances with a single roleset, but multiple possible arguments (i.e. :mod might be polysemous)
3475 that have multiple rolesets, and may have to be disambiguated.
2962 unframed adjective instances (my wildly speculative count from throwing concept nodes into a pos tagger and counting the JJs...)
The minor but more pressing question: what do we do on unification day?
I'm assuming that we'll map the -41 predicates to their new unification frames.
It might make sense during U-day to check the adjectives that have a clear "verbalization" but weren't consistently verbalized ("famous" going to "fame" is now the official unification, but it wasn't applied consistently).
We had talked about updating the rest of the adjectives as a lower-urgency task. Are we on the same page with that?
More importantly, what are we doing in the future?
My own preference is that I'd want to have sense-disambiguated adjectives everywhere, with numbered arguments. There's just too much semantics there to not have them!
Alternatively, it's been suggested that we not frame up monosemous adjectives, i.e. those with only a single sense (they would not be underlined in blue, would not be an option in the editor, and you would mark them as mod).
If 80% of the adjectives fit into that class, I'd be very tempted by the argument. But if my counts are even close to correct, doing so wouldn't really "solve" what it seems intended, in spirit, to solve -- avoiding adding lots of additional work in annotating adjectives.
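To put rough numbers on that, here is a quick back-of-the-envelope check using the four counts above. The bucket names are just labels I made up for this sketch, and the unframed count is the speculative POS-tagger estimate, so treat the percentages as approximate. Since "that class" could be read narrowly (single roleset and single role) or broadly (any single-roleset adjective), it computes both.

```python
# Counts copied from the list above; the key names are only illustrative labels.
counts = {
    "single_roleset_single_role": 1562,
    "single_roleset_multiple_roles": 4527,
    "multiple_rolesets": 3475,
    "unframed": 2962,  # speculative: POS-tagger JJ count
}

total = sum(counts.values())
framed = total - counts["unframed"]


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Narrow reading: one roleset and one role.
narrow = counts["single_roleset_single_role"]
# Broad reading: any adjective with a single roleset.
broad = narrow + counts["single_roleset_multiple_roles"]

narrow_share_all = pct(narrow, total)
narrow_share_framed = pct(narrow, framed)
broad_share_all = pct(broad, total)

print(f"total instances counted: {total}")
print(f"narrow class, share of all instances: {narrow_share_all}%")
print(f"narrow class, share of framed instances: {narrow_share_framed}%")
print(f"broad class, share of all instances: {broad_share_all}%")
```

On these counts the narrow class is about 12.5% of all counted instances (about 16.3% of the framed ones), and even the broad reading comes to about 48.6%, so neither is near 80%.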
So I wanted to open the floor to opinions, schemes, etc. I've got a suggestion below, but this is mostly to re-start the conversation on this.
Tim's weird proposal:
I'm really really sympathetic to the idea of not doing additional work on adjectives. My main thing is that if an annotator sees an adjective, I'd like it to be clear what their next step is, without having to guess about its lexical properties. That's the appeal of our current, sometimes shallow, ":mod" treatment -- it's a single step.
The additional thing to note with the adjectives is that many have a single argument that would modify the "head" of the adjective (often arg1/proto-patient) -- "new.01", for example, has "arg1" for who/what is new, and "arg2" for what it is new to. If one were able to just say "grab the default, proto-patient role if there is one", that rests on many of the same assumptions as "mod", right?
I imagine us having a shortcut function -- let's say ":ppt" -- that does the obvious thing when the choice is deterministic and alerts the annotator when it isn't. Specifically:
":ppt reasonable" could map to :arg1-of reasonable-02 (reasonable has 1 roleset and 1 role)
":ppt dark" is polysemous, and ideally the editor would just try to open the frame-selection list.
":ppt new" (having a clear "proto-patient" role and nothing like arg0 that you could be confusing it with) perhaps could automatically become ":arg1-of new.01"
":ppt fluent" might become :ppt fluent-01, and the annotator would have to sort out the rest, since "fluent" has both an arg0 and an arg1 (think "fluent English" vs "fluent speakers")
I don't know if that's linguistically feasible or tractable for the editor, but wanted to throw it out there.
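To make the four cases above concrete, here is a minimal sketch of how such a shortcut might decide what to do. Everything in it is hypothetical: the toy lexicon, the function name resolve_ppt, the "CHOOSE-FRAME" signal, and the two "dark" rolesets (their ids and roles are placeholders, not the real frames). Falling back to ":mod" for unframed adjectives is also my assumption here, not something we've decided.

```python
from typing import Dict, List, Tuple

# A roleset is (roleset id, list of numbered roles). Toy data for illustration only.
Roleset = Tuple[str, List[str]]

TOY_LEXICON: Dict[str, List[Roleset]] = {
    "reasonable": [("reasonable-02", ["ARG1"])],
    "new": [("new-01", ["ARG1", "ARG2"])],
    "fluent": [("fluent-01", ["ARG0", "ARG1"])],
    # Placeholder rolesets: only the fact that there are several matters here.
    "dark": [("dark-01", ["ARG1"]), ("dark-02", ["ARG1"])],
}


def resolve_ppt(lemma: str, lexicon: Dict[str, List[Roleset]]) -> str:
    """Return what the editor would do for ':ppt <lemma>' in this sketch."""
    rolesets = lexicon.get(lemma, [])
    if not rolesets:
        # Assumption: unframed adjectives keep the current shallow treatment.
        return f":mod {lemma}"
    if len(rolesets) > 1:
        # Polysemous: hand off to the frame-selection list.
        return f"CHOOSE-FRAME {lemma}"
    roleset_id, roles = rolesets[0]
    if len(roles) == 1:
        # One roleset, one role: nothing to decide.
        return f":{roles[0]}-of {roleset_id}"
    if "ARG1" in roles and "ARG0" not in roles:
        # Clear proto-patient and no ARG0 to confuse it with.
        return f":ARG1-of {roleset_id}"
    # Roleset is known but the role is not: the annotator picks it.
    return f":ppt {roleset_id}"


for adjective in ["reasonable", "dark", "new", "fluent", "purple"]:
    print(adjective, "->", resolve_ppt(adjective, TOY_LEXICON))
```

The point of the sketch is only that the annotator's first step is always the same (type ":ppt"), and the lexicon decides whether anything further is needed.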
Additional arguments against leaving 1-roleset, 1-role adjectives as "mod":
It would leave us treating very similar sentences with different roles ("that's an odd sentence" and "that's a strange sentence" have different relations).
I'm not sure what we would do when we decide that they were polysemous after all -- go back and reannotate each adjective, as we discover their polysemy or additional roles? This might structurally discourage us from updating the lexicon, which would be bad.
We won't necessarily know if things are actually acting polysemously: As Propbank is fresh to adjectival annotation, and hasn't annotated adnominal adjectives (but AMR will), AMR will invariably run into senses that Propbank didn't. Without rolesets, an annotator could regularly see 2-3 senses of something and never mark anything as -00.
I don't know much about AMR parsing, but I have this fear of features like "number of rolesets at estimated time of annotation" being relevant/useful.