amir-zeldes / gum

Repository for the Georgetown University Multilayer Corpus (GUM)
https://gucorpling.org/gum/

Song title #144

Closed nschneid closed 1 year ago

nschneid commented 1 year ago

In bio_dvorak, "Songs My Mother Taught Me" has lemmas "Mother" and "Taught", and the latter should be treated as a verb rather than PROPN.

amir-zeldes commented 1 year ago

Mother seems correct. Will fix Taught > Teach

nschneid commented 1 year ago

Lowercase right? It's not a different sense of "mother"

amir-zeldes commented 1 year ago

All NNP lemmas retain uppercase, isn't that the rule? Even the lemma "Unite" for "United States", as per the guidelines we discussed a while back. It's part of a work of art title, so "Mother" is NNP -> lemma is Mother.

nschneid commented 1 year ago

Hmm...UniversalDependencies/UD_English-EWT/issues/131 says PROPNs currently retain capitalization. So maybe "Mother" is OK. Keeping the capitalization in "Teach" is definitely surprising if I want to explore the syntax of particular verbs. Art titles are particularly hard because they can contain entire sentences reinterpreted as names.

nschneid commented 1 year ago

(If I were making the rules from scratch, the criterion for PROPN would be something like, "If a noun appears in a proper name, it is PROPN if it can never be a common noun, or if the salience of the common noun sense in the name is absent/highly backgrounded relative to the word's conventional use for forming names." So nouns used compositionally that happen to be in a title or organization name would be NOUN. But "Republic" is associated with a pattern of forming country names, so that would be PROPN in "Dominican Republic" etc.)

amir-zeldes commented 1 year ago

Not sure if we could get high agreement for such a guideline... Ultimately I think PTB NNP is just a utilitarian thing, a stand-in for NER at the POS level, but the rules for it are also quite convoluted.

In any case, for this sentence the right thing to do based on the guidelines is IMO "Mother" and "Teach", so I will go with that.
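
For illustration, the convention discussed in this thread (NNP lemmas keep their capital letter, so "Taught" in a title is lemmatized as "Teach") can be sketched roughly as below. This is a minimal sketch, not the GUM build code: the function names `nnp_lemma` and `matches_lemma` and the `(form, base_lemma, xpos)` tuple layout are hypothetical, and the lowercase `taught`/`VBD` row is an invented contrast case rather than a token from bio_dvorak.

```python
def nnp_lemma(form: str, base_lemma: str, xpos: str) -> str:
    """Apply the 'NNP lemmas retain uppercase' convention (hypothetical helper).

    base_lemma is the ordinary dictionary lemma, e.g. "teach" for "Taught".
    """
    if xpos in ("NNP", "NNPS") and form[:1].isupper():
        # Proper-noun tag and capitalized form: capitalize the lemma's first letter.
        return base_lemma[:1].upper() + base_lemma[1:]
    return base_lemma


def matches_lemma(lemma: str, query: str) -> bool:
    """Case-insensitive lemma comparison, so a search for the verb 'teach'
    also finds the title-internal lemma 'Teach'."""
    return lemma.lower() == query.lower()


# (form, base lemma, xpos); the first three follow the examples in this thread,
# the last is an assumed non-title use for contrast.
examples = [
    ("Mother", "mother", "NNP"),
    ("Taught", "teach", "NNP"),
    ("United", "unite", "NNP"),
    ("taught", "teach", "VBD"),
]

lemmas = [nnp_lemma(form, base, xpos) for form, base, xpos in examples]
teach_hits = [lemma for lemma in lemmas if matches_lemma(lemma, "teach")]
```

Under these assumptions, `lemmas` comes out as `["Mother", "Teach", "Unite", "teach"]`, and a case-insensitive query is one possible way to study a verb's syntax without missing the capitalized title lemma.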