amrisi / amr-guidelines

Named entities with multiple names in Wikipedia articles #223

Open uhermjakob opened 6 years ago

uhermjakob commented 6 years ago

I have shared with Bianca a workset with 192 lead sentences from 100 Wikipedia articles about named entities as part of our AMR diversity annotation effort.

Some entities have multiple names. The following extreme example has 5 names in 4 languages and 3 scripts, with 2 transliterations and 1 pronunciation. Below is my proposed AMR annotation. Am I going overboard with names? Any feedback is welcome, so that we have an established consensus reference before large-scale annotation of Wikipedia NE articles starts.

Ochamchire or Ochamchira (Georgian: ოჩამჩირე, [ɔtʃʰɑmtʃʰire]; Abkhaz: Очамчыра, Ochamchyra; Russian: Очамчира, Ochamchira) is a seaside city on the Black Sea coast of Abkhazia, Georgia, and a centre of the eponymous district.

(a / and
  :op1 (c / city
         :location (c2 / coast
                     :mod (s2 / sea :wiki "Black_Sea"
                            :name (n6 / name :op1 "Black" :op2 "Sea"))
                     :mod (c3 / country-region :wiki "Abkhazia"
                            :name (n10 / name :op1 "Abkhazia")
                            :location (c4 / country :wiki "Georgia_(country)"
                                        :name (n11 / name :op1 "Georgia"))))
         :domain (c6 / city :wiki "Ochamchire"
                   :name (n2 / name :op1 "Ochamchire")
                   :name (n3 / name :op1 "Ochamchira")
                   :name (n4 / name :op1 "ოჩამჩირე"
                           :medium (l / language :wiki "Georgian_language"
                                     :name (n5 / name :op1 "Georgian"))
                           :ARG1-of (p / pronounce-01
                                      :ARG2 (s / string-entity :value "ɔtʃʰɑmtʃʰire"
                                              :medium (w / writing-script :wiki "International_Phonetic_Alphabet"
                                                        :name (n14 / name :op1 "IPA")))))
                   :name (n7 / name :op1 "Очамчыра"
                           :medium (l3 / language :wiki "Abkhaz_language"
                                     :name (n12 / name :op1 "Abkhaz"))
                           :ARG2-of (t / transliterate-01
                                      :ARG3 (n13 / name :op1 "Ochamchyra")))
                   :name (n8 / name :op1 "Очамчира"
                           :medium (l2 / language :wiki "Russian_language"
                                     :name (n9 / name :op1 "Russian"))
                           :ARG2-of (t2 / transliterate-01
                                      :ARG3 n3))))
  :op2 (h / have-org-role-91
         :ARG0 c6
         :ARG1 (d / district :wiki "Ochamchire_Municipality"
                 :name n2)
         :ARG2 (c5 / center)))

uhermjakob commented 6 years ago

After discussion at the AMR phone meeting on Oct. 9, 2017, and some further thought and discussion with Kevin, let's not annotate alternative names (including those in different languages and scripts), pronunciations, or transliterations. This is analogous to not annotating abbreviations (when they accompany the expanded form) and not annotating alternative quantities (e.g. 50 miles in addition to 80 kilometers).

Alternative names, pronunciations, and transliterations can be valuable for some NLP applications, but this information can also be extracted from Wikipedia by a dedicated program.
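As a rough illustration of what such a dedicated program might look like, the sketch below pulls the alternate names out of the parenthetical of the example lead sentence. The function name `extract_alternate_names` and the dictionary keys are made up for illustration, and the code assumes the `Language: name, [IPA], transliteration` layout of this one example; real Wikipedia leads vary much more.

```python
import re

LEAD = (
    "Ochamchire or Ochamchira (Georgian: ოჩამჩირე, [ɔtʃʰɑmtʃʰire]; "
    "Abkhaz: Очамчыра, Ochamchyra; Russian: Очамчира, Ochamchira) "
    "is a seaside city on the Black Sea coast of Abkhazia, Georgia, "
    "and a centre of the eponymous district."
)


def extract_alternate_names(lead):
    """Return one dict per 'Language: ...' group in the first parenthetical.

    Hypothetical helper: assumes groups are separated by ';' and items by ','.
    """
    match = re.search(r"\(([^()]*)\)", lead)
    if match is None:
        return []
    entries = []
    for group in match.group(1).split(";"):
        language, sep, rest = group.partition(":")
        if not sep:
            continue  # not a 'Language: ...' group, skip it
        entry = {"language": language.strip(), "name": None,
                 "ipa": None, "transliteration": None}
        for item in (part.strip() for part in rest.split(",")):
            if not item:
                continue
            if item.startswith("[") and item.endswith("]"):
                entry["ipa"] = item[1:-1]        # bracketed item: pronunciation
            elif entry["name"] is None:
                entry["name"] = item             # first plain item: native name
            else:
                entry["transliteration"] = item  # later plain item: transliteration
        entries.append(entry)
    return entries


for entry in extract_alternate_names(LEAD):
    print(entry)
```

The English alternative "Ochamchira" sits outside the parenthetical and is not picked up by this sketch.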

Updated AMR Dict: https://www.isi.edu/~ulf/amr/lib/amr-dict.html#alternate%20form

Below is the target annotation.

Ochamchire or Ochamchira (Georgian: ოჩამჩირე, [ɔtʃʰɑmtʃʰire]; Abkhaz: Очамчыра, Ochamchyra; Russian: Очамчира, Ochamchira) is a seaside city on the Black Sea coast of Abkhazia, Georgia, and a centre of the eponymous district.

(a / and
  :op1 (c / city
         :location (c2 / coast
                     :mod (s2 / sea :wiki "Black_Sea"
                            :name (n2 / name :op1 "Black" :op2 "Sea"))
                     :mod (c3 / country-region :wiki "Abkhazia"
                            :name (n3 / name :op1 "Abkhazia")
                            :location (c4 / country :wiki "Georgia_(country)"
                                        :name (n4 / name :op1 "Georgia"))))
         :domain (c6 / city :wiki "Ochamchire"
                   :name (n / name :op1 "Ochamchire")))
  :op2 (h / have-org-role-91
         :ARG0 c6
         :ARG1 (d / district :wiki "Ochamchire_Municipality"
                 :name n)
         :ARG2 (c5 / center)))
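
As a quick sanity check of the target annotation, the sketch below lists each entity's type, `:wiki` title, and name, resolving the re-entrant `:name n` on the district to the city's name node. It uses plain regular expressions rather than a real PENMAN parser, `named_entities` is a hypothetical helper, and the patterns assume the exact layout of this annotation (`:wiki` directly after the concept, `:name` directly after `:wiki`).

```python
import re

TARGET_AMR = '''
(a / and
  :op1 (c / city
         :location (c2 / coast
                     :mod (s2 / sea :wiki "Black_Sea"
                            :name (n2 / name :op1 "Black" :op2 "Sea"))
                     :mod (c3 / country-region :wiki "Abkhazia"
                            :name (n3 / name :op1 "Abkhazia")
                            :location (c4 / country :wiki "Georgia_(country)"
                                        :name (n4 / name :op1 "Georgia"))))
         :domain (c6 / city :wiki "Ochamchire"
                   :name (n / name :op1 "Ochamchire")))
  :op2 (h / have-org-role-91
         :ARG0 c6
         :ARG1 (d / district :wiki "Ochamchire_Municipality"
                 :name n)
         :ARG2 (c5 / center)))
'''

# (var / name :op1 "..." :op2 "..." ...)
NAME_NODE = re.compile(r'\((\w+) / name((?:\s+:op\d+ "[^"]*")+)\)')
# (var / type :wiki "Title" :name <inline name node or re-entrant variable>
ENTITY = re.compile(r'\((\w+) / ([\w-]+)\s+:wiki "([^"]*)"\s+:name \(?(\w+)')


def named_entities(amr):
    """Return (type, wiki title, name) for every :wiki-bearing node."""
    names = {}
    for var, ops in NAME_NODE.findall(amr):
        names[var] = " ".join(re.findall(r'"([^"]*)"', ops))
    return [(etype, wiki, names.get(name_var))
            for _var, etype, wiki, name_var in ENTITY.findall(amr)]


for entity in named_entities(TARGET_AMR):
    print(entity)
```

The district and the city come out with the same name string, which is what the shared name node `n` expresses.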

nschneid commented 6 years ago

I don't have a strong opinion either way, but a potential downside of omitting alternate names from the AMR is that the text fragment giving the alternate names doesn't align at all to the AMR, and thus parsers have to learn a special policy for contexts in which names are ignored.
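
One crude form such a special policy could take is a preprocessing rule that drops the parenthetical before parsing. The sketch below is only an illustration: `split_off_parenthetical` is a hypothetical helper, and it assumes the alternate names sit in the first parenthetical of the sentence, so it leaves "or Ochamchira" in place.

```python
import re

SENTENCE = (
    "Ochamchire or Ochamchira (Georgian: ოჩამჩირე, [ɔtʃʰɑmtʃʰire]; "
    "Abkhaz: Очамчыра, Ochamchyra; Russian: Очамчира, Ochamchira) "
    "is a seaside city on the Black Sea coast of Abkhazia, Georgia, "
    "and a centre of the eponymous district."
)

# First parenthetical, together with the whitespace in front of it.
PARENTHETICAL = re.compile(r"\s*\(([^()]*)\)")


def split_off_parenthetical(sentence):
    """Return (sentence without its first parenthetical, text inside it or None)."""
    match = PARENTHETICAL.search(sentence)
    if match is None:
        return sentence, None
    kept = sentence[:match.start()] + sentence[match.end():]
    return kept, match.group(1)


kept, dropped = split_off_parenthetical(SENTENCE)
print(kept)
print(dropped)
print(len(dropped.split()), "of", len(SENTENCE.split()), "whitespace tokens dropped")
```

The count of dropped tokens gives a rough sense of how much of the sentence has no counterpart in the AMR under the no-alternate-names policy.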

An alternative might be to have some sort of "also known as" frame or role to handle such expressions. I suppose for consistency this should include equivalent measurements as well.

These alternate expressions are there for a reason, even if they are denotationally equivalent. I guess it's a question of whether AMR should include all metalinguistic information that needs to be explicitly communicated to the reader, or just enough to have one way to identify the referent.