amrisi / amr-guidelines


Location names within names of entities #32

Closed: cbonial closed this issue 11 years ago

cbonial commented 11 years ago

When the location of a named entity is included in the name itself, do we treat this as a location, or as part of the name?

For example: Ecological organization Greenpeace stated on June 21, 2002 that Russia helped Iran to develop nuclear weapons by building Iran's Bushehr atomic power station.

Bushehr is the city in which the power station exists, and the rest isn't capitalized, so I initially assumed this should have an AMR where Bushehr is the :location of the power station.

However, subsequent mentions use "Bushehr" to stand in for the power plant itself:

Iran has said the plant is being built only for civilian energy purposes and allows regular inspections of Bushehr by the International Atomic Energy Agency.

Experts say Bushehr could in theory become operational as early as September 2003.

Should Bushehr therefore be treated as the name of the power station, and thus be left semantically unanalyzed rather than marked as its location? This also seems somewhat at odds with our guideline analysis of "United States Congress," where U.S. is a :mod of congress.

kevincrawfordknight commented 11 years ago

The ideal solution is to start normalizing proper names (wikification). That is, use "" (or short form) inside AMR. I think many kinds of users would love this.

Meanwhile, here, I would use (f / facility :name (n / name :op1 "Bushehr")). I imagine this comes up a lot (e.g., "Northridge earthquake") so we need a rule. I'd use ":name" if the proper name seems like a designator, rather than a word that simply provides location information ("Mexico City riots").
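To make the contrast concrete, here is a rough Python sketch (not an official guideline AMR) that writes both readings of "Iran's Bushehr atomic power station" as (source, role, target) triples: one with Bushehr as the :location city, and one with Bushehr as the facility's :name, in the form given above. The variable names, the `concepts_named` helper, and the simplified modifier structure of "atomic power station" are assumptions for illustration only.

```python
# Sketch: two candidate readings of "Iran's Bushehr atomic power station"
# as (source, role, target) triples; ":instance" attaches a concept.
# Illustrative only; the modifier structure is simplified.

Triple = tuple[str, str, str]

# Reading 1: Bushehr is the city where the station is located.
LOCATION_READING: list[Triple] = [
    ("s", ":instance", "station"),
    ("s", ":mod", "p"),
    ("p", ":instance", "power"),
    ("p", ":mod", "a"),
    ("a", ":instance", "atom"),
    ("s", ":poss", "c"),
    ("c", ":instance", "country"),
    ("c", ":name", "n"),
    ("n", ":instance", "name"),
    ("n", ":op1", '"Iran"'),
    ("s", ":location", "c2"),
    ("c2", ":instance", "city"),
    ("c2", ":name", "n2"),
    ("n2", ":instance", "name"),
    ("n2", ":op1", '"Bushehr"'),
]

# Reading 2: Bushehr is the name of the facility itself,
# i.e. (f / facility :name (n / name :op1 "Bushehr")).
NAME_READING: list[Triple] = [
    ("f", ":instance", "facility"),
    ("f", ":name", "n"),
    ("n", ":instance", "name"),
    ("n", ":op1", '"Bushehr"'),
]

def concepts_named(triples: list[Triple], name: str) -> set[str]:
    """Return the concepts of all nodes whose :name node has :op1 equal to `name`."""
    concept = {src: tgt for src, role, tgt in triples if role == ":instance"}
    name_nodes = {src for src, role, tgt in triples
                  if role == ":op1" and tgt == f'"{name}"'}
    return {concept[src] for src, role, tgt in triples
            if role == ":name" and tgt in name_nodes}

print(concepts_named(LOCATION_READING, "Bushehr"))  # {'city'}
print(concepts_named(NAME_READING, "Bushehr"))      # {'facility'}
```

Under the first reading the name "Bushehr" hangs off a city node; under the second it names the facility directly.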

uhermjakob commented 11 years ago

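I think this is a case of metonymy, as the two Bushehrs really denote two different entities: the city in "Iran's Bushehr atomic power station," and the power station itself in the later mentions.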

This is much like "Hollywood," which can refer either to a district of Los Angeles or to the American movie industry. And I think it's actually cool that the AMRs can thus reflect that the two mentions of Bushehr refer to different entities.

In AMR 2.0, we can simply refer to the later Bushehr (the facility) using the variable of the earlier "Bushehr atomic power station" as a whole.
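A minimal Python sketch of this idea, assuming variables are unique across the whole document so that a later sentence can point at a node defined in an earlier one. The abridged sentence graphs and the helpers `concept_table` and `resolve_targets` are hypothetical, for illustration only; they are not an existing AMR tool or the AMR 2.0 format itself.

```python
# Hypothetical sketch: cross-sentence reference by reusing a variable.
# Triples are (source, role, target); ":instance" attaches a concept.
# Assumption: variable names are unique across the whole document.

Triple = tuple[str, str, str]

# Sentence 1 (abridged): "... by building Iran's Bushehr atomic power station."
SENT1: list[Triple] = [
    ("b", ":instance", "build-01"),
    ("b", ":ARG1", "s"),
    ("s", ":instance", "station"),
    ("s", ":location", "c2"),
    ("c2", ":instance", "city"),
    ("c2", ":name", "n2"),
    ("n2", ":instance", "name"),
    ("n2", ":op1", '"Bushehr"'),
]

# Sentence 2 (abridged): "... allows regular inspections of Bushehr ..."
# The inspected thing is the station variable "s" from sentence 1,
# so no new node is introduced for this mention of "Bushehr".
SENT2: list[Triple] = [
    ("i", ":instance", "inspect-01"),
    ("i", ":ARG1", "s"),
]

def concept_table(sentences: list[list[Triple]]) -> dict[str, str]:
    """Map each variable defined via :instance anywhere in the document to its concept."""
    table: dict[str, str] = {}
    for triples in sentences:
        for src, role, tgt in triples:
            if role == ":instance":
                if src in table:
                    raise ValueError(f"variable {src!r} is defined twice")
                table[src] = tgt
    return table

def resolve_targets(triples: list[Triple],
                    table: dict[str, str]) -> dict[tuple[str, str], str]:
    """For each edge whose target is a known variable, return the target's concept.
    Keyed by (source, role); assumes each such pair occurs only once."""
    return {(src, role): table[tgt]
            for src, role, tgt in triples
            if role != ":instance" and tgt in table}

table = concept_table([SENT1, SENT2])
print(resolve_targets(SENT2, table))  # {('i', ':ARG1'): 'station'}
```

The later mention resolves to the station node rather than to the city, which is the distinction the metonymy analysis draws.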

cbonial commented 11 years ago

Yes, I did notice that it seems to be used as a proper name on Wikipedia, but I also agree with Ulf's analysis of it as a metonymic reference. What I have done so far, since it seemed most intuitive to me, is what Ulf suggested: I used :location for the full reference, and then for subsequent metonymic references I used facility :name Bushehr. I think this is the most specific and informative option, but I'm not sure whether a consistent reference is preferable, as Kevin may be suggesting.

uhermjakob commented 11 years ago

> Bushehr is the city in which the power station exists, and the rest isn't capitalized, so I initially assumed this should have an AMR where Bushehr is the :location of the power station.

Just a quick reminder to all annotators: LDC sent us the proxy sentences ALL CAPITALIZED. I converted them to normal case automatically, primarily because other automatic preprocessing programs such as sentence splitting and NE tagging depend on normal capitalization, and then did a bit of manual clean-up focused on cases relevant to sentence splitting. So we have to be careful not to pay too much attention to capitalization.
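To see why converted capitalization says little about the source, consider a naive lexicon-based true-caser, sketched in Python. This is a hypothetical illustration, not the conversion tool actually used: the lexicon `PROPER` and the function `naive_truecase` are assumptions. Whether "atomic power station" comes out capitalized depends only on what the lexicon contains.

```python
import re

# Hypothetical lexicon of words known to be proper nouns (lowercase -> cased form).
PROPER = {"greenpeace": "Greenpeace", "russia": "Russia", "iran": "Iran",
          "bushehr": "Bushehr", "june": "June"}

def naive_truecase(text: str, lexicon: dict[str, str] = PROPER) -> str:
    """Lowercase an ALL-CAPS sentence, restore lexicon words, and capitalize
    the first character. A possessive "'s" is split off before the lookup."""
    def fix(match: re.Match) -> str:
        word = match.group(0).lower()
        if word.endswith("'s"):
            stem, suffix = word[:-2], word[-2:]
        else:
            stem, suffix = word, ""
        return lexicon.get(stem, stem) + suffix
    out = re.sub(r"[A-Za-z']+", fix, text)
    return out[:1].upper() + out[1:]

caps = ("ECOLOGICAL ORGANIZATION GREENPEACE STATED ON JUNE 21, 2002 THAT RUSSIA "
        "HELPED IRAN TO DEVELOP NUCLEAR WEAPONS BY BUILDING IRAN'S BUSHEHR "
        "ATOMIC POWER STATION.")
print(naive_truecase(caps))
```

With this lexicon the output matches the workset sentence exactly. A converter whose lexicon included the multi-word station name would have produced capitals there instead; either way, the case reflects the converter rather than the source.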

The workset sentence that Claire cited (automatically converted to true case):

Ecological organization Greenpeace stated on June 21, 2002 that Russia helped Iran to develop nuclear weapons by building Iran's Bushehr atomic power station.

What we had actually received from LDC:
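
ECOLOGICAL ORGANIZATION GREENPEACE STATED ON JUNE 21, 2002 THAT RUSSIA HELPED IRAN TO DEVELOP NUCLEAR WEAPONS BY BUILDING IRAN'S BUSHEHR ATOMIC POWER STATION.
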
The Agence France-Presse newswire text that LDC presumably based their proxy sentence on:

Ecological organisation Greenpeace accused Russia on Friday of helping Iran to develop nuclear weapons by building Iran's Bushehr atomic power station.

So yes, Claire's intuition matches the AFP original, but again, let's be careful and not pay too much attention to the automatically generated capitalization of the proxy sentences.