amrisi / amr-guidelines

NE concepts #55

Closed mgeorgescu closed 11 years ago

mgeorgescu commented 11 years ago

If you have NEs like "International Atomic Energy Agency", do we always use an element from the name, in this case "agency", to annotate the concept?

Or: "Iraqi Interior Ministry operations director Major General Abdel Karim Khalaf said a judicial order was issued against Raghad Saddam Hussein a year ago by the central criminal court." For "central criminal court", should this be annotated as "court" or "public-institution"?

There are also some cases, e.g. International Crisis Group, where the meaning goes beyond the category mentioned in the name: International Crisis Group = international organization. In this case, do we use "group" or "organization"?

Should the rule be: 1) use a concept from the name if possible, and only afterwards go to the NE list? OR 2) always go to the NE list first, and only if you cannot find a good fit there, use a concept present in the source sentence?

kevincrawfordknight commented 11 years ago

Named entities are sequences of mostly-capitalized words, like "International Atomic Energy Agency" or "International Crisis Group". Note that "Iraqi Interior Ministry" contains two entities that happen to be adjacent ("Iraqi" and "Interior Ministry").

For named entities, we select a type from the NE type list, such as "organization", "criminal-organization", etc. So, we don't use "agency", "group", "ministry" for the cases just mentioned.

Two exceptions:

1) If a named entity includes a title, like "Professor Wu", then we use the title as the type: (p / professor :name...). Titles "Mr." and "Mrs." are just treated as part of the name, however.

2) If a named entity is involved in an appositive ("Elsevier, the Dutch group"), then the type is taken from the appositive: (g / group :name...).

For phrases that are NOT named entities, such as "parking space" (s / space...) or "local district" (d / district...), we never use the NE type list. If "central criminal court" is not capitalized, then it is just (c / court ...).
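The rules above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not part of any AMR tooling: the function names `choose_concept` and `render_named`, their parameters, and the lowercasing of titles are all assumptions made for this example, and deciding whether a phrase is a named entity is left to the annotator.

```python
from typing import List, Optional

# Per the comment above, "Mr." and "Mrs." stay inside the name and do not
# become the concept. This set is an assumption for the sketch.
NAME_ONLY_TITLES = {"Mr.", "Mrs."}


def choose_concept(
    is_named_entity: bool,
    head_noun: str,
    ne_type: Optional[str] = None,
    title: Optional[str] = None,
    appositive_head: Optional[str] = None,
) -> str:
    """Pick the AMR concept for a phrase (hypothetical helper)."""
    # Not a named entity: never consult the NE type list.
    if not is_named_entity:
        return head_noun
    # Exception 1: a real title ("Professor Wu") supplies the type.
    if title is not None and title not in NAME_ONLY_TITLES:
        return title.lower()
    # Exception 2: an appositive ("Elsevier, the Dutch group") supplies the type.
    if appositive_head is not None:
        return appositive_head
    # Default: a type from the NE type list, not a word taken from the name.
    if ne_type is None:
        raise ValueError("a named entity needs a type from the NE type list")
    return ne_type


def render_named(concept: str, name_tokens: List[str]) -> str:
    """Render a named entity as a one-line AMR fragment (hypothetical helper)."""
    var = concept[0]
    # Avoid reusing the variable "n" for both the entity and its name node.
    name_var = "n2" if var == "n" else "n"
    ops = " ".join(f':op{i} "{tok}"' for i, tok in enumerate(name_tokens, 1))
    return f"({var} / {concept} :name ({name_var} / name {ops}))"


if __name__ == "__main__":
    concept = choose_concept(True, "group", ne_type="organization")
    print(render_named(concept, ["International", "Crisis", "Group"]))
    print(choose_concept(False, "court"))  # uncapitalized "central criminal court"
```

Under these assumptions, "International Crisis Group" gets `organization` rather than `group`, while an uncapitalized "central criminal court" falls through to its head noun, `court`.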

We've been following these rules consistently for a while now, and they are hopefully in the guidelines.

Note that in texts with capitalization errors, it may not (unfortunately!) be trivial to determine whether something is a named entity or not.

uhermjakob commented 11 years ago

I added a new example page on this topic at which is accessible via a new link from