YoungXiyuan / DCA

This repository contains code used in the EMNLP 2019 paper "Learning Dynamic Context Augmentation for Global Entity Linking".
https://arxiv.org/abs/1909.02117

Why are existing candidates dropped in `find_coref`? #13

Open peblair opened 3 years ago

peblair commented 3 years ago

Hello,

I am trying to understand the `with_coref` and `find_coref` functions in the dataset loader. Roughly speaking, it appears that the goal of `find_coref` is to do the following (in pseudo-code):

find_coref(cur_m) :=
for each mention m in the same document as cur_m:
  if m's mention text starts or ends with the same text as cur_m BUT not equal to cur_m:
    add all of m's candidates to the result list (removing duplicates)
return the collected candidates

The results of `find_coref` are then used to overwrite `cur_m`'s candidate list. This is a bit confusing to me, though, since the BUT ... above means that the candidates which were previously inside of `cur_m`'s candidate list are lost (or at least potentially lost). Is this intentional? If so, can you explain what `with_coref` is intended to accomplish?
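To make the concern concrete, here is a minimal Python sketch of the behaviour described by the pseudo-code above. It is not the repository's actual implementation: the dictionary layout (`text`, `candidates`), the rule of overwriting only when at least one coreferent mention is found, and the two-mention toy document are all assumptions made for illustration. The candidate names and priors are taken from the example in this issue.

```python
def find_coref(cur_m, doc_mentions):
    """Collect candidates of other mentions whose text starts or ends with cur_m's text."""
    cur_text = cur_m["text"]
    collected, seen = [], set()
    for m in doc_mentions:
        text = m["text"]
        # Skip mentions with identical text (the "BUT not equal" condition).
        if text == cur_text:
            continue
        if text.startswith(cur_text) or text.endswith(cur_text):
            for name, prior in m["candidates"]:
                if name not in seen:  # remove duplicates
                    seen.add(name)
                    collected.append((name, prior))
    return collected


def with_coref(doc_mentions):
    """Return copies of the mentions with candidates overwritten by find_coref results."""
    updated = []
    for m in doc_mentions:
        coref_cands = find_coref(m, doc_mentions)
        new_m = dict(m)
        # Assumption: overwrite only when some coreferent mention was found.
        if coref_cands:
            new_m["candidates"] = coref_cands
        updated.append(new_m)
    return updated


# Hypothetical two-mention document built from the values in this issue.
doc = [
    {"text": "Mother Teresa",
     "candidates": [("Mother_Teresa", 1.0), ("Mother_Teresa_High_School", 0.001)]},
    {"text": "Teresa",
     "candidates": [("Teresa", 0.364), ("Teresa_(Barbie)", 0.138), ("Mother_Teresa", 0.026)]},
]

result = with_coref(doc)
teresa_names = [name for name, _ in result[1]["candidates"]]
gold_survives = "Teresa" in teresa_names
print(teresa_names, gold_survives)
```

In this sketch the short mention "Teresa" inherits only the candidates of "Mother Teresa", so its original top candidate `Teresa` no longer appears in the list.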

For example, on a local modification of this repository, I found that the gold entity (Teresa) is dropped from the list of candidates (I've verified in the AIDA train CSV [line 2426] that this is indeed the correct gold entity for this mention):

RuntimeError: Failed to find gold_key 'Teresa' in list: [(0, ('Mother_Teresa', 1.0)), (1, ('Mother_Teresa_High_School', 0.001)), (2, ('The_Missionary_Position', 0.001)), (3, ('Blessed_Mother_Teresa_Catholic_Secondary_School', 0.0))]
orig list: [['Teresa', 0.364], ['Teresa_(Barbie)', 0.138], ['Teresa,_Rizal', 0.115], ['Teresa_Nielsen_Hayden', 0.103], ['Teresa_of_Ávila', 0.092], ['Teresa_Heinz', 0.038], ['Teresa,_Castellón', 0.031], ['Teresa,_Greater_Poland_Voivodeship', 0.029], ['Mother_Teresa', 0.026], ['Teresa_Scanlan', 0.021], ['Teresa_Teng', 0.018], ['Theresa,_Countess_of_Portugal', 0.018], ['George_McGovern', 0.015], ['Teresa_Crippen', 0.013], ['Teresa_Palmer', 0.012], ['Teresa_Cristina_of_the_Two_Sicilies', 0.01], ['Teresa_Earnhardt', 0.01], ['Teresa_Wynn_Roseborough', 0.009], ['Teresa_(2010_telenovela)', 0.009], ['The_Real_Housewives_of_New_Jersey', 0.008], ['Teresa_(film)', 0.008], ['Teresa_Jungman', 0.008], ['Teresa_Bagioli_Sickles', 0.007], ['Teresa_Fernández_de_Traba', 0.007], ['Teresa_Bryant', 0.007], ['Teresa,_Contessa_Guiccioli', 0.007], ['Teresa_Strasser', 0.006], ['Teresa_Vaill', 0.006], ['Teresa_Mak', 0.006], ['Teresa_Murphy', 0.006], ['Teresa_Cheung_(actress)', 0.006], ['Teresa_Rivera', 0.006], ['Teresa_Nzola_Meso_Ba', 0.006], ['Tracy_Bond', 0.006], ['Teresa_Medina', 0.006], ['Infanta_Maria_Teresa_of_Spain', 0.006], ['Teresa_Seiblitz', 0.006], ['Teresa_Forcier', 0.006], ['Teresa_Taylor', 0.006], ['Teresa_Motos', 0.006], ['Teresa_Piotrowska', 0.006], ['Teresa_Ferster_Glazier', 0.006], ['Teresa_Fedor', 0.006], ['Teresa_Ganzel', 0.006], ['Teresa_Portela_(Portuguese_canoeist)', 0.006], ['Teresa_de_la_Parra', 0.006], ['Teresa_Piccini', 0.006], ['Teresa_Borawska', 0.006], ['Princess_Maria_Teresa_of_Savoy', 0.006], ['Teresa_Roncon', 0.006], ['Teresa_Wentzler', 0.006], ['Teresa_Machado', 0.006], ['Teresa_Magbanua', 0.006], ['Teresa_del_Po', 0.006], ['Teresa_Sapieha', 0.006], ['Teresa_Edwards', 0.006], ['Teresa_A._Dolan', 0.006], ['Teresa_Hurtado_de_Ory', 0.006], ['Teresa_De_Sio', 0.006], ['Teresa_Hsu_Chih', 0.006], ['Lady_Teresa_Waugh', 0.006], ['Teresa_Lourenco', 0.006], ['Teresa_Lubomirska', 0.006], ['Teresio_Maria_Languasco', 0.006], ['Teresa_Woo-Paw', 0.006], 
['Teresa_de_Cartagena', 0.006], ['Teresa_Bernabe', 0.006], ['Teresa_Amabile', 0.006], ['Maria_Teresa,_Princess_of_Beira', 0.006], ['Teresa_Korwin_Gosiewska', 0.006], ['Teresa_Bright', 0.006], ['Teresa_Daly', 0.006], ['Teresa_Villaverde', 0.006], ['Teresa_Stich-Randall', 0.006], ['Teresa_Polias', 0.006], ['Teresa_Wong', 0.006], ['Teresa_Pavlinek', 0.006], ['Teresa_Ruiz_(politician)', 0.006], ['Teresa_Cooper', 0.006], ['Teresa_Carr_Deni', 0.006], ['Teresa_P._Pica', 0.006], ['Teresa_S._Polley', 0.006], ['Teresa_Stratas', 0.006], ['Teresa_Lipowska', 0.006], ['Teresa_Carpio', 0.006], ['Teresa_Stolz', 0.006], ['Teresa_Wilson', 0.006], ['Teresa_Lalor', 0.006], ['Teresa_Hannigan', 0.006], ['Teresa_Chodkiewicz', 0.006], ['Teresa_Lisbon', 0.006], ['Teresa_Forn', 0.006], ['Teresa_Gutierrez', 0.006], ['Teresa_Maxwell-Conover', 0.006], ['Teresa_Ann_Savoy', 0.006], ['Teresa_Trull', 0.006], ['Teresa_Forcades', 0.006], ['Teresa_Lynch', 0.006], ['Teresa_Furtado', 0.006], ['Teresa_Southwick', 0.006]]

Any help on understanding this would be very useful. Thanks!

YoungXiyuan commented 3 years ago

Sorry for my late reply and thank you for your interest in our work.

  1. As to your first question (the meaning of the with_coref and find_coref functions), I suggest referring to the paper "Deep Joint Entity Disambiguation with Local Neural Attention" (please read the third paragraph of Section 6, Candidate Selection).

  2. As to your second question about the Teresa example, my explanation is that any coreference method can lose some correct candidates, even though it may improve accuracy overall.