YoungXiyuan / DCA

This repository contains code used in the EMNLP 2019 paper "Learning Dynamic Context Augmentation for Global Entity Linking".
https://arxiv.org/abs/1909.02117

Maybe ent_inlink mistakes? #2

Open ZacharyChenpk opened 4 years ago

ZacharyChenpk commented 4 years ago

Hi! When I used your data, I found that some entities that should be related, such as 'Cambodia_national_football_team' and 'Football_Federation_of_Cambodia', have no link to each other, yet both are linked to 'Shrewsbury,_Pennsylvania'. That is strange, and I found more strange links and missing links when I tried to build a graph from them. I used entityid_dictid_inlinks_uniq.pkl, and I assumed that each entry describes a relationship between the entity whose id is the key and the entities whose ids are in the value. Did I make a mistake, or is there a problem with the data?

YoungXiyuan commented 4 years ago

I remember that the "entityid_dictid_inlinks_uniq.pkl" should be a dict in which the key is an entity id (corresponding to one Wikipedia Page), while the value is a list of non-repeating inlinks in that wiki page. (See code line # 318 in "mulrel_ranker.py")
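A quick way to check this description against the file is to load it and summarize its shape. The following is only a sketch: it assumes the pickle holds a plain dict of the form `{entity_id: [inlink_id, ...]}`, as recalled above, and the helper names `load_inlinks` and `summarize_inlinks` are made up for illustration and are not part of the DCA code.

```python
import pickle
from collections import Counter


def load_inlinks(path):
    """Load the pickled dict. Only unpickle files from a source you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize_inlinks(inlinks):
    """Return basic statistics for a {entity_id: [inlink_id, ...]} dict."""
    lengths = [len(v) for v in inlinks.values()]
    # Count the Python types of the stored ids (for example int vs str).
    value_types = Counter(type(x).__name__ for v in inlinks.values() for x in v)
    return {
        "num_keys": len(inlinks),
        "max_inlinks": max(lengths, default=0),
        # "uniq" in the file name suggests no list should contain repeats.
        "has_duplicates": any(len(v) != len(set(v)) for v in inlinks.values()),
        "value_types": dict(value_types),
    }


# Toy stand-in for the real file; these ids are invented.
toy = {0: [1, 2], 1: [2], 2: []}
stats = summarize_inlinks(toy)
print(stats)
```

On the real data, the call would be `summarize_inlinks(load_inlinks("entityid_dictid_inlinks_uniq.pkl"))`, with the path adjusted to wherever the file was downloaded.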

ZacharyChenpk commented 4 years ago

I have read the code and found that the function "compute_coherence" passes these values to "self.entity_embeddings", so I assume the values can be seen as the ids of other entities that are related to the key entity. The problem is that I cannot find any reasonable links even between the candidates in one document, and many of the links seem unreasonable.

YoungXiyuan commented 4 years ago

“The problem is that I cannot find any reasonable links even between the candidates in one document, and many of the links seem unreasonable”

Sorry... Could you please show me a concrete example? I have forgotten a lot about this project, and I am a little confused by the term "the candidates in one document"...

ZacharyChenpk commented 4 years ago

For example, "Cambodia national football team" and "Football Federation of Cambodia" have no link to each other, but both are linked to "2011 State of the Union Address", which seems unreasonable. By "the candidates in one document", I mean that when I put all the candidates (entities) of all mentions in a document into one graph, I expect some links between these entities: they fall under the same topics or domains, so they should be linked.
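The kind of check described here can be sketched as follows. This is an illustration only: it assumes the data is a dict of the form `{entity_id: [linked_id, ...]}`, it makes no assumption about link direction (both directions are tested), and the function name `link_status` and the ids are invented.

```python
def link_status(inlinks, a, b):
    """Describe how entities a and b are connected in an inlink dict."""
    links_a = set(inlinks.get(a, ()))
    links_b = set(inlinks.get(b, ()))
    return {
        "b_in_a": b in links_a,  # b appears in a's list
        "a_in_b": a in links_b,  # a appears in b's list
        "shared": sorted(links_a & links_b),  # ids that both lists contain
    }


# Invented ids: 10 and 11 stand for the two Cambodia pages,
# 99 stands for an unrelated page that both lists contain.
toy = {10: [99, 5], 11: [99, 7], 99: []}
status = link_status(toy, 10, 11)
print(status)
```

A result with both flags false and a non-empty `shared` list reproduces the pattern reported above: no direct link, but a common neighbour.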

YoungXiyuan commented 4 years ago

1) I am a little curious about the source of the three entities you listed. Are they all candidates of different mentions in the same doc? Or are some of them candidates while others are just inlinked entities of different candidates of different mentions in the same doc?

2) This is a very interesting attempt! I have six points to make:

a) The candidates are not generated by us; we just downloaded them from previous work.

b) Maybe there are some indirect, potential links between these candidates rather than direct links.

c) Maybe there are some docs whose mentions don't share a unified topic.

d) In the coherence computation, candidates and their inlinked entities are used only to retrieve the corresponding entity embeddings. So, to some degree, whether two entities are related to each other is essentially determined by their distance in vector space, not by their physical links (:

e) In our paper, we accumulate previously linked entities to create a kind of "temporary topic" that exists in vector space, and candidates whose vectors are close to that topic are preferred. So maybe that "temporary topic" can't be expressed in human language.

f) To get "entityid_dictid_inlinks_uniq.pkl", we adopted JWPL (Java Wikipedia Library) to process Wikipedia dumps. Maybe there are some defects in that tool.
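Points d) and e) can be illustrated with a small sketch. It is not the DCA implementation: the plain average used for the "temporary topic" is a simple stand-in for whatever aggregation the model actually learns, cosine similarity is one possible notion of closeness, and all names and vectors below are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise average; a stand-in for the accumulated 'temporary topic'."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_candidates(candidates, linked_vectors):
    """Order candidate names by similarity to the topic of linked entities."""
    topic = mean_vector(linked_vectors)
    return sorted(candidates, key=lambda name: cosine(candidates[name], topic),
                  reverse=True)


# Invented 2-d embeddings of previously linked entities and two candidates.
linked = [[1.0, 0.0], [0.8, 0.2]]
candidates = {"cand_a": [0.9, 0.1], "cand_b": [0.0, 1.0]}
ranking = rank_candidates(candidates, linked)
print(ranking)
```

Under this view, two entities can count as related without any hyperlink between their pages, as long as their vectors point in similar directions.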

All in all, I have to say that trying to visualize "entityid_dictid_inlinks_uniq.pkl" is an extremely interesting experiment. I truly hope you can find some good ideas to improve the performance of the DCA system (:

theblackcat102 commented 3 years ago

Hi @YoungXiyuan, you mentioned that entityid_dictid_inlinks_uniq.pkl maps an entity id to a list of non-repeating inlinks in that wiki page. Does the list of ids refer to a list of entity ids or a list of word ids?

YoungXiyuan commented 3 years ago

@theblackcat102 As far as I remember, the list of ids should refer to a list of entity ids.
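One rough sanity check for this, sketched under assumptions: if the ids are entity ids, every key and value should be a valid row index into the entity embedding table. The function name `ids_outside_vocab` and the parameter `num_entities` (the number of rows in that table) are hypothetical. Passing the check does not prove the ids are entity ids; failing it suggests they are not.

```python
def ids_outside_vocab(inlinks, num_entities):
    """Return all ids (keys or values) that fall outside [0, num_entities)."""
    seen = set(inlinks)
    for values in inlinks.values():
        seen.update(values)
    return sorted(i for i in seen if not 0 <= i < num_entities)


# Invented example: a table with 3 entities, and one id (7) that cannot index it.
toy = {0: [1, 2], 1: [2, 7]}
out_of_range = ids_outside_vocab(toy, num_entities=3)
print(out_of_range)
```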