dalab / deep-ed

Source code for the EMNLP'17 paper "Deep Joint Entity Disambiguation with Local Neural Attention", https://arxiv.org/abs/1704.04920
Apache License 2.0

crosswikis and yago #24

Closed: h-oll closed this issue 1 year ago

h-oll commented 5 years ago

Hi,

thanks for open-sourcing the code for your paper. Reading section 3, I was expecting the various conditional probabilities p(e|m) to be computed from the Wikipedia canonical pages and the context around hyperlinks. Reading the code, I think it also takes into account precomputed values from Crosswikis and Yago, which would be in line with section 6.

Am I reading the paper and code correctly?

If so, would you mind giving some more details on how the Crosswikis frequencies and Yago probabilities are computed, and how you combine them? It looks like the Crosswikis and Wiki frequencies are added together, while the Yago probability is added to the Wiki + Crosswikis probability unless the sum is > 1, in which case it is truncated.

Best

octavian-ganea commented 5 years ago

Hi,

Merging of the 3 indexes happens in https://github.com/dalab/deep-ed/blob/master/data_gen/indexes/yago_crosswikis_wiki.lua and https://github.com/dalab/deep-ed/tree/master/data_gen/gen_p_e_m . We added frequencies from Wiki and Crosswikis, while for Yago we directly added the corresponding uniform probability (since this dataset only comes in the form of lists of possible entities for a given mention, without any frequencies or probabilities).
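To make the merge described in this thread concrete, here is a minimal Python sketch. It is not the repository's Lua code, and it rests on several assumptions. The function name `merge_p_e_m` and the dictionary-based input layout are hypothetical. Wiki and Crosswikis counts are assumed to be summed per (mention, entity) pair and then normalized per mention. The Yago term is assumed to be 1 / (number of Yago candidates for the mention), added on top and truncated at 1 as suggested in the question above.

```python
from collections import defaultdict


def merge_p_e_m(wiki_counts, crosswikis_counts, yago_candidates):
    """Hypothetical sketch: merge three mention -> entity indexes into p(e|m).

    wiki_counts, crosswikis_counts: {mention: {entity: count}}
    yago_candidates: {mention: set of entities}, no counts available
    """
    merged = {}
    mentions = set(wiki_counts) | set(crosswikis_counts) | set(yago_candidates)
    for mention in mentions:
        # Step 1 (assumed): sum the raw counts from Wiki and Crosswikis.
        counts = defaultdict(float)
        for source in (wiki_counts, crosswikis_counts):
            for entity, count in source.get(mention, {}).items():
                counts[entity] += count

        # Normalize the summed counts into probabilities for this mention.
        total = sum(counts.values())
        probs = {e: c / total for e, c in counts.items()} if total > 0 else {}

        # Step 2 (assumed): Yago has no frequencies, so each of its candidates
        # gets a uniform probability, added on top and truncated at 1.
        candidates = yago_candidates.get(mention, ())
        if candidates:
            uniform = 1.0 / len(candidates)
            for entity in candidates:
                probs[entity] = min(1.0, probs.get(entity, 0.0) + uniform)

        merged[mention] = probs
    return merged
```

Note that this sketch does not renormalize after the Yago step, so the values for one mention can sum to more than 1. Whether the original code renormalizes is not stated in this thread.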

h-oll commented 5 years ago

Thanks for your reply. My questions came from reading the code in the two files you mentioned, so I was apparently looking in the right place.

However, I couldn't trace my way back to the original Crosswikis and Yago datasets. Is there a link or package similar to what you used to get the canonical pages with the text and Wikipedia anchors only?

Best.

octavian-ganea commented 5 years ago

Hi,

I am not sure if you are interested only in the Wiki anchors, but if you are, please check https://github.com/dalab/deep-ed/issues/18.

Crosswikis is available here: https://nlp.stanford.edu/data/crosswikis-data.tar.bz2/ and here https://ai.googleblog.com/2012/05/from-words-to-concepts-and-back.html.

Yago is taken from: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/ambiverse-nlu/aida/downloads/ . It is the aida_means.tsv.bz2 file.

h-oll commented 5 years ago

Great, thanks. I had already found #18.