Closed: h-oll closed this issue 1 year ago
Hi,
The merging of the three indexes happens in https://github.com/dalab/deep-ed/blob/master/data_gen/indexes/yago_crosswikis_wiki.lua and https://github.com/dalab/deep-ed/tree/master/data_gen/gen_p_e_m. We added the frequencies from Wiki and Crosswikis, while for Yago we directly added the corresponding uniform probability (since that dataset only comes in the form of lists of possible entities for a given mention, without any frequencies or probabilities).
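In rough outline, the merge works as follows (a minimal Python sketch of the Lua logic in yago_crosswikis_wiki.lua; all names below are illustrative, not the actual identifiers):

```python
from collections import defaultdict

def merge_p_e_m(wiki_counts, crosswikis_counts, yago_candidates):
    """Sketch of the three-way index merge (illustrative names).

    wiki_counts / crosswikis_counts: dict mention -> dict entity -> raw frequency
    yago_candidates: dict mention -> set of candidate entities (no frequencies)
    """
    p_e_m = defaultdict(dict)

    # 1) Wiki and Crosswikis frequencies are simply added per (mention, entity).
    counts = defaultdict(lambda: defaultdict(int))
    for source in (wiki_counts, crosswikis_counts):
        for mention, entity_freqs in source.items():
            for entity, freq in entity_freqs.items():
                counts[mention][entity] += freq

    # 2) Normalize the summed counts into probabilities p(e|m).
    for mention, entity_freqs in counts.items():
        total = sum(entity_freqs.values())
        for entity, freq in entity_freqs.items():
            p_e_m[mention][entity] = freq / total

    # 3) Yago only provides candidate lists, so each candidate contributes a
    #    uniform probability 1/|candidates(m)|, added on top and capped at 1.
    for mention, entities in yago_candidates.items():
        uniform = 1.0 / len(entities)
        for entity in entities:
            prior = p_e_m[mention].get(entity, 0.0)
            p_e_m[mention][entity] = min(1.0, prior + uniform)

    return p_e_m
```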
Thanks for your reply. The questions originated from reading the code in the two files you mentioned, so I was apparently looking at the right spot.
However, I couldn't trace Crosswikis and Yago back to their original datasets. Would there be a link / package, similar to what you used to get the canonical pages with the text + Wikipedia anchors only?
Best.
Hi,
I am not sure if you are interested only in the Wiki anchors, but if you are, please check https://github.com/dalab/deep-ed/issues/18.
Crosswikis is available here: https://nlp.stanford.edu/data/crosswikis-data.tar.bz2/ and here: https://ai.googleblog.com/2012/05/from-words-to-concepts-and-back.html.
Yago is taken from https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/ambiverse-nlu/aida/downloads/ (the aida_means.tsv.bz2 file).
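For reference, a minimal sketch of turning that file into per-mention candidate lists (this assumes each line of aida_means.tsv is a tab-separated mention / entity pair with the mention in double quotes; please verify against the actual file):

```python
from collections import defaultdict

def load_yago_candidates(path):
    """Build mention -> set(entities) from aida_means.tsv (sketch).

    Assumes two tab-separated columns per line: a quoted mention surface
    form and an entity name; escaping details may differ in the real file.
    """
    candidates = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2:
                continue  # skip malformed lines
            mention, entity = parts
            candidates[mention.strip('"')].add(entity)
    return candidates
```

Each candidate entity then contributes the uniform probability 1 / len(candidates[mention]) mentioned above.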
Great, thanks. I had already found #18.
Hi,
Thanks for open-sourcing the code of your paper. Reading section 3, I was expecting the various conditional probabilities p(e|m) to be computed from the Wiki canonical pages and the context around hyperlinks. Reading the code, I think it also takes precomputed values from Crosswikis and Yago into account (which is actually in line with section 6).
Am I reading the paper and code correctly?
If yes, would you mind giving some more details on how the Crosswikis frequencies and Yago probabilities are computed, and how you combine them? It looks like the Crosswikis and Wiki frequencies are added, while the Yago probability is added to the Wiki+Crosswikis probability unless the sum exceeds 1, in which case it is truncated.
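In other words, is the final combination something like the following (hypothetical names, just to make sure I got it right)?

```python
def combine(p_wiki_crosswikis: float, p_yago_uniform: float) -> float:
    # Wiki+Crosswikis probability plus Yago's uniform share, truncated at 1.
    return min(1.0, p_wiki_crosswikis + p_yago_uniform)
```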
Best.