dalab / deep-ed

Source code for the EMNLP'17 paper "Deep Joint Entity Disambiguation with Local Neural Attention", https://arxiv.org/abs/1704.04920
Apache License 2.0

What should I do to get entity vectors for the whole of Wikipedia? #13

Open herbertchen1 opened 6 years ago

herbertchen1 commented 6 years ago

I need vectors not only for the entities in the training data. As you know, it's hard to predict which entities will appear in real-world text.

octavian-ganea commented 6 years ago

Since I trained the entity embeddings on a GPU (see the lookup table here: https://github.com/dalab/deep-ed/blob/master/entities/learn_e2v/model_a.lua#L28), I am afraid that to get embeddings for the full set of Wikipedia entities (i.e., ~6M), one has to train them on the CPU (which, as far as I remember, will be slower) and have enough RAM to hold a 6M x 300 lookup table. To do that, modify the files in https://github.com/dalab/deep-ed/tree/master/entities/learn_e2v to use all Wikipedia entities and words for training. Setting the flag -entities 'ALL' in entities/learn_e2v/learn_a.lua should do the job, but as far as I remember this code path was never tested.
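For a rough sense of the RAM requirement, here is a minimal Torch7 sketch (not part of the deep-ed code) that estimates and allocates a CPU-resident lookup table of that size. The entity count and dimensionality come from the discussion above; everything else is illustrative:

```lua
-- Minimal sketch, not from the repo: a CPU-resident lookup table for
-- ~6M Wikipedia entities with 300-dimensional embeddings.
require 'nn'

local num_entities = 6000000  -- approximate number of Wikipedia entities
local dim = 300               -- embedding dimension used in the paper

-- Memory estimate: 6e6 * 300 * 4 bytes (float32) is roughly 7.2 GB.
-- Torch's default DoubleTensor would double that to ~14.4 GB.
local gb = num_entities * dim * 4 / 1e9
print(string.format('float32 lookup table needs ~%.1f GB of RAM', gb))

-- The allocation itself (this line really does grab ~7.2 GB):
local lookup = nn.LookupTable(num_entities, dim):float()
```

After modifying the training files, the run would presumably be launched as described in the repo's README, just with the extra flag, e.g. `th entities/learn_e2v/learn_a.lua -entities 'ALL' ...` (the remaining flags are whatever the README already prescribes; I have not verified this untested code path).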

herbertchen1 commented 6 years ago

Thank you very much, I'll try it.