dalab / deep-ed

Source code for the EMNLP'17 paper "Deep Joint Entity Disambiguation with Local Neural Attention", https://arxiv.org/abs/1704.04920
Apache License 2.0

Assert error on entities/pretrained_e2v/e2v.lua #12

Closed. shuyanzhou closed this issue 6 years ago.

shuyanzhou commented 6 years ago

Hi, after training the embeddings, I tried to train a global model. However, soon after it starts running, an assert error is triggered on line 21 of entities/pretrained_e2v/e2v.lua:

assert(e2vutils.lookup[unk_ent_thid]:norm() == 0, e2vutils.lookup[unk_ent_thid]:norm())

I followed every step of the instructions without any modification.

Is there anything wrong?

Thanks a lot

octavian-ganea commented 6 years ago

The entity vector of the unk entity should have been 0. I am not sure why this happens (it has been a while since I touched this code), but it might not affect the scores in any way. The most important thing is to get good scores on the entity relatedness task at the end of entity-embedding training.
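For context, the invariant behind the assert is that the unk entity's row of the lookup matrix must remain all-zero after the real entity vectors are L2-normalized. A minimal sketch of enforcing this when finalizing an embedding checkpoint (the helper and its arguments are hypothetical names, not code from this repo):

```lua
-- Sketch: row-normalize the entity embedding matrix, then re-zero the
-- unk row so it cannot match any mention at disambiguation time.
-- (finalize_entity_vecs and its arguments are hypothetical names.)
require 'torch'

local function finalize_entity_vecs(ent_vecs, unk_ent_thid)
  local norms = ent_vecs:norm(2, 2):clamp(1e-8, math.huge)
  ent_vecs:cdiv(norms:expandAs(ent_vecs))
  ent_vecs[unk_ent_thid]:zero()  -- restore the asserted invariant
  return ent_vecs
end
```

If the unk row is instead normalized together with the rest (ending up with norm close to 1, as in the reports below), the assert on line 21 fires.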

shuyanzhou commented 6 years ago

Thank you for your reply. The scores on the entity relatedness task are similar to what you report.

Jorigorn commented 5 years ago

Hi, I got the same error when running:

CUDA_VISIBLE_DEVICES=5 th ed/ed.lua -root_data_dir $DATA_PATH -ent_vecs_filename $ENTITY_VECS -model 'local'

The same happens for the global model. Here is the log:

---> from t7 file: /home/zchen/experiments/deep-ed/deep_ed_data/generated/ent_name_id_map_RLTD.t7
Done loading entity name - wikiid. Size thid index = 276031
==> Loading entity freq map
Done loading entity freq index. Size = 266245
==> Loading common w2v + top freq list of words
---> from t7 file.
==> Loading word freq map with unig power 0.6
Done loading word freq index. Num words = 491413; total freq = 774609376
==> Loading w2v vectors
---> from t7 file.
Done reading w2v data. Word vocab size = 491413
==> Loading pre-trained entity vectors: e2v from file ent_vecs__ep_63.t7
/home/zchen/torch/install/bin/luajit: entities/pretrained_e2v/e2v.lua:21: 0.99999999253492
stack traceback:
  [C]: in function 'assert'
  entities/pretrained_e2v/e2v.lua:21: in main chunk
  [C]: in function 'dofile'
  ed/ed.lua:34: in main chunk
  [C]: in function 'dofile'
  ...chen/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
  [C]: at 0x00405d50
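The traceback shows the unk row's norm is 0.99999999253492 instead of 0, i.e. it was normalized along with the real entity vectors. A quick diagnostic one can run in a th session before training (a sketch: the checkpoint path is illustrative, and unk_ent_thid must be taken from the repo rather than the placeholder used here):

```lua
-- Diagnostic sketch: inspect the unk entity row of a saved checkpoint.
-- Assumes the .t7 file stores the embedding matrix directly.
require 'torch'

local ent_vecs = torch.load('ent_vecs__ep_63.t7')  -- illustrative path
local unk_ent_thid = 1  -- placeholder; use the repo's actual unk index

print(string.format('unk row norm = %.12f', ent_vecs[unk_ent_thid]:norm()))
-- Expected: 0.000000000000. A value near 1 reproduces the failing assert.
```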

Jorigorn commented 5 years ago

@shuyanzhou May I ask how you solved the issue? Thanks.

I commented out lines 21 and 28, and now it has started training.
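Commenting the asserts out does let training proceed, but a safer variant of the workaround (a sketch, untested against this repo) is to re-zero the unk row at load time, so the condition the assert protects actually holds:

```lua
-- In entities/pretrained_e2v/e2v.lua, before the assert on line 21:
-- zero a non-zero unk vector instead of deleting the check.
if e2vutils.lookup[unk_ent_thid]:norm() ~= 0 then
  print('WARNING: unk entity vector had norm ' ..
        e2vutils.lookup[unk_ent_thid]:norm() .. '; zeroing it')
  e2vutils.lookup[unk_ent_thid]:zero()
end
assert(e2vutils.lookup[unk_ent_thid]:norm() == 0)
```

That way the unk entity stays inert instead of carrying a unit-norm vector into the similarity scores.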

To entity: Apple Inc.; vec norm = 1.0000000263353: TOP CLOSEST WORDS: Apple[fr=35085]{tf=332}[0.35]; software[fr=115171]{tf=24}[0.35]; computer[fr=128366]{tf=13}[0.34]; NeXTSTEP[fr=240]{tf=1}[0.34]; HyperCard[fr=305][0.33]; Microsoft[fr=40679]{tf=16}[0.32]; Adobe[fr=6703]{tf=1}[0.32]; hardware[fr=32185]{tf=9}[0.32]; Macintosh[fr=6896]{tf=23}[0.31]; Mac[fr=31244]{tf=33}[0.30]; Univac[fr=218][0.29]; computers[fr=34788]{tf=19}[0.29]; PC[fr=34763]{tf=3}[0.29]; desktop[fr=9072]{tf=6}[0.29]; Dell[fr=7971]{tf=3}[0.29]; iPod[fr=3966]{tf=21}[0.28]; company[fr=621699]{tf=52}[0.28]; laptop[fr=5520]{tf=1}[0.28]; WordStar[fr=217][0.28]; Intel[fr=13001]{tf=4}[0.28]; microcomputer[fr=776][0.28]; Flash[fr=14093][0.28]; minicomputer[fr=470][0.27]; HP[fr=10165]{tf=1}[0.27]; G5[fr=777][0.27]; product[fr=119487]{tf=13}[0.27]; PCs[fr=4188][0.27]; Windows[fr=53613]{tf=5}[0.27]; app[fr=7812]{tf=4}[0.26]; Computers[fr=3346][0.26]; Computer[fr=35170]{tf=6}[0.26]; Epyx[fr=220][0.26]; Digital[fr=38691][0.26]; Multimedia[fr=3433][0.26]; device[fr=67702]{tf=11}[0.25]; digital[fr=95324]{tf=10}[0.25]; Toshiba[fr=1958][0.25]; OS[fr=21986]{tf=25}[0.25]; licensing[fr=12988][0.25]; Jobs[fr=4126]{tf=50}[0.25]; iMac[fr=635]{tf=12}[0.25]; version[fr=365578]{tf=12}[0.25];

WORDS NOT FOUND: iTunes{15}; market{24}; operating{10}; iPhone{51}; October{22}; processor{10}; July{12}; iPad{32}; employees{12}; users{10}; logo{10}; Store{18}; January{18}; introduced{29}; personal{10}; CEO{12}; announced{33}; time{16}; media{14}; products{26}; August{10}; sales{10}; released{17}; China{10}; price{11}; sold{19}; companies{16}; launch{11}; iOS{11}; technology{10}; use{12}; million{20}; November{11}; Steve{22}; June{15}; billion{16}; April{11}; March{10}; Cook{13}; world{14}; music{11}; TV{15}; user{13}; share{10}; line{12}; video{15};

/============================================================================
Entity Relatedness quality measure:
measure    = NDCG1  NDCG5  NDCG10  MAP    TOTAL VALIDATION
our (vald) = 0.673  0.635  0.669   0.613  2.590
our (test) = 0.637  0.604  0.635   0.573
Yamada'16  = 0.59   0.56   0.59    0.52
WikiMW     = 0.54   0.52   0.55    0.48

Done testing for GLB
Params serialized = model=global

One epoch = 5 full passes over AIDA-TRAIN in our case.
==> TRAINING EPOCH # 1 <==

Network norms of parameter weights:
A (attention mat) = 17.320508075689
B (ctxt embedding) = 17.320508075689
C (pairwise mat) = 17.320508075689
f_network norm = 5.6794408493519 4.1886493609042 0.58306589612436 0.080378144979477
[........................................ 15271/100000000 .............................] ETA: 4D11h | Step: 3ms