dalab / deep-ed

Source code for the EMNLP'17 paper "Deep Joint Entity Disambiguation with Local Neural Attention", https://arxiv.org/abs/1704.04920
Apache License 2.0

Did anyone manage to reproduce the Entity Relatedness quality measure? #9

Closed se4u closed 6 years ago

se4u commented 6 years ago

According to the README, the entity embeddings should reach up to 0.681 NDCG1 after 69 epochs. I have already run 75 epochs, but the performance is well below the target. Did anyone else encounter this?

============================================================================               
Entity Relatedness quality measure:                                                        
measure    =    NDCG1   NDCG5   NDCG10  MAP     TOTAL VALIDATION                           
our (vald) =    0.660   0.617   0.651   0.594   2.522                                      
our (test) =    0.623   0.590   0.616   0.552                                              
Yamada'16  =    0.59    0.56    0.59    0.52                                               
WikiMW     =    0.54    0.52    0.55    0.48                                               
==> saving model to ~/deep-ed-data/generated/ent_vecs/ent_vecs__ep_75.t7
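
For reference, here is a minimal NumPy sketch of how NDCG@k is typically computed for a single relatedness query; the reported NDCG1/NDCG5/NDCG10 numbers are averages over the validation/test queries. The binary relevance labels and the toy query are illustrative assumptions, not the exact data structures used in this repository.

import numpy as np

def ndcg_at_k(ranked_relevance, k):
    # Relevance labels of the candidate entities, in the order ranked by the
    # entity-relatedness scores (1 = related, 0 = not related).
    rel = np.asarray(ranked_relevance, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))   # positions 1..k -> log2(2)..log2(k+1)
    dcg = np.sum(rel / discounts)
    # Ideal DCG@k: the same labels sorted from most to least relevant.
    ideal = np.sort(np.asarray(ranked_relevance, dtype=float))[::-1][:k]
    idcg = np.sum(ideal / discounts[:ideal.size])
    return dcg / idcg if idcg > 0 else 0.0

# Toy query: the top-ranked candidate is not related, the next two are.
print(ndcg_at_k([0, 1, 1, 0, 0], k=1))   # NDCG1 for this query
print(ndcg_at_k([0, 1, 1, 0, 0], k=5))   # NDCG5
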
octavian-ganea commented 6 years ago

There is some variability depending on the initialization of the embeddings, but I agree that in your case the difference is big. I ran this code 3 times and it was always able to get results similar to the ones mentioned in the README after fewer than 100 epochs. Did you change anything in the code? Are you training with both canonical pages and hyperlinks? Can you show what happens if you let it train longer (100 epochs or so) and pick the epoch with the highest TOTAL VALIDATION score?
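
A minimal sketch of picking the best checkpoint from a training log, assuming validation lines in the format of the excerpt above (the log file name, the ANSI colour codes, and the assumption of one "our (vald)" line per validation printout are all assumptions based on this thread):

import re

ANSI = re.compile(r'\x1b\[[0-9;]*m')                  # the log colours some of the numbers

best_eval, best_total = None, float('-inf')
with open('log_train_entity_vecs') as f:              # assumed log file name
    vald_lines = [ANSI.sub('', l) for l in f if 'our (vald)' in l]
for i, line in enumerate(vald_lines, start=1):        # one line per validation printout
    total = float(line.split()[-1])                   # TOTAL VALIDATION is the last column
    if total > best_total:
        best_eval, best_total = i, total
print('best validation printout:', best_eval, 'TOTAL VALIDATION:', best_total)
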

se4u commented 6 years ago

I did not change anything in the training code, and my fork is also on GitHub.

The command I ran to train was just the vanilla command; I did not specify any other flags:

th entities/learn_e2v/learn_a.lua -root_data_dir $(DP)

Here's a part of the log output with the important info:

==> Loading relatedness test
  ---> from t7 file.
==> Loading relatedness thid tensor
  ---> from t7 file.
    Done loading relatedness sets. Num queries test = 3319. Num queries valid = 3673. Total num ents restricted set = 276031
==> Loading entity wikiid - name map
  ---> from t7 file: /export/c02/prastog3/deep-ed-data/generated/ent_name_id_map_RLTD.t7
    Done loading entity name - wikiid. Size thid index = 276031
==> Loading common w2v + top freq list of words
  ---> t7 file NOT found. Loading from disk instead (slower). Out file = /export/c02/prastog3/deep-ed-data/generated/common_top_words_freq_vectors_w2v.t7
   word freq index ...
   word vectors index ...

==> Loading word freq map with unig power 0.6
    Done loading word freq index. Num words = 491413; total freq = 774609376
==> Loading w2v vectors
  ---> t7 file NOT found. Loading w2v from the bin/txt file instead (slower).
Num words = 491413. Num phrases = 0
Writing t7 File for future usage. Next time Word2Vec loading will be faster!
    Done reading w2v data. Word vocab size = 491413

==> Init entity embeddings matrix. Num ents = 276031
Init entity embeddings with average of title word vectors to speed up learning.
    Done init.

Training entity vectors w/ params: ;obj-maxm;wiki-canonical-hyperlinks;hypCtxtL-10;numWWpass-400;WperE-20;w2v;negW-5;ents-RLTD;unigP-0.6;bs-500;ADAGRAD-lr-0.3

My files in the generated folder also match the description in the README:

prastog3@c02 /export/c02/prastog3/deep-ed-data/generated Wed Jan 03 18:10:04   
$ ls -lah                                                                      
total 148G                                                                     
 4.0K  .                                                                       
 4.0K  ..                                                                      
 9.5M  all_candidate_ents_ed_rltd_datasets_RLTD.t7                             
  13M  common_top_words_freq_vectors_w2v.t7                                    
 775M  crosswikis_wikipedia_p_e_m.txt                                          
 5.0M  empty_page_ents.txt                                                     
  22M  ent_name_id_map_RLTD.t7                                                 
 520M  ent_name_id_map.t7                                                      
 4.0K  ent_vecs                                                                
  95M  ent_wiki_freq.txt                                                       
 563M  GoogleNews-vectors-negative300.t7                                       
 7.3M  relatedness_test.t7                                                     
 8.9M  relatedness_validate.t7                                                 
 4.0K  test_train_data                                                         
 1.5G  wiki_canonical_words_RLTD.txt                                           
 8.4G  wiki_canonical_words.txt                                                
  88G  wiki_hyperlink_contexts.csv                                             
  48G  wiki_hyperlink_contexts_RLTD.csv                                        
 329M  wikipedia_p_e_m.txt                                                     
  11M  word_wiki_freq.txt                                                      
 749M  yago_p_e_m.txt                                                          

A few questions:

Is there an easy way to initialize the training from pre-trained embeddings? My job got killed at epoch 89, after 48 hours. Up to that point I only reached a maximum validation NDCG1 of 0.661.

Just to be sure, the code does not use fixed RNG seeds, right? I will start another job with a different random seed; maybe my run was just unlucky.

octavian-ganea commented 6 years ago

It might be due to the variability in the initialization. Can you please re-run this once again and let me know what you get after 100 epochs? If you still get the same results, I will take a closer look. Here is my log file for training entity vectors: https://drive.google.com/file/d/12jK2XtynM_ndsDlOEGA4amkDryIY_WIP/view?usp=sharing

Can you show here what test and validation scores you get in the first 5 epochs?

Regarding a specific initialization: the current code initializes each entity vector with the average of the embeddings of the entity's title words. See https://github.com/dalab/deep-ed/blob/master/entities/learn_e2v/model_a.lua#L35. If you want a different initialization, you would have to modify this code.
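
A minimal NumPy sketch of that initialization idea; the word_vecs lookup and entity_titles list are hypothetical helpers, and the actual implementation is the Lua code linked above:

import numpy as np

def init_entity_vec(title, word_vecs, dim=300, rng=np.random):
    # Average the w2v embeddings of the title words; fall back to a small
    # random vector when none of the title words is in the w2v vocabulary.
    vecs = [word_vecs[w] for w in title.lower().split() if w in word_vecs]
    v = np.mean(vecs, axis=0) if vecs else rng.normal(scale=0.01, size=dim)
    return v / (np.linalg.norm(v) + 1e-8)   # unit norm; the normalization is an assumption here

# Hypothetical usage: word_vecs maps word -> np.ndarray of length dim,
# entity_titles is a list of entity title strings.
# ent_vecs = np.stack([init_entity_vec(t, word_vecs) for t in entity_titles])
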

se4u commented 6 years ago

Thanks a lot for the log file; that was helpful. I plotted the validation scores vs. epochs that you observed and that I got in my two trials. I am guessing that at epoch 54 the torch optim package you were using lowered the learning rate, but that did not happen for me, and I got unlucky in my first trial.

I will update later tonight once my second run finishes.

[Plot: validation NDCG1 vs. epoch for the three runs (ganea, rastogi-try1, rastogi-try2)]

Code for producing the plot, for future reference:

%matplotlib inline
# Extract the validation NDCG1 column ($4 on the "our (vald)" line) from each log.
! fgrep -A 8 'Entity Relatedness quality measure' log_train_entity_vecs | fgrep vald | awk '{print $4}' > rastogi.val.trial2.ndcg1
! fgrep -A 8 'Entity Relatedness quality measure' log_train_entity_vecs.bak | fgrep vald | awk '{print $4}' > rastogi.val.trial1.ndcg1
! fgrep -A 8 'Entity Relatedness quality measure' log_train_entity_vecs.log  | fgrep vald | awk '{print $4}' > ganea.val.ndcg1

import numpy as np
from matplotlib.pyplot import figure, plot, legend, grid

def sanitize(e):
    # Strip the ANSI colour codes the training log wraps around the numbers.
    return float(e.strip().replace('\x1b[34m', '').replace('\x1b[39m', ''))

g  = np.array([sanitize(e) for e in open('ganea.val.ndcg1')])
r1 = np.array([sanitize(e) for e in open('rastogi.val.trial1.ndcg1')])
r2 = np.array([sanitize(e) for e in open('rastogi.val.trial2.ndcg1')])

figure(figsize=(15, 8))
plot(g, 'ko-', label='ganea')
plot(r1, 'ro-', label='rastogi-try1')
plot(r2, 'bo-', label='rastogi-try2')
legend()
grid(True)

octavian-ganea commented 6 years ago

If you look in my log file, after epoch 55 the following message is printed: "Start training on Wiki Hyperlinks".

This means that before epoch 55 only Wiki canonical pages are used for training, and after that the training uses hyperlinks around mentions of entities. This explains the accuracy gap. Do you have this message in your log file? It corresponds to: https://github.com/dalab/deep-ed/blob/master/entities/learn_e2v/batch_dataset_a.lua#L28

octavian-ganea commented 6 years ago

If the above message was not logged, then most likely the training did not reach the hyperlinks stage. It might be the case that I accidentally changed the default value of --num_passes_wiki_words from 200 to 400 at some point, in which case rerunning with 200 should work.
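
A simplified Python sketch of that two-stage schedule, just to illustrate the switch; the exact semantics of num_passes_wiki_words and the data sources live in the Lua code linked above, and the argument names here are assumptions:

def training_batches(canonical_pages, hyperlink_contexts, num_passes_wiki_words=200):
    # Stage 1: num_passes_wiki_words passes over the Wiki canonical pages
    # (canonical_pages must be re-iterable, e.g. a list of batches).
    for _ in range(num_passes_wiki_words):
        for batch in canonical_pages:
            yield batch
    # Stage 2: switch permanently to hyperlink contexts around entity mentions.
    print('Start training on Wiki Hyperlinks')   # the message to look for in the log
    while True:
        for batch in hyperlink_contexts:
            yield batch
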

se4u commented 6 years ago

Aha, yes, that seems to be it. My log for trial 1 (https://drive.google.com/file/d/1pf0WB-L13wpq0EY_Oi_jFtQzizcyJtZc/view?usp=sharing) shows that I did not train on the Wiki hyperlinks, and the default value of num_passes_wiki_words is 400 in the repo. I have restarted the job and will report back once it finishes.

octavian-ganea commented 6 years ago

If the message "Start training on Wiki Hyperlinks" is not in your logs, then that is likely the cause.

octavian-ganea commented 6 years ago

Did it work ?

se4u commented 6 years ago

Yes, it worked. Sorry, I should have closed the issue right away.

octavian-ganea commented 6 years ago

Perfect, glad it's fine :)