There is some variability depending on the initialization of the embeddings, but I agree that in your case the difference is big. I ran this code 3 times and it was always able to get results similar to the ones mentioned in the README after fewer than 100 epochs. Did you change anything in the code? Are you training with both canonical pages and hyperlinks? Can you show what happens if you let it train longer (100 epochs or so) and pick the epoch with the highest TOTAL validation score?
I did not change anything in the training code, and my fork is also on GitHub. The command I ran to train was just the vanilla command; I did not specify any other flags:
th entities/learn_e2v/learn_a.lua -root_data_dir $(DP)
Here's the part of the log output with the important info:
==> Loading relatedness test
---> from t7 file.
==> Loading relatedness thid tensor
---> from t7 file.
Done loading relatedness sets. Num queries test = 3319. Num queries valid = 3673. Total num ents restricted set = 276031
==> Loading entity wikiid - name map
---> from t7 file: /export/c02/prastog3/deep-ed-data/generated/ent_name_id_map_RLTD.t7
Done loading entity name - wikiid. Size thid index = 276031
==> Loading common w2v + top freq list of words
---> t7 file NOT found. Loading from disk instead (slower). Out file = /export/c02/prastog3/deep-ed-data/generated/common_top_words_freq_vectors_w2v.t7
word freq index ...
word vectors index ...
==> Loading word freq map with unig power 0.6
Done loading word freq index. Num words = 491413; total freq = 774609376
==> Loading w2v vectors
---> t7 file NOT found. Loading w2v from the bin/txt file instead (slower).
Num words = 491413. Num phrases = 0
Writing t7 File for future usage. Next time Word2Vec loading will be faster!
Done reading w2v data. Word vocab size = 491413
==> Init entity embeddings matrix. Num ents = 276031
Init entity embeddings with average of title word vectors to speed up learning.
Done init.
Training entity vectors w/ params: ;obj-maxm;wiki-canonical-hyperlinks;hypCtxtL-10;numWWpass-400;WperE-20;w2v;negW-5;ents-RLTD;unigP-0.6;bs-500;ADAGRAD-lr-0.3
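For context, the unigP-0.6 and negW-5 settings in that parameter string correspond to drawing negative words from the unigram distribution raised to the power 0.6, a few negatives per positive word. Below is a minimal numpy sketch of such a sampler (illustrative only; not the repo's Lua implementation):

import numpy as np

def build_negative_sampler(word_freqs, power=0.6):
    # Sampler that draws negative words with probability proportional to freq(w) ** power.
    words = list(word_freqs)
    probs = np.array([word_freqs[w] for w in words], dtype=np.float64) ** power
    probs /= probs.sum()
    return lambda k=5: np.random.choice(words, size=k, p=probs)

# Hypothetical usage with a toy frequency map:
# sample_negatives = build_negative_sampler({'the': 1000000, 'paris': 5000, 'einstein': 800})
# print(sample_negatives(5))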
My files in the generated folder also match the description in the README:
prastog3@c02 /export/c02/prastog3/deep-ed-data/generated Wed Jan 03 18:10:04
$ ls -lah
total 148G
4.0K .
4.0K ..
9.5M all_candidate_ents_ed_rltd_datasets_RLTD.t7
13M common_top_words_freq_vectors_w2v.t7
775M crosswikis_wikipedia_p_e_m.txt
5.0M empty_page_ents.txt
22M ent_name_id_map_RLTD.t7
520M ent_name_id_map.t7
4.0K ent_vecs
95M ent_wiki_freq.txt
563M GoogleNews-vectors-negative300.t7
7.3M relatedness_test.t7
8.9M relatedness_validate.t7
4.0K test_train_data
1.5G wiki_canonical_words_RLTD.txt
8.4G wiki_canonical_words.txt
88G wiki_hyperlink_contexts.csv
48G wiki_hyperlink_contexts_RLTD.csv
329M wikipedia_p_e_m.txt
11M word_wiki_freq.txt
749M yago_p_e_m.txt
A few questions:
Is there an easy way to initialize the training from pre-trained embeddings? My job got killed at epoch 89 after 48 hours; by then I had only reached a maximum validation NDCG1 of 0.661.
Just to be sure, the code does not use fixed RNG seeds, right? I will start another job with a different random seed; maybe my run was just unlucky.
It might be due to variability in the initialization. Can you please re-run this once again and let me know what you get after 100 epochs? If you still get the same, I will take a closer look. Here is my log file for training entity vectors: https://drive.google.com/file/d/12jK2XtynM_ndsDlOEGA4amkDryIY_WIP/view?usp=sharing
Can you show here what test and validation scores you get in the first 5 epochs?
Regarding some specific initialization: the current code initializes each entity vector using the average of the embeddings of the entity's title words. See https://github.com/dalab/deep-ed/blob/master/entities/learn_e2v/model_a.lua#L35 . If you want a different initialization, you would have to modify this code.
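For illustration, here is a minimal Python/numpy sketch of that initialization scheme (the actual code is the Lua/Torch linked above; the helper name and the word_vecs map are hypothetical):

import numpy as np

def init_entity_vector(title, word_vecs, dim=300):
    # Average the w2v vectors of the entity's title words.
    vecs = [word_vecs[w] for w in title.lower().split() if w in word_vecs]
    if not vecs:
        # No title word has a known embedding: fall back to a small random init.
        return 0.01 * np.random.randn(dim)
    return np.mean(vecs, axis=0)

# Hypothetical usage, assuming word_vecs maps word -> 300-d numpy array (e.g. GoogleNews w2v):
# ent_vec = init_entity_vector('Barack Obama', word_vecs)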
Thanks a lot for the log file; that was helpful. I plotted the validation scores vs. epochs that you observed and that I got in my two trials. I am guessing that at epoch 54 the torch optim routine you were using lowered the learning rate, but that did not happen for me, and I got unlucky in my first trial.
I will update later tonight once my second run finishes.
Code for producing the plot, for future reference:
%matplotlib inline
# Pull the validation NDCG@1 column out of each training log.
! fgrep -A 8 'Entity Relatedness quality measure' log_train_entity_vecs | fgrep vald | awk '{print $4}' > rastogi.val.trial2.ndcg1
! fgrep -A 8 'Entity Relatedness quality measure' log_train_entity_vecs.bak | fgrep vald | awk '{print $4}' > rastogi.val.trial1.ndcg1
! fgrep -A 8 'Entity Relatedness quality measure' log_train_entity_vecs.log | fgrep vald | awk '{print $4}' > ganea.val.ndcg1
from matplotlib.pyplot import *
import numpy as np

def sanitize(e):
    # Strip the ANSI color codes that the logger writes around the scores.
    return float(e.strip().replace('\x1b[34m', '').replace('\x1b[39m', ''))

g = np.array([sanitize(e) for e in open('ganea.val.ndcg1')])
r1 = np.array([sanitize(e) for e in open('rastogi.val.trial1.ndcg1')])
r2 = np.array([sanitize(e) for e in open('rastogi.val.trial2.ndcg1')])

figure(figsize=(15, 8))
plot(g, 'ko-', label='ganea')
plot(r1, 'ro-', label='rastogi-try1')
plot(r2, 'bo-', label='rastogi-try2')
legend()
grid(True)
If you look in my log file, after epoch 55 the following message is printed: "Start training on Wiki Hyperlinks".
This means that before epoch 55 only Wiki canonical pages are used for training, and after that the training uses hyperlinks around mentions of entities. This explains the accuracy gap. Do you have this message in your log file? It corresponds to: https://github.com/dalab/deep-ed/blob/master/entities/learn_e2v/batch_dataset_a.lua#L28
If the above message was not logged, then most likely the training did not reach the hyperlinks stage. It might be the case that I accidentally modified the default value of --num_passes_wiki_words from 200 to 400 at some point, in which case rerunning with 200 should work.
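To make the two-stage schedule concrete, here is a simplified Python sketch of the switch described above (the actual logic is the Lua code in batch_dataset_a.lua; the counter name and default value here are only illustrative):

def pick_training_source(passes_done, num_passes_wiki_words=200):
    # Stage 1: words from the entities' canonical Wikipedia pages.
    if passes_done < num_passes_wiki_words:
        return 'wiki_canonical_words_RLTD.txt'
    # Stage 2: hyperlink contexts around entity mentions
    # (this is when "Start training on Wiki Hyperlinks" gets logged).
    return 'wiki_hyperlink_contexts_RLTD.csv'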
Aha, yes, that seems to be it. My log for trial 1 (https://drive.google.com/file/d/1pf0WB-L13wpq0EY_Oi_jFtQzizcyJtZc/view?usp=sharing) shows that I did not train on the Wiki Hyperlinks, and the default value of num_passes_wiki_words is indeed 400 in the repo. I have restarted the job and will report back once it finishes.
If the message "Start training on Wiki Hyperlinks" is not in your logs, then that is likely the case.
Did it work?
Yes it worked. Sorry I should have closed the issue right away.
Perfect, glad it's fine :)
According to the README, the entity embeddings should reach up to 0.681 NDCG1 after 69 epochs. I have already done 75 epochs, but the performance is much below the target. Did anyone else encounter this?