allenai / bilm-tf

Tensorflow implementation of contextualized word representations from bi-directional language models
Apache License 2.0

Vanishing/Exploding gradients? #168

Closed: agemagician closed this issue 5 years ago

agemagician commented 5 years ago

Hi,

I am training on a new corpus/language. During training the perplexity looked good, but at some point it jumped very high: it had been between 7 and 10, and then it jumped to 15699. Have you seen any vanishing/exploding gradient side effects with this model?

I am using the default settings, except that I changed unroll_steps to 50 and the vocab size to fit my problem.
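
For context, the usual guard against exploding gradients in LSTM language models is gradient clipping. Below is a minimal TF 1.x sketch of global-norm clipping; the toy loss and the clip_norm value of 10.0 are illustrative assumptions, not necessarily what bilm-tf applies:

```python
import tensorflow as tf

# Minimal sketch of global-norm gradient clipping (illustrative values,
# not bilm-tf's actual training code).
x = tf.Variable([1.0, 2.0])
loss = tf.reduce_sum(tf.square(x))  # toy stand-in for the LM loss
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss)
grads, tvars = zip(*grads_and_vars)
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=10.0)  # cap total norm
train_op = opt.apply_gradients(zip(clipped, tvars))
```

The training log around the spike: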

Loading data from: /home/ga53huq2/projects/bio/unsupervised_amino_acid_embedding/dataset/file9115.txt
Loaded 500000 sentences.
Finished loading
Batch 673300, train_perplexity=6.99748
Total time: 1041813.8715116978
Batch 673400, train_perplexity=7.26806
Total time: 1041936.777597189
Batch 673500, train_perplexity=7.09751
Total time: 1042059.83791399
Batch 673600, train_perplexity=6.75356
Total time: 1042182.6843624115
Batch 673700, train_perplexity=7.06632
Total time: 1042305.2833411694
WARNING:tensorflow:Error encountered when serializing lstm_output_embeddings.
Type is unsupported, or the types of the items don't match field type in CollectionDef.
'list' object has no attribute 'name'
Batch 673800, train_perplexity=6.99326
Total time: 1042454.4103929996
Batch 673900, train_perplexity=7.05139
Total time: 1042577.2035264969
Batch 674000, train_perplexity=7.0321
Total time: 1042700.0349740982
Batch 674100, train_perplexity=6.96487
Total time: 1042822.9305524826
Loading data from: /home/ga53huq2/projects/bio/unsupervised_amino_acid_embedding/dataset/file9218.txt
Loaded 500000 sentences.
Finished loading
Batch 674200, train_perplexity=8.39033
Total time: 1043664.7385673523
Batch 674300, train_perplexity=7.96308
Total time: 1043785.5559742451
Batch 674400, train_perplexity=8.09743
Total time: 1043906.0904071331
Batch 674500, train_perplexity=8.30678
Total time: 1044027.1834490299
Batch 674600, train_perplexity=7.90479
Total time: 1044148.736148119
Batch 674700, train_perplexity=8.11415
Total time: 1044270.838372469
Batch 674800, train_perplexity=10.2219
Total time: 1044392.4975090027
**Batch 674900, train_perplexity=15699.9**
**Total time: 1044514.349034071**
**Batch 675000, train_perplexity=90.8147**
**Total time: 1044636.6984901428**
WARNING:tensorflow:Error encountered when serializing lstm_output_embeddings.
Type is unsupported, or the types of the items don't match field type in CollectionDef.
'list' object has no attribute 'name'
Batch 675100, train_perplexity=13.7765
Total time: 1044782.6224155426
Batch 675200, train_perplexity=13.6507
Total time: 1044903.5013656616
Batch 675300, train_perplexity=13.4189
Total time: 1045024.7266552448
Batch 675400, train_perplexity=12.9115
Total time: 1045146.2300972939
Batch 675500, train_perplexity=13.3054
Total time: 1045267.7772848606
Batch 675600, train_perplexity=12.9654
Total time: 1045389.3125679493
Batch 675700, train_perplexity=12.879
Total time: 1045511.3902630806
Batch 675800, train_perplexity=12.5727
Total time: 1045633.0763440132
Batch 675900, train_perplexity=12.4994
Total time: 1045754.6204769611
Batch 676000, train_perplexity=12.0309
Total time: 1045876.4980325699
Batch 676100, train_perplexity=12.4572
Total time: 1045998.2082695961
Batch 676200, train_perplexity=12.3322
Total time: 1046119.7888958454
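
For scale: train_perplexity is the exponential of the mean per-token loss, so a jump from roughly 8 to 15699 means the loss went from about 2.1 to about 9.7 nats within one reporting window, which looks like a sudden blow-up rather than gradual drift:

```python
import math

# Perplexity is exp(mean per-token negative log-likelihood), so the log
# of the reported value recovers the training loss in nats.
print(math.log(8.11))     # ~2.09 nats just before the spike
print(math.log(15699.9))  # ~9.66 nats at the spike
```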
PhilipMay commented 5 years ago

Maybe you have some strange, non-homogeneous training data.
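
If uneven data is the cause (the log does show perplexity climbing right after file9218.txt is loaded), one way to rule it out is to shuffle sentences across the training shards before a run. A hypothetical sketch, where reshuffle_shards is not part of bilm-tf:

```python
import random

# Hypothetical helper (not part of bilm-tf): pool sentences from all
# training shards and redistribute them at random, so that no single
# shard is dominated by unusual or out-of-domain data.
def reshuffle_shards(in_paths, out_paths, seed=0):
    sentences = []
    for path in in_paths:
        with open(path) as f:
            sentences.extend(f.readlines())
    random.Random(seed).shuffle(sentences)
    per_shard = (len(sentences) + len(out_paths) - 1) // len(out_paths)
    for i, path in enumerate(out_paths):
        with open(path, "w") as f:
            f.writelines(sentences[i * per_shard:(i + 1) * per_shard])
```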