todpole3 closed this issue 6 years ago.
FYI, the ConvE model also behaves quite differently on these two data splits. I haven't finished testing the other models yet.
In short, with your split I get dev MRR 84.3 and test MRR 85.0; with the other split I get dev MRR 80.9 and test MRR 45.9.
The other data split is released here: https://github.com/shehzaadzd/MINERVA/tree/master/datasets/data_preprocessed/kinship
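For readers comparing these numbers: each combined line in the logs below ("Hits @k", "Mean rank", "Mean reciprocal rank") appears to be the average of its left and right counterparts. For example, on the dev set of your split, (0.7804 + 0.9052) / 2 ≈ 0.8428. A minimal sketch of how such metrics can be computed, assuming 1-based per-query ranks are available for each direction; the function names are hypothetical and this is not the repository's evaluation code:

```python
from typing import Dict, Sequence


def rank_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Mean rank, mean reciprocal rank and Hits@k from 1-based ranks."""
    n = len(ranks)
    if n == 0:
        raise ValueError("need at least one rank")
    metrics = {
        "mean_rank": sum(ranks) / n,
        "mrr": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        # Fraction of queries whose correct entity is ranked in the top k.
        metrics[f"hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


def combined_metrics(left_ranks: Sequence[int], right_ranks: Sequence[int],
                     ks: Sequence[int] = (1, 3, 10)) -> Dict[str, float]:
    """Average each metric over the left and right prediction directions."""
    left = rank_metrics(left_ranks, ks)
    right = rank_metrics(right_ranks, ks)
    return {name: (left[name] + right[name]) / 2 for name in left}
```

For example, `combined_metrics([1, 1], [2, 4], ks=(1,))` gives an MRR of 0.6875 (the mean of 1.0 and 0.375) and a Hits@1 of 0.5. With equally many left and right queries, averaging the two directions is the same as pooling all ranks.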
The complete model output is shown below. Using your split:
2018-03-20 19:57:01.156961 (INFO): COMPLETED EPOCH: 1000
2018-03-20 19:57:01.156992 (INFO): train Loss: 0.053774 99% CI: (0.05242, 0.055129), n=78
2018-03-20 19:57:01.157016 (INFO): ########################################
2018-03-20 19:57:01.157038 (INFO):
saving to saved_models/kinship_ConvE_0.2_0.3.model
2018-03-20 19:57:01.379290 (INFO):
2018-03-20 19:57:01.379504 (INFO): --------------------------------------------------
2018-03-20 19:57:01.379868 (INFO): dev_evaluation
2018-03-20 19:57:01.379898 (INFO): --------------------------------------------------
2018-03-20 19:57:01.379923 (INFO):
2018-03-20 19:57:02.019596 (INFO): Hits left @1: 0.681640625
2018-03-20 19:57:02.019995 (INFO): Hits right @1: 0.849609375
2018-03-20 19:57:02.020297 (INFO): Hits @1: 0.765625
2018-03-20 19:57:02.020552 (INFO): Hits left @2: 0.8017578125
2018-03-20 19:57:02.020650 (INFO): Hits right @2: 0.9326171875
2018-03-20 19:57:02.020762 (INFO): Hits @2: 0.8671875
2018-03-20 19:57:02.020847 (INFO): Hits left @3: 0.8671875
2018-03-20 19:57:02.020931 (INFO): Hits right @3: 0.95703125
2018-03-20 19:57:02.021039 (INFO): Hits @3: 0.912109375
2018-03-20 19:57:02.021122 (INFO): Hits left @4: 0.888671875
2018-03-20 19:57:02.021202 (INFO): Hits right @4: 0.97265625
2018-03-20 19:57:02.021314 (INFO): Hits @4: 0.9306640625
2018-03-20 19:57:02.021399 (INFO): Hits left @5: 0.90625
2018-03-20 19:57:02.021480 (INFO): Hits right @5: 0.9736328125
2018-03-20 19:57:02.021591 (INFO): Hits @5: 0.93994140625
2018-03-20 19:57:02.021674 (INFO): Hits left @6: 0.9189453125
2018-03-20 19:57:02.021761 (INFO): Hits right @6: 0.9775390625
2018-03-20 19:57:02.021870 (INFO): Hits @6: 0.9482421875
2018-03-20 19:57:02.021951 (INFO): Hits left @7: 0.93359375
2018-03-20 19:57:02.022037 (INFO): Hits right @7: 0.98046875
2018-03-20 19:57:02.022144 (INFO): Hits @7: 0.95703125
2018-03-20 19:57:02.022252 (INFO): Hits left @8: 0.94140625
2018-03-20 19:57:02.022344 (INFO): Hits right @8: 0.9814453125
2018-03-20 19:57:02.022457 (INFO): Hits @8: 0.96142578125
2018-03-20 19:57:02.022540 (INFO): Hits left @9: 0.9453125
2018-03-20 19:57:02.022620 (INFO): Hits right @9: 0.9833984375
2018-03-20 19:57:02.022729 (INFO): Hits @9: 0.96435546875
2018-03-20 19:57:02.022811 (INFO): Hits left @10: 0.947265625
2018-03-20 19:57:02.022899 (INFO): Hits right @10: 0.9833984375
2018-03-20 19:57:02.023015 (INFO): Hits @10: 0.96533203125
2018-03-20 19:57:02.023182 (INFO): Mean rank left: 3.4677734375
2018-03-20 19:57:02.023342 (INFO): Mean rank right: 2.32421875
2018-03-20 19:57:02.023599 (INFO): Mean rank: 2.89599609375
2018-03-20 19:57:02.023783 (INFO): Mean reciprocal rank left: 0.7804370418914588
2018-03-20 19:57:02.023973 (INFO): Mean reciprocal rank right: 0.9051783141074977
2018-03-20 19:57:02.024372 (INFO): Mean reciprocal rank: 0.8428076779994782
2018-03-20 19:57:02.024722 (INFO):
2018-03-20 19:57:02.024760 (INFO): --------------------------------------------------
2018-03-20 19:57:02.024784 (INFO): test_evaluation
2018-03-20 19:57:02.024800 (INFO): --------------------------------------------------
2018-03-20 19:57:02.024936 (INFO):
2018-03-20 19:57:02.677243 (INFO): Hits left @1: 0.6767578125
2018-03-20 19:57:02.677527 (INFO): Hits right @1: 0.8720703125
2018-03-20 19:57:02.677858 (INFO): Hits @1: 0.7744140625
2018-03-20 19:57:02.677967 (INFO): Hits left @2: 0.8046875
2018-03-20 19:57:02.678057 (INFO): Hits right @2: 0.9423828125
2018-03-20 19:57:02.678190 (INFO): Hits @2: 0.87353515625
2018-03-20 19:57:02.678559 (INFO): Hits left @3: 0.8642578125
2018-03-20 19:57:02.678649 (INFO): Hits right @3: 0.9619140625
2018-03-20 19:57:02.678759 (INFO): Hits @3: 0.9130859375
2018-03-20 19:57:02.678840 (INFO): Hits left @4: 0.8857421875
2018-03-20 19:57:02.678926 (INFO): Hits right @4: 0.9794921875
2018-03-20 19:57:02.679032 (INFO): Hits @4: 0.9326171875
2018-03-20 19:57:02.679113 (INFO): Hits left @5: 0.9072265625
2018-03-20 19:57:02.679193 (INFO): Hits right @5: 0.98828125
2018-03-20 19:57:02.679305 (INFO): Hits @5: 0.94775390625
2018-03-20 19:57:02.679385 (INFO): Hits left @6: 0.919921875
2018-03-20 19:57:02.679466 (INFO): Hits right @6: 0.990234375
2018-03-20 19:57:02.679570 (INFO): Hits @6: 0.955078125
2018-03-20 19:57:02.679652 (INFO): Hits left @7: 0.9306640625
2018-03-20 19:57:02.679732 (INFO): Hits right @7: 0.9931640625
2018-03-20 19:57:02.679838 (INFO): Hits @7: 0.9619140625
2018-03-20 19:57:02.679916 (INFO): Hits left @8: 0.9345703125
2018-03-20 19:57:02.680004 (INFO): Hits right @8: 0.9931640625
2018-03-20 19:57:02.680109 (INFO): Hits @8: 0.9638671875
2018-03-20 19:57:02.680194 (INFO): Hits left @9: 0.939453125
2018-03-20 19:57:02.680273 (INFO): Hits right @9: 0.994140625
2018-03-20 19:57:02.680380 (INFO): Hits @9: 0.966796875
2018-03-20 19:57:02.680459 (INFO): Hits left @10: 0.947265625
2018-03-20 19:57:02.680541 (INFO): Hits right @10: 0.994140625
2018-03-20 19:57:02.680646 (INFO): Hits @10: 0.970703125
2018-03-20 19:57:02.680813 (INFO): Mean rank left: 3.2685546875
2018-03-20 19:57:02.680970 (INFO): Mean rank right: 1.5263671875
2018-03-20 19:57:02.681224 (INFO): Mean rank: 2.3974609375
2018-03-20 19:57:02.681519 (INFO): Mean reciprocal rank left: 0.7783240016423743
2018-03-20 19:57:02.681789 (INFO): Mean reciprocal rank right: 0.9209283730768105
2018-03-20 19:57:02.682173 (INFO): Mean reciprocal rank: 0.8496261873595925
Using the other split:
2018-03-20 22:07:19.726339 (INFO): COMPLETED EPOCH: 1000
2018-03-20 22:07:19.726371 (INFO): train Loss: 0.054711 99% CI: (0.053405, 0.056018), n=78
2018-03-20 22:07:19.726395 (INFO): ########################################
2018-03-20 22:07:19.726421 (INFO):
saving to saved_models/kinship_ConvE_ConvE_0.2_0.3.model
2018-03-20 22:07:19.953752 (INFO):
2018-03-20 22:07:19.954090 (INFO): --------------------------------------------------
2018-03-20 22:07:19.954282 (INFO): dev_evaluation
2018-03-20 22:07:19.954428 (INFO): --------------------------------------------------
2018-03-20 22:07:19.954455 (INFO):
2018-03-20 22:07:20.608406 (INFO): Hits left @1: 0.52734375
2018-03-20 22:07:20.608696 (INFO): Hits right @1: 0.8759765625
2018-03-20 22:07:20.609177 (INFO): Hits @1: 0.70166015625
2018-03-20 22:07:20.609434 (INFO): Hits left @2: 0.7373046875
2018-03-20 22:07:20.609540 (INFO): Hits right @2: 0.9560546875
2018-03-20 22:07:20.609666 (INFO): Hits @2: 0.8466796875
2018-03-20 22:07:20.609760 (INFO): Hits left @3: 0.83203125
2018-03-20 22:07:20.609852 (INFO): Hits right @3: 0.9765625
2018-03-20 22:07:20.609972 (INFO): Hits @3: 0.904296875
2018-03-20 22:07:20.610062 (INFO): Hits left @4: 0.8857421875
2018-03-20 22:07:20.610179 (INFO): Hits right @4: 0.98046875
2018-03-20 22:07:20.610300 (INFO): Hits @4: 0.93310546875
2018-03-20 22:07:20.610388 (INFO): Hits left @5: 0.91796875
2018-03-20 22:07:20.610475 (INFO): Hits right @5: 0.984375
2018-03-20 22:07:20.610590 (INFO): Hits @5: 0.951171875
2018-03-20 22:07:20.610684 (INFO): Hits left @6: 0.9345703125
2018-03-20 22:07:20.610789 (INFO): Hits right @6: 0.9853515625
2018-03-20 22:07:20.611052 (INFO): Hits @6: 0.9599609375
2018-03-20 22:07:20.611144 (INFO): Hits left @7: 0.94921875
2018-03-20 22:07:20.611240 (INFO): Hits right @7: 0.986328125
2018-03-20 22:07:20.611352 (INFO): Hits @7: 0.9677734375
2018-03-20 22:07:20.611439 (INFO): Hits left @8: 0.955078125
2018-03-20 22:07:20.611532 (INFO): Hits right @8: 0.9873046875
2018-03-20 22:07:20.611646 (INFO): Hits @8: 0.97119140625
2018-03-20 22:07:20.611735 (INFO): Hits left @9: 0.9599609375
2018-03-20 22:07:20.611822 (INFO): Hits right @9: 0.98828125
2018-03-20 22:07:20.611934 (INFO): Hits @9: 0.97412109375
2018-03-20 22:07:20.612023 (INFO): Hits left @10: 0.9619140625
2018-03-20 22:07:20.612107 (INFO): Hits right @10: 0.9892578125
2018-03-20 22:07:20.612221 (INFO): Hits @10: 0.9755859375
2018-03-20 22:07:20.612390 (INFO): Mean rank left: 3.3447265625
2018-03-20 22:07:20.612560 (INFO): Mean rank right: 1.80078125
2018-03-20 22:07:20.612817 (INFO): Mean rank: 2.57275390625
2018-03-20 22:07:20.613108 (INFO): Mean reciprocal rank left: 0.6916574439675227
2018-03-20 22:07:20.613304 (INFO): Mean reciprocal rank right: 0.9255156893448624
2018-03-20 22:07:20.613590 (INFO): Mean reciprocal rank: 0.8085865666561926
2018-03-20 22:07:20.613777 (INFO):
2018-03-20 22:07:20.613995 (INFO): --------------------------------------------------
2018-03-20 22:07:20.614026 (INFO): test_evaluation
2018-03-20 22:07:20.614053 (INFO): --------------------------------------------------
2018-03-20 22:07:20.614076 (INFO):
2018-03-20 22:07:21.273249 (INFO): Hits left @1: 0.455078125
2018-03-20 22:07:21.273545 (INFO): Hits right @1: 0.28515625
2018-03-20 22:07:21.273684 (INFO): Hits @1: 0.3701171875
2018-03-20 22:07:21.273782 (INFO): Hits left @2: 0.5556640625
2018-03-20 22:07:21.273877 (INFO): Hits right @2: 0.3291015625
2018-03-20 22:07:21.273989 (INFO): Hits @2: 0.4423828125
2018-03-20 22:07:21.274285 (INFO): Hits left @3: 0.6201171875
2018-03-20 22:07:21.274382 (INFO): Hits right @3: 0.365234375
2018-03-20 22:07:21.274494 (INFO): Hits @3: 0.49267578125
2018-03-20 22:07:21.274578 (INFO): Hits left @4: 0.6552734375
2018-03-20 22:07:21.274665 (INFO): Hits right @4: 0.396484375
2018-03-20 22:07:21.274785 (INFO): Hits @4: 0.52587890625
2018-03-20 22:07:21.274871 (INFO): Hits left @5: 0.677734375
2018-03-20 22:07:21.274954 (INFO): Hits right @5: 0.42578125
2018-03-20 22:07:21.275065 (INFO): Hits @5: 0.5517578125
2018-03-20 22:07:21.275151 (INFO): Hits left @6: 0.6962890625
2018-03-20 22:07:21.275238 (INFO): Hits right @6: 0.4482421875
2018-03-20 22:07:21.275346 (INFO): Hits @6: 0.572265625
2018-03-20 22:07:21.275434 (INFO): Hits left @7: 0.7138671875
2018-03-20 22:07:21.275518 (INFO): Hits right @7: 0.4697265625
2018-03-20 22:07:21.275777 (INFO): Hits @7: 0.591796875
2018-03-20 22:07:21.276031 (INFO): Hits left @8: 0.7275390625
2018-03-20 22:07:21.276120 (INFO): Hits right @8: 0.48828125
2018-03-20 22:07:21.276227 (INFO): Hits @8: 0.60791015625
2018-03-20 22:07:21.276312 (INFO): Hits left @9: 0.7373046875
2018-03-20 22:07:21.276396 (INFO): Hits right @9: 0.505859375
2018-03-20 22:07:21.276505 (INFO): Hits @9: 0.62158203125
2018-03-20 22:07:21.276589 (INFO): Hits left @10: 0.7470703125
2018-03-20 22:07:21.276671 (INFO): Hits right @10: 0.5205078125
2018-03-20 22:07:21.276784 (INFO): Hits @10: 0.6337890625
2018-03-20 22:07:21.276922 (INFO): Mean rank left: 20.4443359375
2018-03-20 22:07:21.277065 (INFO): Mean rank right: 23.8134765625
2018-03-20 22:07:21.277268 (INFO): Mean rank: 22.12890625
2018-03-20 22:07:21.277877 (INFO): Mean reciprocal rank left: 0.55576270324504
2018-03-20 22:07:21.278258 (INFO): Mean reciprocal rank right: 0.36218544123701757
2018-03-20 22:07:21.278650 (INFO): Mean reciprocal rank: 0.4589740722410288
I do not retrain the embeddings; I think this is the most common setup. You can retrain with the validation triples added, and this will yield much better results, but scientifically this procedure is more an art than a reproducible method, so I try to avoid it.
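The two protocols differ only in which triples the test-time model has seen. A minimal sketch, with hypothetical `train_fn` and `evaluate_fn` callables standing in for the actual training and ranking code (neither is part of the repository):

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
TrainFn = Callable[[List[Triple]], object]   # hypothetical: triples -> model
EvalFn = Callable[[object, List[Triple]], float]  # hypothetical: (model, triples) -> metric


def evaluate_fixed_embeddings(train: List[Triple], dev: List[Triple], test: List[Triple],
                              train_fn: TrainFn, evaluate_fn: EvalFn) -> Dict[str, float]:
    """No retraining: one model trained on train only, reused for dev and test."""
    model = train_fn(train)
    return {"dev": evaluate_fn(model, dev), "test": evaluate_fn(model, test)}


def evaluate_with_retraining(train: List[Triple], dev: List[Triple], test: List[Triple],
                             train_fn: TrainFn, evaluate_fn: EvalFn) -> Dict[str, float]:
    """Retraining: dev is scored with a train-only model, test with a second
    model trained on train + dev."""
    dev_model = train_fn(train)
    test_model = train_fn(train + dev)
    return {"dev": evaluate_fn(dev_model, dev), "test": evaluate_fn(test_model, test)}
```

The first function corresponds to the setup described above; the second is the retraining variant, which additionally requires choosing how long to train the second model without a held-out set.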
You are right that the splits are different; I will fix this for consistency. However, I cannot reproduce your results with a reasonable number of epochs: for both splits, I get approximately the same results. Training for 1000 epochs will induce overfitting, and this may be what you are seeing. The model might have found a niche to overfit in one case but not the other. In general, if you look at earlier epochs you should find better results. Can you confirm this?
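Looking at earlier epochs amounts to selecting the checkpoint by dev MRR instead of taking the final epoch. A minimal early-stopping sketch, assuming a list of per-epoch dev MRRs has been collected; the `patience` value is an arbitrary illustration, not a setting from the repository:

```python
from typing import List, Tuple


def select_epoch(dev_mrr: List[float], patience: int = 10) -> Tuple[int, float]:
    """Return (best_epoch, best_mrr), 1-based, stopping once dev MRR has not
    improved for `patience` consecutive epochs."""
    if not dev_mrr:
        raise ValueError("no dev scores")
    best_epoch, best_mrr = 1, dev_mrr[0]
    since_best = 0
    for epoch, mrr in enumerate(dev_mrr[1:], start=2):
        if mrr > best_mrr:
            best_epoch, best_mrr = epoch, mrr
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # no improvement for `patience` epochs
    return best_epoch, best_mrr
```

The test set is then evaluated only with the checkpoint from the selected epoch, so test scores never influence the choice.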
Here are my results for the MINERVA split:
2018-03-24 11:28:20.495854 (INFO): ########################################
2018-03-24 11:28:20.496054 (INFO): COMPLETED EPOCH: 67
2018-03-24 11:28:20.496144 (INFO): train Loss: 0.14067 99% CI: (0.12871, 0.15264), n=14
2018-03-24 11:28:20.496257 (INFO): ########################################
2018-03-24 11:28:20.496330 (INFO):
2018-03-24 11:28:20.508022 (INFO):
2018-03-24 11:28:20.508154 (INFO): --------------------------------------------------
2018-03-24 11:28:20.508212 (INFO): dev_evaluation
2018-03-24 11:28:20.508269 (INFO): --------------------------------------------------
2018-03-24 11:28:20.508283 (INFO):
2018-03-24 11:28:20.956984 (INFO): Hits left @1: 0.6201171875
2018-03-24 11:28:20.957185 (INFO): Hits right @1: 0.5966796875
2018-03-24 11:28:20.957389 (INFO): Hits @1: 0.6083984375
2018-03-24 11:28:20.957501 (INFO): Hits left @2: 0.7900390625
2018-03-24 11:28:20.957602 (INFO): Hits right @2: 0.7744140625
2018-03-24 11:28:20.957731 (INFO): Hits @2: 0.7822265625
2018-03-24 11:28:20.957838 (INFO): Hits left @3: 0.85546875
2018-03-24 11:28:20.957940 (INFO): Hits right @3: 0.849609375
2018-03-24 11:28:20.958071 (INFO): Hits @3: 0.8525390625
2018-03-24 11:28:20.958169 (INFO): Hits left @4: 0.89453125
2018-03-24 11:28:20.958221 (INFO): Hits right @4: 0.88671875
2018-03-24 11:28:20.958326 (INFO): Hits @4: 0.890625
2018-03-24 11:28:20.958405 (INFO): Hits left @5: 0.9228515625
2018-03-24 11:28:20.958484 (INFO): Hits right @5: 0.908203125
2018-03-24 11:28:20.958596 (INFO): Hits @5: 0.91552734375
2018-03-24 11:28:20.958822 (INFO): Hits left @6: 0.9375
2018-03-24 11:28:20.959043 (INFO): Hits right @6: 0.92578125
2018-03-24 11:28:20.959231 (INFO): Hits @6: 0.931640625
2018-03-24 11:28:20.959383 (INFO): Hits left @7: 0.9501953125
2018-03-24 11:28:20.959532 (INFO): Hits right @7: 0.9443359375
2018-03-24 11:28:20.959753 (INFO): Hits @7: 0.947265625
2018-03-24 11:28:20.960136 (INFO): Hits left @8: 0.9541015625
2018-03-24 11:28:20.960277 (INFO): Hits right @8: 0.951171875
2018-03-24 11:28:20.960500 (INFO): Hits @8: 0.95263671875
2018-03-24 11:28:20.960641 (INFO): Hits left @9: 0.9619140625
2018-03-24 11:28:20.960780 (INFO): Hits right @9: 0.9560546875
2018-03-24 11:28:20.960990 (INFO): Hits @9: 0.958984375
2018-03-24 11:28:20.961150 (INFO): Hits left @10: 0.966796875
2018-03-24 11:28:20.961297 (INFO): Hits right @10: 0.96484375
2018-03-24 11:28:20.961488 (INFO): Hits @10: 0.9658203125
2018-03-24 11:28:20.961775 (INFO): Mean rank left: 2.6845703125
2018-03-24 11:28:20.962007 (INFO): Mean rank right: 2.7802734375
2018-03-24 11:28:20.962453 (INFO): Mean rank: 2.732421875
2018-03-24 11:28:20.962819 (INFO): Mean reciprocal rank left: 0.7501948574057229
2018-03-24 11:28:20.963109 (INFO): Mean reciprocal rank right: 0.7340626532217147
2018-03-24 11:28:20.963753 (INFO): Mean reciprocal rank: 0.7421287553137188
2018-03-24 11:28:20.964501 (INFO):
2018-03-24 11:28:20.964572 (INFO): --------------------------------------------------
2018-03-24 11:28:20.964788 (INFO): test_evaluation
2018-03-24 11:28:20.964823 (INFO): --------------------------------------------------
2018-03-24 11:28:20.964841 (INFO):
2018-03-24 11:28:21.457610 (INFO): Hits left @1: 0.6396484375
2018-03-24 11:28:21.457849 (INFO): Hits right @1: 0.625
2018-03-24 11:28:21.457946 (INFO): Hits @1: 0.63232421875
2018-03-24 11:28:21.458014 (INFO): Hits left @2: 0.8017578125
2018-03-24 11:28:21.458185 (INFO): Hits right @2: 0.7939453125
2018-03-24 11:28:21.458279 (INFO): Hits @2: 0.7978515625
2018-03-24 11:28:21.458422 (INFO): Hits left @3: 0.8681640625
2018-03-24 11:28:21.458488 (INFO): Hits right @3: 0.85546875
2018-03-24 11:28:21.458659 (INFO): Hits @3: 0.86181640625
2018-03-24 11:28:21.458745 (INFO): Hits left @4: 0.9091796875
2018-03-24 11:28:21.458871 (INFO): Hits right @4: 0.9052734375
2018-03-24 11:28:21.458962 (INFO): Hits @4: 0.9072265625
2018-03-24 11:28:21.459166 (INFO): Hits left @5: 0.931640625
2018-03-24 11:28:21.459320 (INFO): Hits right @5: 0.9267578125
2018-03-24 11:28:21.459450 (INFO): Hits @5: 0.92919921875
2018-03-24 11:28:21.459647 (INFO): Hits left @6: 0.9443359375
2018-03-24 11:28:21.459794 (INFO): Hits right @6: 0.9443359375
2018-03-24 11:28:21.459957 (INFO): Hits @6: 0.9443359375
2018-03-24 11:28:21.460100 (INFO): Hits left @7: 0.9521484375
2018-03-24 11:28:21.460223 (INFO): Hits right @7: 0.9560546875
2018-03-24 11:28:21.460315 (INFO): Hits @7: 0.9541015625
2018-03-24 11:28:21.460499 (INFO): Hits left @8: 0.9599609375
2018-03-24 11:28:21.460628 (INFO): Hits right @8: 0.96484375
2018-03-24 11:28:21.460789 (INFO): Hits @8: 0.96240234375
2018-03-24 11:28:21.460915 (INFO): Hits left @9: 0.96484375
2018-03-24 11:28:21.460980 (INFO): Hits right @9: 0.96875
2018-03-24 11:28:21.461190 (INFO): Hits @9: 0.966796875
2018-03-24 11:28:21.461311 (INFO): Hits left @10: 0.9697265625
2018-03-24 11:28:21.461377 (INFO): Hits right @10: 0.974609375
2018-03-24 11:28:21.461467 (INFO): Hits @10: 0.97216796875
2018-03-24 11:28:21.461644 (INFO): Mean rank left: 2.4775390625
2018-03-24 11:28:21.461820 (INFO): Mean rank right: 2.4873046875
2018-03-24 11:28:21.462059 (INFO): Mean rank: 2.482421875
2018-03-24 11:28:21.462183 (INFO): Mean reciprocal rank left: 0.7646698364861386
2018-03-24 11:28:21.462303 (INFO): Mean reciprocal rank right: 0.7549327563519129
2018-03-24 11:28:21.462611 (INFO): Mean reciprocal rank: 0.7598012964190257
Without regularization, I could reproduce your results for one split, and I would expect similar results for the other split. Without any regularization, the results quickly oscillate between ~90% and ~50% Hits@10. I think that is your issue.
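In this thread, regularization refers mainly to the dropout settings (input_drop, hidden_drop, feat_drop), with Config.L2 set to 0. A minimal pure-Python sketch of standard inverted dropout, assuming these settings behave like ordinary dropout layers; it is illustrative only and not the repository's implementation:

```python
import random
from typing import List, Optional


def inverted_dropout(values: List[float], p: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Zero each value with probability p and scale survivors by 1 / (1 - p),
    so the expected value of every element is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = rng or random.Random()
    keep = 1.0 - p
    # rng.random() is uniform on [0, 1), so a value survives with probability 1 - p.
    return [v / keep if rng.random() >= p else 0.0 for v in values]
```

With `p=0.0` the inputs pass through unchanged, which corresponds to training without this form of regularization.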
However, I will still update the datasets for consistency and report new results for Kinship, UMLS, and Nations in this repo. Thank you for pointing this out!
Thanks, Tim.
I got the 45.9 MRR by running a freshly cloned copy of the repo. It reached ~45 at around the 60th epoch and then showed only minor oscillations until epoch 1000. I saw that you use embedding dropout, and there seems to be an L2 hyperparameter (Config.L2) that is set to 0. Do I need to set any additional regularization hyperparameters? Similarly, in the run that gave ~85 MRR, performance reached that level at around epoch 60 and stayed there.
Regarding the data split, I was not sure whether you or the MINERVA paper uses the standard split, or whether there is a standard split at all. I see you have updated the data, and I will check it out.
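Whether two releases use the same split can be checked directly from the triples. A minimal sketch, assuming each release has already been loaded into Python sets of (head, relation, tail) tuples keyed by "train", "dev" and "test"; file loading is omitted because the file layouts of the two releases are not described here:

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Split = Dict[str, Set[Triple]]  # keys: "train", "dev", "test"


def compare_splits(a: Split, b: Split) -> Dict[str, int]:
    """Summarise how two train/dev/test partitions of a dataset differ."""
    all_a = set().union(*a.values())
    all_b = set().union(*b.values())
    return {
        # Triples present in one release but not the other at all.
        "only_in_a": len(all_a - all_b),
        "only_in_b": len(all_b - all_a),
        # Test triples of A that B puts in training: a model trained on
        # B's train split has already seen them.
        "a_test_in_b_train": len(a["test"] & b["train"]),
        "test_overlap": len(a["test"] & b["test"]),
    }
```

If both "only_in" counts are zero but the test overlap is small, the releases contain the same triples partitioned differently.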
I used the following parameters: input_drop 0.3, hidden_drop 0.0, feat_drop 0.0, lr 0.003, lr_decay 0.995.
Can you replicate the results using these?
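With lr 0.003 and lr_decay 0.995, and assuming the decay is applied as a multiplicative factor once per epoch (an assumption about the training loop, not confirmed in this thread), the learning rate drops to about 74% of its initial value by epoch 60 and to roughly 2e-5 by epoch 1000. A small sketch of that arithmetic; `lr_at_epoch` is a hypothetical helper:

```python
def lr_at_epoch(base_lr: float, decay: float, epoch: int) -> float:
    """Learning rate after `epoch` multiplicative decay steps
    (assumes exactly one decay step per epoch)."""
    return base_lr * decay ** epoch


for epoch in (0, 60, 200, 1000):
    print(epoch, f"{lr_at_epoch(0.003, 0.995, epoch):.2e}")
```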
Also note that I ran an inverse model on Kinship this morning and got 45%, which suggests that Kinship is probably quite biased and is not a dataset you want to work with. I will add a comment about this to the repo tomorrow.
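Roughly speaking, an inverse-relation baseline scores well when test triples have their inverses in the training set. A much simpler check in the same spirit (not the repository's inverse model) is to measure how many test triples have their entity pair reversed in training under any relation; a high fraction suggests a split that rewards memorizing inverses. A minimal sketch, assuming triples are (head, relation, tail) tuples:

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def inverse_leakage(train: Iterable[Triple], test: Iterable[Triple]) -> float:
    """Fraction of test triples (s, r, o) for which the training set contains
    some triple (o, r2, s), i.e. the reversed entity pair under any relation."""
    # A training triple (o, r2, s) is stored as the pair (s, o) for direct lookup.
    reversed_pairs: Set[Tuple[str, str]] = {(o, s) for s, _, o in train}
    test = list(test)
    if not test:
        return 0.0
    hits = sum(1 for s, _, o in test if (s, o) in reversed_pairs)
    return hits / len(test)
```

This only counts reversed pairs; it does not check whether the relations involved are actually inverses of each other, so it is an upper bound on what a true inverse-relation model could exploit.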
Two clarification questions:
Do you retrain the embeddings with the validation-set triples added before predicting on the test set? The alternative is to train the embeddings on the training triples only and evaluate both dev and test with the same set of embeddings. Looking at the code, I think you are doing the latter; I just want to make sure.
For datasets such as KINSHIP, do you randomly split the triples into train, dev, and test sets according to a certain ratio, or is there an official split used by all papers? I'm asking because another KBC paper also released the KINSHIP dataset, and its split differs from yours.
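A ratio-based random split is only reproducible if the seed and ratios are published with it. A minimal sketch of such a split; the 80/10/10 default ratios and the seed are illustrative assumptions, not the ratios used by either release:

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]


def random_split(triples: Sequence[Triple], dev_frac: float = 0.1,
                 test_frac: float = 0.1, seed: int = 0
                 ) -> Tuple[List[Triple], List[Triple], List[Triple]]:
    """Shuffle with a fixed seed and cut into train/dev/test by ratio."""
    if dev_frac < 0 or test_frac < 0 or dev_frac + test_frac >= 1:
        raise ValueError("invalid fractions")
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)  # same seed -> same split
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test
```

Note that a purely random split can place entities in dev or test that never occur in train.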
Thanks!