TimDettmers / ConvE

Convolutional 2D Knowledge Graph Embeddings resources
MIT License

About Label Smoothing #55

Open helenxu opened 5 years ago

helenxu commented 5 years ago

In main.py, line 141 applies label smoothing as follows:

e2_multi = ((1.0-args.label_smoothing)*e2_multi) + (1.0/e2_multi.size(1))

Shouldn't it be the following instead?

e2_multi = ((1.0-args.label_smoothing)*e2_multi) + (args.label_smoothing/e2_multi.size(1))
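For illustration, here is a minimal sketch with a toy target tensor (hypothetical values, not from the repository) contrasting the two expressions:

```python
import torch

# Toy multi-label target batch: 1 example, 5 candidate entities, 2 of them correct
e2_multi = torch.tensor([[1., 0., 1., 0., 0.]])
label_smoothing = 0.1
N = e2_multi.size(1)

# Current line 141: the added mass is 1/N regardless of the smoothing value,
# which also pushes the positive targets above 1.
bugged = (1.0 - label_smoothing) * e2_multi + (1.0 / N)

# Proposed fix: exactly `label_smoothing` worth of mass is spread over all N entities.
fixed = (1.0 - label_smoothing) * e2_multi + (label_smoothing / N)

print(bugged)  # tensor([[1.1000, 0.2000, 1.1000, 0.2000, 0.2000]])
print(fixed)   # tensor([[0.9200, 0.0200, 0.9200, 0.0200, 0.0200]])
```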
TimDettmers commented 4 years ago

Yes, you are correct, thank you for reporting this! I will need to study whether the results change only slightly or significantly. If the difference is only slight, I will introduce the fix directly into the codebase. If the difference is significant, I will need to build some workaround.

TimDettmers commented 4 years ago

With the fixed label smoothing, one needs much higher label smoothing values to get a good score. Here are the results of my grid search. The metric reported below is the Mean Reciprocal Rank (MRR).
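(For reference, MRR is the mean of the reciprocal rank of the correct entity over all test triples, MRR = (1/|T|) * Σ_i 1/rank_i, so higher is better and the maximum is 1.)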

WN18RR Bugged

All configurations: data = WN18RR, epochs = 150, lr = 0.003.

| label_smoothing | MRR mean (SE) | 95% CI | Sample size |
|---|---|---|---|
| 0.1 | 0.424 (0.0002) | (0.424, 0.425) | 2 |
| 0.2 | 0.424 (0.0003) | (0.424, 0.425) | 2 |
| 0.3 | 0.425 (0.0011) | (0.423, 0.428) | 2 |
| 0.4 | 0.425 (0.0004) | (0.424, 0.426) | 2 |
| 0.5 | 0.425 (0.0005) | (0.424, 0.425) | 2 |

WN18RR Fixed

All configurations: data = WN18RR, epochs = 150, lr = 0.003.

| label_smoothing | MRR mean (SE) | 95% CI | Sample size |
|---|---|---|---|
| 0.1 | 0.417 (0.0012) | (0.415, 0.420) | 2 |
| 0.2 | 0.421 (0.0009) | (0.419, 0.423) | 2 |
| 0.3 | 0.421 (0.0002) | (0.421, 0.422) | 2 |
| 0.4 | 0.423 (0.0006) | (0.422, 0.424) | 2 |
| 0.5 | 0.422 (0.0011) | (0.420, 0.425) | 2 |

FB15k-237 Bugged

All configurations: data = FB15k-237, epochs = 150, lr = 0.001.

| label_smoothing | MRR mean (SE) | 95% CI | Sample size |
|---|---|---|---|
| 0.1 | 0.324 (0.0016) | (0.321, 0.327) | 2 |
| 0.2 | 0.324 (0.0002) | (0.323, 0.324) | 2 |
| 0.3 | 0.325 (0.0006) | (0.324, 0.326) | 2 |
| 0.4 | 0.323 (0.0009) | (0.321, 0.325) | 2 |
| 0.5 | 0.322 (0.0000) | (0.322, 0.322) | 2 |

FB15k-237 Fixed

All configurations: data = FB15k-237, epochs = 150, lr = 0.001.

| label_smoothing | MRR mean (SE) | 95% CI | Sample size |
|---|---|---|---|
| 0.1 | 0.319 (nan) | (nan, nan) | 1 |
| 0.2 | 0.319 (0.0011) | (0.316, 0.321) | 2 |
| 0.3 | 0.321 (0.0003) | (0.321, 0.322) | 2 |
| 0.4 | 0.320 (0.0005) | (0.319, 0.321) | 2 |
| 0.5 | 0.325 (0.0004) | (0.324, 0.325) | 2 |

As such, it would distort the results if I just changed this blindly. I will think about a solution. Probably I will add an extra parameter so one can run with the correct label smoothing.
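A minimal sketch of what such an opt-in switch could look like (the `--fix-label-smoothing` flag and the `smooth_targets` helper are hypothetical, not part of the current codebase):

```python
import argparse

import torch

parser = argparse.ArgumentParser()
parser.add_argument('--label-smoothing', type=float, default=0.1)
# Hypothetical opt-in flag, so the published numbers stay reproducible by default
parser.add_argument('--fix-label-smoothing', action='store_true',
                    help='use the corrected label smoothing formula')
args = parser.parse_args()


def smooth_targets(e2_multi: torch.Tensor) -> torch.Tensor:
    """Apply label smoothing to a (batch, num_entities) multi-label target tensor."""
    if args.fix_label_smoothing:
        # Corrected: redistribute exactly `label_smoothing` worth of mass over all entities
        return (1.0 - args.label_smoothing) * e2_multi + (args.label_smoothing / e2_multi.size(1))
    # Legacy behaviour (matches the results reported above): adds a constant 1/num_entities
    return (1.0 - args.label_smoothing) * e2_multi + (1.0 / e2_multi.size(1))
```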

lvermue commented 4 years ago

I think the correct formula should be

e2_multi = e2_multi * (1 - Config.label_smoothing_epsilon) + (1 - e2_multi) * (Config.label_smoothing_epsilon / (e2_multi.size(1) - 1))

citing http://www.deeplearningbook.org/contents/regularization.html, Chapter 7.5.1.

nxznm commented 3 years ago

Hi @lvermue, I think the equation still has a problem. Each position in e2_multi whose label is 1 becomes 1 * (1 - Config.label_smoothing_epsilon), and each position whose label is 0 becomes Config.label_smoothing_epsilon / (e2_multi.size(1) - 1). However, there is typically more than one position in e2_multi whose label is 1, so the smoothed values sum to more than 1, while a proper distribution should sum to 1. Do you have any ideas?

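To make that concrete, a quick numeric check (toy tensor, arbitrary epsilon): with the textbook formula and two positive labels, the smoothed targets sum to well above 1, so it only yields a proper distribution in the strictly one-hot (single-label) case.

```python
import torch

eps = 0.1  # hypothetical smoothing value
# Toy multi-label target with two positives out of N = 5 entities
e2_multi = torch.tensor([[1., 0., 1., 0., 0.]])
N = e2_multi.size(1)

# Formula from Goodfellow et al., Chapter 7.5.1 (written for one-hot targets)
smoothed = e2_multi * (1 - eps) + (1 - e2_multi) * (eps / (N - 1))

print(smoothed)        # tensor([[0.9000, 0.0250, 0.9000, 0.0250, 0.0250]])
print(smoothed.sum())  # tensor(1.8750) -> sums to ~2, not 1, because there are two positives
```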