loss or NME computing for convergence

epoc88 commented 3 years ago

I have tried retraining the model on the common, challenge and fullset subsets, both with "pretrained_model/mobileNetV2_0.25.pth" and without this pretrained model.

However, the loss never goes below 30, whether I start from the pretrained model or train from scratch. On the other hand, I tried removing the training part of the code from train_model.py; see https://github.com/epoc88/PFLD_68pts_Pytorch/blob/master/test_NME.py

I did not add any pretrained model, but I still got an NME result:

Test epochs: 3 Loss 0.094 mean error and failure rate mean error : 0.061 failure rate: L1 0.161

So I am a bit confused...

The issue could be related to the optimizer and gradient descent, or to the loss function: WingLoss is good for fine-tuning, but PFLD's MSE could be good in the beginning.
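To make the WingLoss versus squared-error comparison concrete, here is a minimal sketch of the Wing loss as defined in the Wing loss paper (Feng et al., 2018), applied to a single coordinate residual. The parameter values `w=10` and `epsilon=2` are illustrative assumptions and are not confirmed to be what this repository uses.

```python
import math

def wing_loss(diff, w=10.0, epsilon=2.0):
    """Wing loss for one coordinate residual (w and epsilon are assumed values)."""
    x = abs(diff)
    # Constant that joins the logarithmic part and the linear part at |x| = w.
    c = w - w * math.log(1.0 + w / epsilon)
    if x < w:
        return w * math.log(1.0 + x / epsilon)
    return x - c

def squared_error(diff):
    """Plain squared error for one coordinate residual."""
    return diff * diff

# Compare the two losses on small and large residuals.
for d in (0.01, 0.1, 1.0, 10.0):
    print(f"residual={d:<5} wing={wing_loss(d):.4f} squared={squared_error(d):.4f}")
```

For small residuals the Wing loss is much larger than the squared error, so the raw values of the two losses are not on the same scale. A plateau at 30 under one loss therefore says little about where the other would settle.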

Here is a Chinese paper discussing the performance of PFLD on 300W, especially the loss function: https://jishuin.proginn.com/p/763bfbd29621

One more thing: it is worth mentioning that, by default, the training code uses the "common subset" for validation during training (i.e. without the ibug images). The available validation subsets are common, challenge and fullset.

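Here are the results I measured with the MobileNetV2_0.25 model trained using a modified version of your code:

| | Speed on CPU (ms) | Model size (MB) | NME (ION), Common | NME (ION), Challenge | NME (ION), Fullset | AUC |
| --- | --- | --- | --- | --- | --- | --- |
| Paper | 1.25 | 2 | 3.03 | 5.15 | 3.45 | 0.80 |
| Our code | 206 | 1.1 | 4.74 | 7.95 | 5.25 | 0.4889 (fullset) |

The main differences: the paper's speed is 1.25 ms, while ours is about 200 ms. The size of MobileNetV2_0.25 also differs: the paper's is 2.1 MB, while our pretrained_model is 1.1 MB. Is this not the official pretrained model?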
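For reference, the NME (ION) figures above are normalized mean errors with inter-ocular normalization. The sketch below shows one common way to compute it for 68-point annotations, assuming the outer eye corners sit at 0-based indices 36 and 45 as in the 300W markup. The failure-rate threshold of 0.1 is an assumption, and none of this is confirmed to match test_NME.py.

```python
import math

# 0-based indices of the outer eye corners in the 68-point 300W markup.
LEFT_OUTER_EYE, RIGHT_OUTER_EYE = 36, 45

def nme_ion(pred, gt):
    """Mean point-to-point error divided by the inter-ocular distance."""
    norm = math.dist(gt[LEFT_OUTER_EYE], gt[RIGHT_OUTER_EYE])
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(errors) / len(errors) / norm

def failure_rate(nmes, threshold=0.1):
    """Fraction of images whose NME exceeds the threshold (assumed value)."""
    return sum(e > threshold for e in nmes) / len(nmes)

# Synthetic example: 68 points on a line, predictions shifted by 0.9 in y.
gt = [(float(i), 0.0) for i in range(68)]
pred = [(x, y + 0.9) for x, y in gt]
print(nme_ion(pred, gt))  # roughly 0.1 (error 0.9 over an inter-ocular distance of 9)
```

Papers usually quote NME as a percentage, so a value such as 0.061 from the test log would presumably correspond to 6.1 in a table like the one above.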

github-luffy commented 3 years ago

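Thanks for all the work you have done. Brief answers to the questions above:

1. The loss never goes below 30? This may be down to the design of the loss function. You could modify or replace the loss and try another run.
2. You got NME results without adding any pretrained model? You did not set a model, but the code provides a default model path. Please check whether a file exists at that path.
3. The paper's results? Our results may differ considerably from the paper's, and the MobileNetV2_0.25 model is somewhat different. You could configure the network according to the paper's parameters and try another run.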
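The check suggested in point 2 can be sketched as follows. The path is the one quoted in this thread; whether the script actually falls back to it by default, and what it does when the file is missing, are assumptions here, and `describe_checkpoint` is a hypothetical helper.

```python
import os

# Path quoted in this thread; its use as the script's default is an assumption.
DEFAULT_MODEL_PATH = "pretrained_model/mobileNetV2_0.25.pth"

def describe_checkpoint(path=DEFAULT_MODEL_PATH):
    """Report whether a checkpoint file exists at `path` (hypothetical helper)."""
    if os.path.isfile(path):
        size_mb = os.path.getsize(path) / (1024 * 1024)
        return f"found {path} ({size_mb:.1f} MB): weights may be loaded from it"
    return f"no file at {path}: no pretrained weights can be loaded from it"

print(describe_checkpoint())
```

If the file is found, an NME result obtained "without a pretrained model" may in fact come from these weights.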

epoc88 commented 3 years ago

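Sorry, I wrote my comment in English and the Chinese part is basically a translation, so it may not be clear.

I have used the pretrained_model, but the loss still stays above 30. It also fluctuates quite a lot instead of decreasing steadily. Changing the learning rate had no effect either.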
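One way to judge the trend of a fluctuating loss is to smooth the logged values before reading them. Below is a minimal sketch using a bias-corrected exponential moving average. The loss values are made up for illustration and are not taken from an actual run.

```python
def ema(values, beta=0.9):
    """Exponential moving average with bias correction."""
    smoothed, avg = [], 0.0
    for step, v in enumerate(values, start=1):
        avg = beta * avg + (1.0 - beta) * v
        # Divide by (1 - beta^step) so early values are not biased toward zero.
        smoothed.append(avg / (1.0 - beta ** step))
    return smoothed

# Made-up noisy losses, for illustration only.
losses = [48.0, 39.0, 45.0, 35.0, 41.0, 33.0, 38.0, 31.0, 36.0, 30.0]
print([round(s, 2) for s in ema(losses)])
```

If the smoothed curve is still falling, the jumps are likely batch-to-batch noise. If it is flat, the loss has probably plateaued.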

github-luffy commented 3 years ago

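These are all good questions.

1. The loss never goes below 30? The loss function measures how good the model's predictions are. Is the loss at 30 from the very start, or does it drop to 30 and then stay roughly constant (which would indicate convergence)? You could try changing the loss function, for example to an MSE loss.
2. You got NME results without adding any pretrained model? This still needs to be verified.
3. The paper's results? The code mainly follows the algorithmic ideas of the paper. Reproducing the paper's experimental results was not a focus; I will compare them when I have time.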

huangzhenjie commented 3 years ago

@github-luffy I ran the demo directly, and the loss does not drop below 30 either. I am not sure whether this is normal. Is this loss accumulated over a batch?
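On whether the reported value is accumulated: the same per-coordinate errors give very different numbers depending on whether they are summed or averaged. The sketch below illustrates this with made-up errors. The batch size and error range are hypothetical, and how the repository actually reduces its loss is not confirmed in this thread.

```python
import random

NUM_LANDMARKS = 68   # 68-point markup, as in the repository name
BATCH_SIZE = 32      # hypothetical batch size

random.seed(0)
# Hypothetical per-coordinate squared errors: BATCH_SIZE rows of 68 * 2 values.
sq_err = [[random.uniform(0.0, 0.01) for _ in range(NUM_LANDMARKS * 2)]
          for _ in range(BATCH_SIZE)]

def batch_loss(errors, reduce_coords="sum", reduce_batch="mean"):
    """Reduce per-coordinate errors over coordinates, then over the batch."""
    per_sample = [sum(row) if reduce_coords == "sum" else sum(row) / len(row)
                  for row in errors]
    if reduce_batch == "sum":
        return sum(per_sample)
    return sum(per_sample) / len(per_sample)

print("mean over coords, mean over batch:", batch_loss(sq_err, "mean", "mean"))
print("sum over coords,  mean over batch:", batch_loss(sq_err, "sum", "mean"))
print("sum over coords,  sum over batch: ", batch_loss(sq_err, "sum", "sum"))
```

Summing over the 136 coordinates and the batch scales the value by a factor of 136 × 32 relative to the fully averaged one. A training loss near 30 and a test loss near 0.094 could therefore differ partly because of the reduction used.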

huangzhenjie commented 3 years ago

@epoc88 How are the results?

github-luffy / PFLD_68points_Pytorch

loss or NME computing for convergence #26