Open RobinChiu opened 11 months ago
Based on the documentation quoted below, I found that the linked commit should be rolled back. After rolling it back, the training result is correct.
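https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/nn/functional/cross_entropy_cn.html#cross-entropy
label (Tensor) – The label values corresponding to input. If soft_label=False, label must have shape [N1, N2, ..., Nk] or [N1, N2, ..., Nk, 1], with data type 'int32', 'int64', 'float32', or 'float64', and every value must be greater than or equal to 0 and less than C. If soft_label=True, label must have the same shape and data type as input, and the soft labels of each sample must sum to 1.
Thanks for raising this issue. We recommend using the latest version.
Thank you for using PaddleClas and reporting this problem; we really appreciate your contribution to PaddleClas! When opening an issue, please provide the following information so that we can locate and resolve your problem quickly: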
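The label rules quoted above can be illustrated with a minimal pure-Python sketch for a single sample. This is not Paddle's implementation: the function names `log_softmax`, `softmax` and `cross_entropy` below are hypothetical, and the explicit `ValueError` checks are an assumption added for illustration (the documentation states the requirements but this sketch does not claim Paddle raises on violations).

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_total = math.log(sum(math.exp(x - m) for x in logits))
    return [x - m - log_total for x in logits]


def softmax(logits):
    """Probabilities that sum to 1, suitable as soft labels."""
    return [math.exp(lp) for lp in log_softmax(logits)]


def cross_entropy(logits, label, soft_label=False):
    """Cross entropy for one sample, enforcing the documented label rules."""
    num_classes = len(logits)  # C in the documentation
    log_probs = log_softmax(logits)
    if soft_label:
        # Soft label: same shape as input, entries sum to 1.
        if len(label) != num_classes:
            raise ValueError("soft label must have the same shape as input")
        if not math.isclose(sum(label), 1.0, abs_tol=1e-6):
            raise ValueError("soft labels must sum to 1")
        return -sum(q * lp for q, lp in zip(label, log_probs))
    # Hard label: a class index with 0 <= label < C.
    index = int(label)
    if not 0 <= index < num_classes:
        raise ValueError("hard label must satisfy 0 <= label < C")
    return -log_probs[index]


student_logits = [2.0, 0.5, -1.0]
teacher_logits = [1.5, 1.0, -0.5]

# Hard label: class index 0.
hard_loss = cross_entropy(student_logits, 0)

# Soft label: the teacher output must be normalized first.
soft_loss = cross_entropy(student_logits, softmax(teacher_logits), soft_label=True)
```

In this sketch, passing the raw `teacher_logits` with `soft_label=True` is rejected because they do not sum to 1, which is the kind of mismatch the documentation excerpt warns about.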
PaddleClas release/2.3 PaddlePaddle 2.3.0
No
Docker image: paddlecloud/paddleclas:2.3-gpu-cuda11.2-cudnn8-latest
No
I followed docs/zh_CN/quick_start/quick_start_classification_professional.md to train the distillation model, but top1 is only 0.01.

    python3 -m paddle.distributed.launch \
        --gpus="0" \
        tools/train.py \
        -c ./ppcls/configs/quick_start/professional/R50_vd_distill_MV3_large_x1_0_CIFAR100.yaml \
        -o Global.output_dir="output_CIFAR"

I tried setting Optimizer.lr.learning_rate=0.01, but got the same result.
The latest training log:
[2023/11/29 10:06:05] ppcls INFO: [Train][Epoch 100/100][Avg]top1: 0.01000, top5: 0.03776, CELoss_Student_Teacher: nan, loss: nan
[2023/11/29 10:06:06] ppcls INFO: [Eval][Epoch 100][Iter: 0/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.35142s, reader_cost: 0.29173, ips: 182.11954 images/sec
[2023/11/29 10:06:06] ppcls INFO: [Eval][Epoch 100][Iter: 10/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04212s, reader_cost: 0.00012, ips: 1519.46605 images/sec
[2023/11/29 10:06:06] ppcls INFO: [Eval][Epoch 100][Iter: 20/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04256s, reader_cost: 0.00011, ips: 1503.58581 images/sec
[2023/11/29 10:06:07] ppcls INFO: [Eval][Epoch 100][Iter: 30/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04251s, reader_cost: 0.00011, ips: 1505.49286 images/sec
[2023/11/29 10:06:07] ppcls INFO: [Eval][Epoch 100][Iter: 40/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04235s, reader_cost: 0.00011, ips: 1511.16876 images/sec
[2023/11/29 10:06:08] ppcls INFO: [Eval][Epoch 100][Iter: 50/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04373s, reader_cost: 0.00012, ips: 1463.68354 images/sec
[2023/11/29 10:06:08] ppcls INFO: [Eval][Epoch 100][Iter: 60/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04337s, reader_cost: 0.00011, ips: 1475.82858 images/sec
[2023/11/29 10:06:09] ppcls INFO: [Eval][Epoch 100][Iter: 70/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04312s, reader_cost: 0.00011, ips: 1484.06229 images/sec
[2023/11/29 10:06:09] ppcls INFO: [Eval][Epoch 100][Iter: 80/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04318s, reader_cost: 0.00011, ips: 1482.25883 images/sec
[2023/11/29 10:06:09] ppcls INFO: [Eval][Epoch 100][Iter: 90/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04329s, reader_cost: 0.00011, ips: 1478.36561 images/sec
[2023/11/29 10:06:10] ppcls INFO: [Eval][Epoch 100][Iter: 100/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04332s, reader_cost: 0.00011, ips: 1477.29185 images/sec
[2023/11/29 10:06:10] ppcls INFO: [Eval][Epoch 100][Iter: 110/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04345s, reader_cost: 0.00011, ips: 1472.94545 images/sec
[2023/11/29 10:06:11] ppcls INFO: [Eval][Epoch 100][Iter: 120/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.01562, batch_cost: 0.04338s, reader_cost: 0.00011, ips: 1475.45486 images/sec
[2023/11/29 10:06:11] ppcls INFO: [Eval][Epoch 100][Iter: 130/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04326s, reader_cost: 0.00011, ips: 1479.37751 images/sec
[2023/11/29 10:06:12] ppcls INFO: [Eval][Epoch 100][Iter: 140/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04316s, reader_cost: 0.00012, ips: 1482.68761 images/sec
[2023/11/29 10:06:12] ppcls INFO: [Eval][Epoch 100][Iter: 150/157]CELoss_Student: nan, loss: nan, top1: 0.00000, top5: 0.00000, batch_cost: 0.04314s, reader_cost: 0.00011, ips: 1483.66894 images/sec
[2023/11/29 10:06:12] ppcls INFO: [Eval][Epoch 100][Avg]CELoss_Student: nan, loss: nan, top1: 0.01000, top5: 0.03890
[2023/11/29 10:06:12] ppcls INFO: [Eval][Epoch 100][best metric: 0.01]
[2023/11/29 10:06:13] ppcls INFO: Already save model in output_CIFAR/DistillationModel/epoch_100
[2023/11/29 10:06:14] ppcls INFO: Already save model in output_CIFAR/DistillationModel/latest
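When checking a full training log for the epoch at which the loss first became nan, a short script can help. The following is a rough sketch, assuming the log has one entry per line in the `[Train]`/`[Eval]` ... `[Avg]` format shown in this issue; the helper name `nan_summaries` is hypothetical and not part of PaddleClas.

```python
import re

# Matches per-epoch summary lines such as
# "[Train][Epoch 100/100][Avg]..." or "[Eval][Epoch 100][Avg]...".
AVG_ENTRY = re.compile(r"\[(Train|Eval)\]\[Epoch (\d+)(?:/\d+)?\]\[Avg\](.*)")


def nan_summaries(log_text):
    """Return (phase, epoch, nan metric names) for each [Avg] line with nan."""
    results = []
    for line in log_text.splitlines():
        match = AVG_ENTRY.search(line)
        if not match:
            continue
        phase, epoch, rest = match.groups()
        # Collect the names of all metrics whose value is literally "nan".
        nan_metrics = re.findall(r"(\w+): nan\b", rest)
        if nan_metrics:
            results.append((phase, int(epoch), nan_metrics))
    return results


# Two summary lines copied from the log in this issue.
sample = (
    "[2023/11/29 10:06:05] ppcls INFO: [Train][Epoch 100/100][Avg]"
    "top1: 0.01000, top5: 0.03776, CELoss_Student_Teacher: nan, loss: nan\n"
    "[2023/11/29 10:06:12] ppcls INFO: [Eval][Epoch 100][Avg]"
    "CELoss_Student: nan, loss: nan, top1: 0.01000, top5: 0.03890"
)

for phase, epoch, metrics in nan_summaries(sample):
    print(phase, epoch, metrics)
```

Running it over the whole log file and taking the entry with the smallest epoch would show where the nan first appears.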