Closed jackie8310 closed 2 years ago
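Hi. I noticed that your batch size is 2 while you are using TripletLoss, which is not a workable combination: the batch size should be at least 4. Also, if you need to train with a small batch size, TripletLoss is not recommended; you can use ArcMargin directly instead.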
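The minimum of 4 can be illustrated with a small check. This is only a sketch, assuming the train sampler draws `num_instances` images per identity for each batch (as the `DistributedRandomIdentitySampler` and `num_instances : 2` entries in the posted config suggest). The helper `identities_per_batch` is hypothetical and not part of PaddleClas.

```python
def identities_per_batch(batch_size, num_instances):
    """Return how many distinct identities one batch holds.

    Hypothetical helper: assumes each identity contributes exactly
    `num_instances` images to the batch. Raises ValueError when a
    triplet (anchor, positive, negative) cannot be formed.
    """
    if num_instances < 2:
        raise ValueError("need at least 2 images per identity for a positive pair")
    if batch_size % num_instances != 0:
        raise ValueError("batch_size should be a multiple of num_instances")
    num_identities = batch_size // num_instances
    if num_identities < 2:
        # Only one identity in the batch: no sample can act as a negative.
        raise ValueError("need at least 2 identities per batch for a negative")
    return num_identities


# batch_size 4 with num_instances 2 is the smallest setting that passes.
print(identities_per_batch(4, 2))

# The setting from the Windows run (batch_size 2, num_instances 2) fails.
try:
    identities_per_batch(2, 2)
except ValueError as err:
    print("invalid:", err)
```

Under this assumption, a batch of 2 with 2 instances per identity holds a single identity, so every anchor has a positive but no negative.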
Since this issue has not been updated for more than three months, it will be closed. If it is not solved or there is a follow-up question, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.
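Training environment: a. Operating system: Linux/Windows b. Python version: Python 3.7 c. CUDA/cuDNN version: CUDA 10.2 / cuDNN 7.6.5 4. Problem 1: Training on Windows 10 fails with the error below. The configuration is as follows: [2021/09/01 20:44:31] root INFO: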
== PaddleClas is powered by PaddlePaddle ! ==
== == == For more info please go to the following website. == == == == https://github.com/PaddlePaddle/PaddleClas ==
[2021/09/01 20:44:31] root INFO: Arch : [2021/09/01 20:44:31] root INFO: Backbone : [2021/09/01 20:44:31] root INFO: name : MobileNetV1 [2021/09/01 20:44:31] root INFO: pretrained : True [2021/09/01 20:44:31] root INFO: BackboneStopLayer : [2021/09/01 20:44:31] root INFO: name : flatten_0 [2021/09/01 20:44:31] root INFO: Head : [2021/09/01 20:44:31] root INFO: class_num : 101 [2021/09/01 20:44:31] root INFO: embedding_size : 512 [2021/09/01 20:44:31] root INFO: margin : 0.15 [2021/09/01 20:44:31] root INFO: name : ArcMargin [2021/09/01 20:44:31] root INFO: scale : 30 [2021/09/01 20:44:31] root INFO: Neck : [2021/09/01 20:44:31] root INFO: class_num : 512 [2021/09/01 20:44:31] root INFO: embedding_size : 1024 [2021/09/01 20:44:31] root INFO: name : FC [2021/09/01 20:44:31] root INFO: infer_add_softmax : False [2021/09/01 20:44:31] root INFO: infer_output_key : features [2021/09/01 20:44:31] root INFO: name : RecModel [2021/09/01 20:44:31] root INFO: DataLoader : [2021/09/01 20:44:31] root INFO: Eval : [2021/09/01 20:44:31] root INFO: Gallery : [2021/09/01 20:44:31] root INFO: dataset : [2021/09/01 20:44:31] root INFO: cls_label_path : ./dataset/imagenet_dataset/val_list.txt [2021/09/01 20:44:31] root INFO: image_root : ./dataset/imagenet_dataset/ [2021/09/01 20:44:31] root INFO: name : ImageNetDataset [2021/09/01 20:44:31] root INFO: transform_ops : [2021/09/01 20:44:31] root INFO: DecodeImage : [2021/09/01 20:44:31] root INFO: channel_first : False [2021/09/01 20:44:31] root INFO: to_rgb : True [2021/09/01 20:44:31] root INFO: ResizeImage : [2021/09/01 20:44:31] root INFO: size : 224 [2021/09/01 20:44:31] root INFO: NormalizeImage : [2021/09/01 20:44:31] root INFO: mean : [0.419, 0.443, 0.461] [2021/09/01 20:44:31] root INFO: order : [2021/09/01 20:44:31] root INFO: scale : 1.0/255.0 [2021/09/01 20:44:31] root INFO: std : [0.102, 0.109, 0.117] [2021/09/01 20:44:31] root INFO: loader : [2021/09/01 20:44:31] root INFO: num_workers : 0 [2021/09/01 20:44:31] root INFO: 
use_shared_memory : True [2021/09/01 20:44:31] root INFO: sampler : [2021/09/01 20:44:31] root INFO: batch_size : 2 [2021/09/01 20:44:31] root INFO: drop_last : False [2021/09/01 20:44:31] root INFO: name : DistributedBatchSampler [2021/09/01 20:44:31] root INFO: shuffle : False [2021/09/01 20:44:31] root INFO: Query : [2021/09/01 20:44:31] root INFO: dataset : [2021/09/01 20:44:31] root INFO: cls_label_path : ./dataset/imagenet_dataset/val_list.txt [2021/09/01 20:44:31] root INFO: image_root : ./dataset/imagenet_dataset/ [2021/09/01 20:44:31] root INFO: name : ImageNetDataset [2021/09/01 20:44:31] root INFO: transform_ops : [2021/09/01 20:44:31] root INFO: DecodeImage : [2021/09/01 20:44:31] root INFO: channel_first : False [2021/09/01 20:44:31] root INFO: to_rgb : True [2021/09/01 20:44:31] root INFO: ResizeImage : [2021/09/01 20:44:31] root INFO: size : 224 [2021/09/01 20:44:31] root INFO: NormalizeImage : [2021/09/01 20:44:31] root INFO: mean : [0.419, 0.443, 0.461] [2021/09/01 20:44:31] root INFO: order : [2021/09/01 20:44:31] root INFO: scale : 0.00392157 [2021/09/01 20:44:31] root INFO: std : [0.102, 0.109, 0.117] [2021/09/01 20:44:31] root INFO: loader : [2021/09/01 20:44:31] root INFO: num_workers : 0 [2021/09/01 20:44:31] root INFO: use_shared_memory : True [2021/09/01 20:44:31] root INFO: sampler : [2021/09/01 20:44:31] root INFO: batch_size : 2 [2021/09/01 20:44:31] root INFO: drop_last : False [2021/09/01 20:44:31] root INFO: name : DistributedBatchSampler [2021/09/01 20:44:31] root INFO: shuffle : False [2021/09/01 20:44:31] root INFO: Train : [2021/09/01 20:44:31] root INFO: dataset : [2021/09/01 20:44:31] root INFO: cls_label_path : ./dataset/imagenet_dataset/train_list.txt [2021/09/01 20:44:31] root INFO: image_root : ./dataset/imagenet_dataset/ [2021/09/01 20:44:31] root INFO: name : ImageNetDataset [2021/09/01 20:44:31] root INFO: transform_ops : [2021/09/01 20:44:31] root INFO: DecodeImage : [2021/09/01 20:44:31] root INFO: channel_first : False 
[2021/09/01 20:44:31] root INFO: to_rgb : True [2021/09/01 20:44:31] root INFO: ResizeImage : [2021/09/01 20:44:31] root INFO: size : 224 [2021/09/01 20:44:31] root INFO: RandFlipImage : [2021/09/01 20:44:31] root INFO: flip_code : 1 [2021/09/01 20:44:31] root INFO: NormalizeImage : [2021/09/01 20:44:31] root INFO: mean : [0.419, 0.443, 0.461] [2021/09/01 20:44:31] root INFO: order : [2021/09/01 20:44:31] root INFO: scale : 0.00392157 [2021/09/01 20:44:31] root INFO: std : [0.102, 0.109, 0.117] [2021/09/01 20:44:31] root INFO: loader : [2021/09/01 20:44:31] root INFO: num_workers : 0 [2021/09/01 20:44:31] root INFO: use_shared_memory : True [2021/09/01 20:44:31] root INFO: sampler : [2021/09/01 20:44:31] root INFO: batch_size : 2 [2021/09/01 20:44:31] root INFO: drop_last : False [2021/09/01 20:44:31] root INFO: name : DistributedRandomIdentitySampler [2021/09/01 20:44:31] root INFO: num_instances : 2 [2021/09/01 20:44:31] root INFO: shuffle : True [2021/09/01 20:44:31] root INFO: Global : [2021/09/01 20:44:31] root INFO: checkpoints : None [2021/09/01 20:44:31] root INFO: device : gpu [2021/09/01 20:44:31] root INFO: epochs : 100 [2021/09/01 20:44:31] root INFO: eval_during_train : True [2021/09/01 20:44:31] root INFO: eval_interval : 10 [2021/09/01 20:44:31] root INFO: eval_mode : retrieval [2021/09/01 20:44:31] root INFO: image_shape : [3, 224, 224] [2021/09/01 20:44:31] root INFO: output_dir : ./output/MobileNetV1/ [2021/09/01 20:44:31] root INFO: pretrained_model : None [2021/09/01 20:44:31] root INFO: print_batch_step : 100 [2021/09/01 20:44:31] root INFO: save_inference_dir : ./inference [2021/09/01 20:44:31] root INFO: save_interval : 10 [2021/09/01 20:44:31] root INFO: use_visualdl : False [2021/09/01 20:44:31] root INFO: Loss : [2021/09/01 20:44:31] root INFO: Eval : [2021/09/01 20:44:31] root INFO: CELoss : [2021/09/01 20:44:31] root INFO: weight : 1.0 [2021/09/01 20:44:31] root INFO: Train : [2021/09/01 20:44:31] root INFO: CELoss : [2021/09/01 
20:44:31] root INFO: weight : 1.0 [2021/09/01 20:44:31] root INFO: TripletLossV2 : [2021/09/01 20:44:31] root INFO: margin : 0.5 [2021/09/01 20:44:31] root INFO: weight : 1.0 [2021/09/01 20:44:31] root INFO: Metric : [2021/09/01 20:44:31] root INFO: Eval : [2021/09/01 20:44:31] root INFO: Recallk : [2021/09/01 20:44:31] root INFO: topk : [1, 5] [2021/09/01 20:44:31] root INFO: mAP : [2021/09/01 20:44:31] root INFO: Train : [2021/09/01 20:44:31] root INFO: TopkAcc : [2021/09/01 20:44:31] root INFO: topk : [1, 5] [2021/09/01 20:44:31] root INFO: Optimizer : [2021/09/01 20:44:31] root INFO: lr : [2021/09/01 20:44:31] root INFO: gamma : 0.5 [2021/09/01 20:44:31] root INFO: last_epoch : -1 [2021/09/01 20:44:31] root INFO: learning_rate : 0.000625 [2021/09/01 20:44:31] root INFO: milestones : [40, 60, 80] [2021/09/01 20:44:31] root INFO: name : MultiStepDecay [2021/09/01 20:44:31] root INFO: verbose : False [2021/09/01 20:44:31] root INFO: momentum : 0.9 [2021/09/01 20:44:31] root INFO: name : Momentum [2021/09/01 20:44:31] root INFO: regularizer : [2021/09/01 20:44:31] root INFO: coeff : 0.0005 [2021/09/01 20:44:31] root INFO: name : L2 W0901 20:44:31.899075 672 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.1, Runtime API Version: 10.2 W0901 20:44:31.908082 672 device_context.cc:422] device: 0, cuDNN Version: 7.6. [2021/09/01 20:44:34] root INFO: unique_endpoints {''} [2021/09/01 20:44:34] root INFO: Found C:\Users\22928/.paddleclas/weights\MobileNetV1_pretrained.pdparams [2021/09/01 20:44:34] root INFO: train with paddle 2.1.1 and device CUDAPlace(0) {'CELoss': {'weight': 1.0}} {'TripletLossV2': {'weight': 1.0, 'margin': 0.5}} Traceback (most recent call last): File "tools/train.py", line 31, in
trainer.train()
File "E:\PaddleClas2.2\ppcls\engine\trainer.py", line 174, in train
loss_dict = self.train_loss_func(out, batch[1])
File "E:\PaddleClas2.2\ppcls\loss\__init__.py", line 46, in __call__
loss = loss_func(input, batch)
File "D:\lib\site-packages\paddle\fluid\dygraph\layers.py", line 902, in __call__
outputs = self.forward(*inputs, **kwargs)
File "E:\PaddleClas2.2\ppcls\loss\triplet.py", line 66, in forward
paddle.masked_select(dist, isneg), (bs, -1)),
File "D:\lib\site-packages\paddle\tensor\manipulation.py", line 1575, in reshape
return paddle.fluid.layers.reshape(x=x, shape=shape, name=name)
File "D:\lib\site-packages\paddle\fluid\layers\nn.py", line 6142, in reshape
out, = core.ops.reshape2(x, None, 'shape', shape)
RuntimeError: (PreconditionNotMet) The Tensor's element number must be equal or greater than zero. The Tensor's shape is [2, -1] now
[Hint: Expected numel() >= 0, but received numel():-2 < 0:0.] (at C:\home\workspace\Paddle_release2\paddle\fluid\framework\tensor.cc:59)
[operator < reshape2 > error]
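One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: if the loss marks every pair of samples with different labels as a negative pair, a batch whose samples all share one label has no negative pairs, so the selection passed to `reshape` is empty and the `-1` dimension in `(bs, -1)` cannot be inferred. The pure-Python sketch below only counts pairs; `mask_counts` is a hypothetical helper, not the code in `ppcls/loss/triplet.py`.

```python
def mask_counts(labels):
    """Count same-label and different-label pairs over all (i, j) in a batch.

    Hypothetical helper: mimics building a bs x bs "is positive" mask and
    its complement, then counting the selected entries in each.
    """
    n = len(labels)
    is_pos = [[labels[i] == labels[j] for j in range(n)] for i in range(n)]
    num_pos = sum(sum(row) for row in is_pos)  # includes the diagonal (i == j)
    num_neg = n * n - num_pos
    return num_pos, num_neg


# A batch of 2 images from one identity: no negative pairs at all.
print(mask_counts([7, 7]))        # (4, 0)

# A batch of 4 images from two identities: 2 negatives per anchor.
print(mask_counts([1, 1, 2, 2]))  # (8, 8)
```

With zero negative entries there is nothing to arrange into `bs` rows, which is consistent with the `[2, -1]` shape reported in the error message.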
Problem 2: Training on AI Studio runs without errors, but no topk : [1, 5] output is printed during training: [2021/09/01 20:59:07] root INFO: Arch : [2021/09/01 20:59:07] root INFO: Backbone : [2021/09/01 20:59:07] root INFO: name : MobileNetV1 [2021/09/01 20:59:07] root INFO: pretrained : True [2021/09/01 20:59:07] root INFO: BackboneStopLayer : [2021/09/01 20:59:07] root INFO: name : flatten_0 [2021/09/01 20:59:07] root INFO: Head : [2021/09/01 20:59:07] root INFO: class_num : 101 [2021/09/01 20:59:07] root INFO: embedding_size : 512 [2021/09/01 20:59:07] root INFO: margin : 0.15 [2021/09/01 20:59:07] root INFO: name : ArcMargin [2021/09/01 20:59:07] root INFO: scale : 30 [2021/09/01 20:59:07] root INFO: Neck : [2021/09/01 20:59:07] root INFO: class_num : 512 [2021/09/01 20:59:07] root INFO: embedding_size : 1024 [2021/09/01 20:59:07] root INFO: name : FC [2021/09/01 20:59:07] root INFO: infer_add_softmax : False [2021/09/01 20:59:07] root INFO: infer_output_key : features [2021/09/01 20:59:07] root INFO: name : RecModel [2021/09/01 20:59:07] root INFO: DataLoader : [2021/09/01 20:59:07] root INFO: Eval : [2021/09/01 20:59:07] root INFO: Gallery : [2021/09/01 20:59:07] root INFO: dataset : [2021/09/01 20:59:07] root INFO: cls_label_path : ./dataset/bbox_img/test_list.txt [2021/09/01 20:59:07] root INFO: image_root : ./dataset/bbox_img/ [2021/09/01 20:59:07] root INFO: name : ImageNetDataset [2021/09/01 20:59:07] root INFO: transform_ops : [2021/09/01 20:59:07] root INFO: DecodeImage : [2021/09/01 20:59:07] root INFO: channel_first : False [2021/09/01 20:59:07] root INFO: to_rgb : True [2021/09/01 20:59:07] root INFO: ResizeImage : [2021/09/01 20:59:07] root INFO: size : 224 [2021/09/01 20:59:07] root INFO: NormalizeImage : [2021/09/01 20:59:07] root INFO: mean : [0.419, 0.443, 0.461] [2021/09/01 20:59:07] root INFO: order : [2021/09/01 20:59:07] root INFO: scale : 1.0/255.0 [2021/09/01 20:59:07] root INFO: std : [0.102, 0.109, 0.117] [2021/09/01 20:59:07] root INFO: loader : [2021/09/01 20:59:07] root INFO: num_workers : 2 
[2021/09/01 20:59:07] root INFO: use_shared_memory : False [2021/09/01 20:59:07] root INFO: sampler : [2021/09/01 20:59:07] root INFO: batch_size : 64 [2021/09/01 20:59:07] root INFO: drop_last : False [2021/09/01 20:59:07] root INFO: name : DistributedBatchSampler [2021/09/01 20:59:07] root INFO: shuffle : False [2021/09/01 20:59:07] root INFO: Query : [2021/09/01 20:59:07] root INFO: dataset : [2021/09/01 20:59:07] root INFO: cls_label_path : ./dataset/bbox_img/val_list.txt [2021/09/01 20:59:07] root INFO: image_root : ./dataset/bbox_img/ [2021/09/01 20:59:07] root INFO: name : ImageNetDataset [2021/09/01 20:59:07] root INFO: transform_ops : [2021/09/01 20:59:07] root INFO: DecodeImage : [2021/09/01 20:59:07] root INFO: channel_first : False [2021/09/01 20:59:07] root INFO: to_rgb : True [2021/09/01 20:59:07] root INFO: ResizeImage : [2021/09/01 20:59:07] root INFO: size : 224 [2021/09/01 20:59:07] root INFO: NormalizeImage : [2021/09/01 20:59:07] root INFO: mean : [0.419, 0.443, 0.461] [2021/09/01 20:59:07] root INFO: order : [2021/09/01 20:59:07] root INFO: scale : 0.00392157 [2021/09/01 20:59:07] root INFO: std : [0.102, 0.109, 0.117] [2021/09/01 20:59:07] root INFO: loader : [2021/09/01 20:59:07] root INFO: num_workers : 2 [2021/09/01 20:59:07] root INFO: use_shared_memory : False [2021/09/01 20:59:07] root INFO: sampler : [2021/09/01 20:59:07] root INFO: batch_size : 64 [2021/09/01 20:59:07] root INFO: drop_last : False [2021/09/01 20:59:07] root INFO: name : DistributedBatchSampler [2021/09/01 20:59:07] root INFO: shuffle : False [2021/09/01 20:59:07] root INFO: Train : [2021/09/01 20:59:07] root INFO: dataset : [2021/09/01 20:59:07] root INFO: cls_label_path : ./dataset/bbox_img/train_list.txt [2021/09/01 20:59:07] root INFO: image_root : ./dataset/bbox_img/ [2021/09/01 20:59:07] root INFO: name : ImageNetDataset [2021/09/01 20:59:07] root INFO: transform_ops : [2021/09/01 20:59:07] root INFO: DecodeImage : [2021/09/01 20:59:07] root INFO: channel_first : 
False [2021/09/01 20:59:07] root INFO: to_rgb : True [2021/09/01 20:59:07] root INFO: ResizeImage : [2021/09/01 20:59:07] root INFO: size : 224 [2021/09/01 20:59:07] root INFO: RandFlipImage : [2021/09/01 20:59:07] root INFO: flip_code : 1 [2021/09/01 20:59:07] root INFO: NormalizeImage : [2021/09/01 20:59:07] root INFO: mean : [0.419, 0.443, 0.461] [2021/09/01 20:59:07] root INFO: order : [2021/09/01 20:59:07] root INFO: scale : 0.00392157 [2021/09/01 20:59:07] root INFO: std : [0.102, 0.109, 0.117] [2021/09/01 20:59:07] root INFO: loader : [2021/09/01 20:59:07] root INFO: num_workers : 2 [2021/09/01 20:59:07] root INFO: use_shared_memory : False [2021/09/01 20:59:07] root INFO: sampler : [2021/09/01 20:59:07] root INFO: batch_size : 64 [2021/09/01 20:59:07] root INFO: drop_last : False [2021/09/01 20:59:07] root INFO: name : DistributedRandomIdentitySampler [2021/09/01 20:59:07] root INFO: num_instances : 2 [2021/09/01 20:59:07] root INFO: shuffle : True [2021/09/01 20:59:07] root INFO: Global : [2021/09/01 20:59:07] root INFO: checkpoints : None [2021/09/01 20:59:07] root INFO: device : gpu [2021/09/01 20:59:07] root INFO: epochs : 100 [2021/09/01 20:59:07] root INFO: eval_during_train : True [2021/09/01 20:59:07] root INFO: eval_interval : 10 [2021/09/01 20:59:07] root INFO: eval_mode : retrieval [2021/09/01 20:59:07] root INFO: image_shape : [3, 224, 224] [2021/09/01 20:59:07] root INFO: output_dir : ./output/MobileNetV1/ [2021/09/01 20:59:07] root INFO: pretrained_model : None [2021/09/01 20:59:07] root INFO: print_batch_step : 100 [2021/09/01 20:59:07] root INFO: save_inference_dir : ./inference [2021/09/01 20:59:07] root INFO: save_interval : 10 [2021/09/01 20:59:07] root INFO: use_visualdl : False [2021/09/01 20:59:07] root INFO: Loss : [2021/09/01 20:59:07] root INFO: Eval : [2021/09/01 20:59:07] root INFO: CELoss : [2021/09/01 20:59:07] root INFO: weight : 1.0 [2021/09/01 20:59:07] root INFO: Train : [2021/09/01 20:59:07] root INFO: CELoss : [2021/09/01 
20:59:07] root INFO: weight : 1.0 [2021/09/01 20:59:07] root INFO: TripletLossV2 : [2021/09/01 20:59:07] root INFO: margin : 0.5 [2021/09/01 20:59:07] root INFO: weight : 1.0 [2021/09/01 20:59:07] root INFO: Metric : [2021/09/01 20:59:07] root INFO: Eval : [2021/09/01 20:59:07] root INFO: Recallk : [2021/09/01 20:59:07] root INFO: topk : [1, 5] [2021/09/01 20:59:07] root INFO: mAP : [2021/09/01 20:59:07] root INFO: Train : [2021/09/01 20:59:07] root INFO: TopkAcc : [2021/09/01 20:59:07] root INFO: topk : [1, 5] [2021/09/01 20:59:07] root INFO: Optimizer : [2021/09/01 20:59:07] root INFO: lr : [2021/09/01 20:59:07] root INFO: gamma : 0.5 [2021/09/01 20:59:07] root INFO: last_epoch : -1 [2021/09/01 20:59:07] root INFO: learning_rate : 0.01 [2021/09/01 20:59:07] root INFO: milestones : [40, 60, 80] [2021/09/01 20:59:07] root INFO: name : MultiStepDecay [2021/09/01 20:59:07] root INFO: verbose : False [2021/09/01 20:59:07] root INFO: momentum : 0.9 [2021/09/01 20:59:07] root INFO: name : Momentum [2021/09/01 20:59:07] root INFO: regularizer : [2021/09/01 20:59:07] root INFO: coeff : 0.0005 [2021/09/01 20:59:07] root INFO: name : L2 W0901 20:59:07.876523 16854 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1 W0901 20:59:07.881187 16854 device_context.cc:422] device: 0, cuDNN Version: 7.6. 
[2021/09/01 20:59:12] root INFO: unique_endpoints {''} [2021/09/01 20:59:12] root INFO: Found /home/aistudio/.paddleclas/weights/MobileNetV1_pretrained.pdparams [2021/09/01 20:59:13] root INFO: train with paddle 2.1.2 and device CUDAPlace(0) {'CELoss': {'weight': 1.0}} {'TripletLossV2': {'weight': 1.0, 'margin': 0.5}} [2021/09/01 20:59:13] root INFO: [Train][Epoch 1/100][Avg] [2021/09/01 20:59:13] root INFO: [Train][Epoch 2/100][Avg] [2021/09/01 20:59:13] root INFO: [Train][Epoch 3/100][Avg] [2021/09/01 20:59:14] root INFO: [Train][Epoch 4/100][Avg] [2021/09/01 20:59:14] root INFO: [Train][Epoch 5/100][Avg] [2021/09/01 20:59:14] root INFO: [Train][Epoch 6/100][Avg] [2021/09/01 20:59:15] root INFO: [Train][Epoch 7/100][Avg] [2021/09/01 20:59:15] root INFO: [Train][Epoch 8/100][Avg] [2021/09/01 20:59:15] root INFO: [Train][Epoch 9/100][Avg] [2021/09/01 20:59:15] root INFO: [Train][Epoch 10/100][Avg]