PaddlePaddle / PaddleClas

A treasure chest for visual classification and recognition powered by PaddlePaddle
Apache License 2.0

Image retrieval metric learning model training problem #1195

Closed jackie8310 closed 2 years ago

jackie8310 commented 3 years ago
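  1. PaddleClas and PaddlePaddle versions: PaddleClas release/2.2
  2. Versions of other products involved: if you are using other products together with PaddleClas, such as PaddleServing or PaddleInference, please provide their version numbers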
  3. Training environment:
     a. Operating system: Linux/Windows/
     b. Python version: Python3.7
     c. CUDA/cuDNN version: CUDA10.2/cuDNN 7.6.5
  4. Problem 1: Training on Windows 10 fails with the error below. The configuration is as follows:

    [2021/09/01 20:44:31] root INFO:
    == PaddleClas is powered by PaddlePaddle ! ==
    == For more info please go to the following website. ==
    == https://github.com/PaddlePaddle/PaddleClas ==

    [2021/09/01 20:44:31] root INFO: Arch :
    [2021/09/01 20:44:31] root INFO:     Backbone :
    [2021/09/01 20:44:31] root INFO:         name : MobileNetV1
    [2021/09/01 20:44:31] root INFO:         pretrained : True
    [2021/09/01 20:44:31] root INFO:     BackboneStopLayer :
    [2021/09/01 20:44:31] root INFO:         name : flatten_0
    [2021/09/01 20:44:31] root INFO:     Head :
    [2021/09/01 20:44:31] root INFO:         class_num : 101
    [2021/09/01 20:44:31] root INFO:         embedding_size : 512
    [2021/09/01 20:44:31] root INFO:         margin : 0.15
    [2021/09/01 20:44:31] root INFO:         name : ArcMargin
    [2021/09/01 20:44:31] root INFO:         scale : 30
    [2021/09/01 20:44:31] root INFO:     Neck :
    [2021/09/01 20:44:31] root INFO:         class_num : 512
    [2021/09/01 20:44:31] root INFO:         embedding_size : 1024
    [2021/09/01 20:44:31] root INFO:         name : FC
    [2021/09/01 20:44:31] root INFO:     infer_add_softmax : False
    [2021/09/01 20:44:31] root INFO:     infer_output_key : features
    [2021/09/01 20:44:31] root INFO:     name : RecModel
    [2021/09/01 20:44:31] root INFO: DataLoader :
    [2021/09/01 20:44:31] root INFO:     Eval :
    [2021/09/01 20:44:31] root INFO:         Gallery :
    [2021/09/01 20:44:31] root INFO:             dataset :
    [2021/09/01 20:44:31] root INFO:                 cls_label_path : ./dataset/imagenet_dataset/val_list.txt
    [2021/09/01 20:44:31] root INFO:                 image_root : ./dataset/imagenet_dataset/
    [2021/09/01 20:44:31] root INFO:                 name : ImageNetDataset
    [2021/09/01 20:44:31] root INFO:                 transform_ops :
    [2021/09/01 20:44:31] root INFO:                     DecodeImage :
    [2021/09/01 20:44:31] root INFO:                         channel_first : False
    [2021/09/01 20:44:31] root INFO:                         to_rgb : True
    [2021/09/01 20:44:31] root INFO:                     ResizeImage :
    [2021/09/01 20:44:31] root INFO:                         size : 224
    [2021/09/01 20:44:31] root INFO:                     NormalizeImage :
    [2021/09/01 20:44:31] root INFO:                         mean : [0.419, 0.443, 0.461]
    [2021/09/01 20:44:31] root INFO:                         order :
    [2021/09/01 20:44:31] root INFO:                         scale : 1.0/255.0
    [2021/09/01 20:44:31] root INFO:                         std : [0.102, 0.109, 0.117]
    [2021/09/01 20:44:31] root INFO:             loader :
    [2021/09/01 20:44:31] root INFO:                 num_workers : 0
    [2021/09/01 20:44:31] root INFO:                 use_shared_memory : True
    [2021/09/01 20:44:31] root INFO:             sampler :
    [2021/09/01 20:44:31] root INFO:                 batch_size : 2
    [2021/09/01 20:44:31] root INFO:                 drop_last : False
    [2021/09/01 20:44:31] root INFO:                 name : DistributedBatchSampler
    [2021/09/01 20:44:31] root INFO:                 shuffle : False
    [2021/09/01 20:44:31] root INFO:         Query :
    [2021/09/01 20:44:31] root INFO:             dataset :
    [2021/09/01 20:44:31] root INFO:                 cls_label_path : ./dataset/imagenet_dataset/val_list.txt
    [2021/09/01 20:44:31] root INFO:                 image_root : ./dataset/imagenet_dataset/
    [2021/09/01 20:44:31] root INFO:                 name : ImageNetDataset
    [2021/09/01 20:44:31] root INFO:                 transform_ops :
    [2021/09/01 20:44:31] root INFO:                     DecodeImage :
    [2021/09/01 20:44:31] root INFO:                         channel_first : False
    [2021/09/01 20:44:31] root INFO:                         to_rgb : True
    [2021/09/01 20:44:31] root INFO:                     ResizeImage :
    [2021/09/01 20:44:31] root INFO:                         size : 224
    [2021/09/01 20:44:31] root INFO:                     NormalizeImage :
    [2021/09/01 20:44:31] root INFO:                         mean : [0.419, 0.443, 0.461]
    [2021/09/01 20:44:31] root INFO:                         order :
    [2021/09/01 20:44:31] root INFO:                         scale : 0.00392157
    [2021/09/01 20:44:31] root INFO:                         std : [0.102, 0.109, 0.117]
    [2021/09/01 20:44:31] root INFO:             loader :
    [2021/09/01 20:44:31] root INFO:                 num_workers : 0
    [2021/09/01 20:44:31] root INFO:                 use_shared_memory : True
    [2021/09/01 20:44:31] root INFO:             sampler :
    [2021/09/01 20:44:31] root INFO:                 batch_size : 2
    [2021/09/01 20:44:31] root INFO:                 drop_last : False
    [2021/09/01 20:44:31] root INFO:                 name : DistributedBatchSampler
    [2021/09/01 20:44:31] root INFO:                 shuffle : False
    [2021/09/01 20:44:31] root INFO:     Train :
    [2021/09/01 20:44:31] root INFO:         dataset :
    [2021/09/01 20:44:31] root INFO:             cls_label_path : ./dataset/imagenet_dataset/train_list.txt
    [2021/09/01 20:44:31] root INFO:             image_root : ./dataset/imagenet_dataset/
    [2021/09/01 20:44:31] root INFO:             name : ImageNetDataset
    [2021/09/01 20:44:31] root INFO:             transform_ops :
    [2021/09/01 20:44:31] root INFO:                 DecodeImage :
    [2021/09/01 20:44:31] root INFO:                     channel_first : False
    [2021/09/01 20:44:31] root INFO:                     to_rgb : True
    [2021/09/01 20:44:31] root INFO:                 ResizeImage :
    [2021/09/01 20:44:31] root INFO:                     size : 224
    [2021/09/01 20:44:31] root INFO:                 RandFlipImage :
    [2021/09/01 20:44:31] root INFO:                     flip_code : 1
    [2021/09/01 20:44:31] root INFO:                 NormalizeImage :
    [2021/09/01 20:44:31] root INFO:                     mean : [0.419, 0.443, 0.461]
    [2021/09/01 20:44:31] root INFO:                     order :
    [2021/09/01 20:44:31] root INFO:                     scale : 0.00392157
    [2021/09/01 20:44:31] root INFO:                     std : [0.102, 0.109, 0.117]
    [2021/09/01 20:44:31] root INFO:         loader :
    [2021/09/01 20:44:31] root INFO:             num_workers : 0
    [2021/09/01 20:44:31] root INFO:             use_shared_memory : True
    [2021/09/01 20:44:31] root INFO:         sampler :
    [2021/09/01 20:44:31] root INFO:             batch_size : 2
    [2021/09/01 20:44:31] root INFO:             drop_last : False
    [2021/09/01 20:44:31] root INFO:             name : DistributedRandomIdentitySampler
    [2021/09/01 20:44:31] root INFO:             num_instances : 2
    [2021/09/01 20:44:31] root INFO:             shuffle : True
    [2021/09/01 20:44:31] root INFO: Global :
    [2021/09/01 20:44:31] root INFO:     checkpoints : None
    [2021/09/01 20:44:31] root INFO:     device : gpu
    [2021/09/01 20:44:31] root INFO:     epochs : 100
    [2021/09/01 20:44:31] root INFO:     eval_during_train : True
    [2021/09/01 20:44:31] root INFO:     eval_interval : 10
    [2021/09/01 20:44:31] root INFO:     eval_mode : retrieval
    [2021/09/01 20:44:31] root INFO:     image_shape : [3, 224, 224]
    [2021/09/01 20:44:31] root INFO:     output_dir : ./output/MobileNetV1/
    [2021/09/01 20:44:31] root INFO:     pretrained_model : None
    [2021/09/01 20:44:31] root INFO:     print_batch_step : 100
    [2021/09/01 20:44:31] root INFO:     save_inference_dir : ./inference
    [2021/09/01 20:44:31] root INFO:     save_interval : 10
    [2021/09/01 20:44:31] root INFO:     use_visualdl : False
    [2021/09/01 20:44:31] root INFO: Loss :
    [2021/09/01 20:44:31] root INFO:     Eval :
    [2021/09/01 20:44:31] root INFO:         CELoss :
    [2021/09/01 20:44:31] root INFO:             weight : 1.0
    [2021/09/01 20:44:31] root INFO:     Train :
    [2021/09/01 20:44:31] root INFO:         CELoss :
    [2021/09/01 20:44:31] root INFO:             weight : 1.0
    [2021/09/01 20:44:31] root INFO:         TripletLossV2 :
    [2021/09/01 20:44:31] root INFO:             margin : 0.5
    [2021/09/01 20:44:31] root INFO:             weight : 1.0
    [2021/09/01 20:44:31] root INFO: Metric :
    [2021/09/01 20:44:31] root INFO:     Eval :
    [2021/09/01 20:44:31] root INFO:         Recallk :
    [2021/09/01 20:44:31] root INFO:             topk : [1, 5]
    [2021/09/01 20:44:31] root INFO:         mAP :
    [2021/09/01 20:44:31] root INFO:     Train :
    [2021/09/01 20:44:31] root INFO:         TopkAcc :
    [2021/09/01 20:44:31] root INFO:             topk : [1, 5]
    [2021/09/01 20:44:31] root INFO: Optimizer :
    [2021/09/01 20:44:31] root INFO:     lr :
    [2021/09/01 20:44:31] root INFO:         gamma : 0.5
    [2021/09/01 20:44:31] root INFO:         last_epoch : -1
    [2021/09/01 20:44:31] root INFO:         learning_rate : 0.000625
    [2021/09/01 20:44:31] root INFO:         milestones : [40, 60, 80]
    [2021/09/01 20:44:31] root INFO:         name : MultiStepDecay
    [2021/09/01 20:44:31] root INFO:         verbose : False
    [2021/09/01 20:44:31] root INFO:     momentum : 0.9
    [2021/09/01 20:44:31] root INFO:     name : Momentum
    [2021/09/01 20:44:31] root INFO:     regularizer :
    [2021/09/01 20:44:31] root INFO:         coeff : 0.0005
    [2021/09/01 20:44:31] root INFO:         name : L2
    W0901 20:44:31.899075 672 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.1, Runtime API Version: 10.2
    W0901 20:44:31.908082 672 device_context.cc:422] device: 0, cuDNN Version: 7.6.
    [2021/09/01 20:44:34] root INFO: unique_endpoints {''}
    [2021/09/01 20:44:34] root INFO: Found C:\Users\22928/.paddleclas/weights\MobileNetV1_pretrained.pdparams
    [2021/09/01 20:44:34] root INFO: train with paddle 2.1.1 and device CUDAPlace(0)
    {'CELoss': {'weight': 1.0}}
    {'TripletLossV2': {'weight': 1.0, 'margin': 0.5}}
    Traceback (most recent call last):
      File "tools/train.py", line 31, in <module>
        trainer.train()
      File "E:\PaddleClas2.2\ppcls\engine\trainer.py", line 174, in train
        loss_dict = self.train_loss_func(out, batch[1])
      File "E:\PaddleClas2.2\ppcls\loss\__init__.py", line 46, in __call__
        loss = loss_func(input, batch)
      File "D:\lib\site-packages\paddle\fluid\dygraph\layers.py", line 902, in __call__
        outputs = self.forward(*inputs, **kwargs)
      File "E:\PaddleClas2.2\ppcls\loss\triplet.py", line 66, in forward
        paddle.masked_select(dist, is_neg), (bs, -1)),
      File "D:\lib\site-packages\paddle\tensor\manipulation.py", line 1575, in reshape
        return paddle.fluid.layers.reshape(x=x, shape=shape, name=name)
      File "D:\lib\site-packages\paddle\fluid\layers\nn.py", line 6142, in reshape
        out, _ = core.ops.reshape2(x, None, 'shape', shape)
    RuntimeError: (PreconditionNotMet) The Tensor's element number must be equal or greater than zero. The Tensor's shape is [2, -1] now
      [Hint: Expected numel() >= 0, but received numel():-2 < 0:0.] (at C:\home\workspace\Paddle_release2\paddle\fluid\framework\tensor.cc:59)
      [operator < reshape2 > error]

Problem 2: Training on AI Studio runs without errors, but no topk : [1, 5] output appears during training:

    [2021/09/01 20:59:07] root INFO: Arch :
    [2021/09/01 20:59:07] root INFO:     Backbone :
    [2021/09/01 20:59:07] root INFO:         name : MobileNetV1
    [2021/09/01 20:59:07] root INFO:         pretrained : True
    [2021/09/01 20:59:07] root INFO:     BackboneStopLayer :
    [2021/09/01 20:59:07] root INFO:         name : flatten_0
    [2021/09/01 20:59:07] root INFO:     Head :
    [2021/09/01 20:59:07] root INFO:         class_num : 101
    [2021/09/01 20:59:07] root INFO:         embedding_size : 512
    [2021/09/01 20:59:07] root INFO:         margin : 0.15
    [2021/09/01 20:59:07] root INFO:         name : ArcMargin
    [2021/09/01 20:59:07] root INFO:         scale : 30
    [2021/09/01 20:59:07] root INFO:     Neck :
    [2021/09/01 20:59:07] root INFO:         class_num : 512
    [2021/09/01 20:59:07] root INFO:         embedding_size : 1024
    [2021/09/01 20:59:07] root INFO:         name : FC
    [2021/09/01 20:59:07] root INFO:     infer_add_softmax : False
    [2021/09/01 20:59:07] root INFO:     infer_output_key : features
    [2021/09/01 20:59:07] root INFO:     name : RecModel
    [2021/09/01 20:59:07] root INFO: DataLoader :
    [2021/09/01 20:59:07] root INFO:     Eval :
    [2021/09/01 20:59:07] root INFO:         Gallery :
    [2021/09/01 20:59:07] root INFO:             dataset :
    [2021/09/01 20:59:07] root INFO:                 cls_label_path : ./dataset/bbox_img/test_list.txt
    [2021/09/01 20:59:07] root INFO:                 image_root : ./dataset/bbox_img/
    [2021/09/01 20:59:07] root INFO:                 name : ImageNetDataset
    [2021/09/01 20:59:07] root INFO:                 transform_ops :
    [2021/09/01 20:59:07] root INFO:                     DecodeImage :
    [2021/09/01 20:59:07] root INFO:                         channel_first : False
    [2021/09/01 20:59:07] root INFO:                         to_rgb : True
    [2021/09/01 20:59:07] root INFO:                     ResizeImage :
    [2021/09/01 20:59:07] root INFO:                         size : 224
    [2021/09/01 20:59:07] root INFO:                     NormalizeImage :
    [2021/09/01 20:59:07] root INFO:                         mean : [0.419, 0.443, 0.461]
    [2021/09/01 20:59:07] root INFO:                         order :
    [2021/09/01 20:59:07] root INFO:                         scale : 1.0/255.0
    [2021/09/01 20:59:07] root INFO:                         std : [0.102, 0.109, 0.117]
    [2021/09/01 20:59:07] root INFO:             loader :
    [2021/09/01 20:59:07] root INFO:                 num_workers : 2
    [2021/09/01 20:59:07] root INFO:                 use_shared_memory : False
    [2021/09/01 20:59:07] root INFO:             sampler :
    [2021/09/01 20:59:07] root INFO:                 batch_size : 64
    [2021/09/01 20:59:07] root INFO:                 drop_last : False
    [2021/09/01 20:59:07] root INFO:                 name : DistributedBatchSampler
    [2021/09/01 20:59:07] root INFO:                 shuffle : False
    [2021/09/01 20:59:07] root INFO:         Query :
    [2021/09/01 20:59:07] root INFO:             dataset :
    [2021/09/01 20:59:07] root INFO:                 cls_label_path : ./dataset/bbox_img/val_list.txt
    [2021/09/01 20:59:07] root INFO:                 image_root : ./dataset/bbox_img/
    [2021/09/01 20:59:07] root INFO:                 name : ImageNetDataset
    [2021/09/01 20:59:07] root INFO:                 transform_ops :
    [2021/09/01 20:59:07] root INFO:                     DecodeImage :
    [2021/09/01 20:59:07] root INFO:                         channel_first : False
    [2021/09/01 20:59:07] root INFO:                         to_rgb : True
    [2021/09/01 20:59:07] root INFO:                     ResizeImage :
    [2021/09/01 20:59:07] root INFO:                         size : 224
    [2021/09/01 20:59:07] root INFO:                     NormalizeImage :
    [2021/09/01 20:59:07] root INFO:                         mean : [0.419, 0.443, 0.461]
    [2021/09/01 20:59:07] root INFO:                         order :
    [2021/09/01 20:59:07] root INFO:                         scale : 0.00392157
    [2021/09/01 20:59:07] root INFO:                         std : [0.102, 0.109, 0.117]
    [2021/09/01 20:59:07] root INFO:             loader :
    [2021/09/01 20:59:07] root INFO:                 num_workers : 2
    [2021/09/01 20:59:07] root INFO:                 use_shared_memory : False
    [2021/09/01 20:59:07] root INFO:             sampler :
    [2021/09/01 20:59:07] root INFO:                 batch_size : 64
    [2021/09/01 20:59:07] root INFO:                 drop_last : False
    [2021/09/01 20:59:07] root INFO:                 name : DistributedBatchSampler
    [2021/09/01 20:59:07] root INFO:                 shuffle : False
    [2021/09/01 20:59:07] root INFO:     Train :
    [2021/09/01 20:59:07] root INFO:         dataset :
    [2021/09/01 20:59:07] root INFO:             cls_label_path : ./dataset/bbox_img/train_list.txt
    [2021/09/01 20:59:07] root INFO:             image_root : ./dataset/bbox_img/
    [2021/09/01 20:59:07] root INFO:             name : ImageNetDataset
    [2021/09/01 20:59:07] root INFO:             transform_ops :
    [2021/09/01 20:59:07] root INFO:                 DecodeImage :
    [2021/09/01 20:59:07] root INFO:                     channel_first : False
    [2021/09/01 20:59:07] root INFO:                     to_rgb : True
    [2021/09/01 20:59:07] root INFO:                 ResizeImage :
    [2021/09/01 20:59:07] root INFO:                     size : 224
    [2021/09/01 20:59:07] root INFO:                 RandFlipImage :
    [2021/09/01 20:59:07] root INFO:                     flip_code : 1
    [2021/09/01 20:59:07] root INFO:                 NormalizeImage :
    [2021/09/01 20:59:07] root INFO:                     mean : [0.419, 0.443, 0.461]
    [2021/09/01 20:59:07] root INFO:                     order :
    [2021/09/01 20:59:07] root INFO:                     scale : 0.00392157
    [2021/09/01 20:59:07] root INFO:                     std : [0.102, 0.109, 0.117]
    [2021/09/01 20:59:07] root INFO:         loader :
    [2021/09/01 20:59:07] root INFO:             num_workers : 2
    [2021/09/01 20:59:07] root INFO:             use_shared_memory : False
    [2021/09/01 20:59:07] root INFO:         sampler :
    [2021/09/01 20:59:07] root INFO:             batch_size : 64
    [2021/09/01 20:59:07] root INFO:             drop_last : False
    [2021/09/01 20:59:07] root INFO:             name : DistributedRandomIdentitySampler
    [2021/09/01 20:59:07] root INFO:             num_instances : 2
    [2021/09/01 20:59:07] root INFO:             shuffle : True
    [2021/09/01 20:59:07] root INFO: Global :
    [2021/09/01 20:59:07] root INFO:     checkpoints : None
    [2021/09/01 20:59:07] root INFO:     device : gpu
    [2021/09/01 20:59:07] root INFO:     epochs : 100
    [2021/09/01 20:59:07] root INFO:     eval_during_train : True
    [2021/09/01 20:59:07] root INFO:     eval_interval : 10
    [2021/09/01 20:59:07] root INFO:     eval_mode : retrieval
    [2021/09/01 20:59:07] root INFO:     image_shape : [3, 224, 224]
    [2021/09/01 20:59:07] root INFO:     output_dir : ./output/MobileNetV1/
    [2021/09/01 20:59:07] root INFO:     pretrained_model : None
    [2021/09/01 20:59:07] root INFO:     print_batch_step : 100
    [2021/09/01 20:59:07] root INFO:     save_inference_dir : ./inference
    [2021/09/01 20:59:07] root INFO:     save_interval : 10
    [2021/09/01 20:59:07] root INFO:     use_visualdl : False
    [2021/09/01 20:59:07] root INFO: Loss :
    [2021/09/01 20:59:07] root INFO:     Eval :
    [2021/09/01 20:59:07] root INFO:         CELoss :
    [2021/09/01 20:59:07] root INFO:             weight : 1.0
    [2021/09/01 20:59:07] root INFO:     Train :
    [2021/09/01 20:59:07] root INFO:         CELoss :
    [2021/09/01 20:59:07] root INFO:             weight : 1.0
    [2021/09/01 20:59:07] root INFO:         TripletLossV2 :
    [2021/09/01 20:59:07] root INFO:             margin : 0.5
    [2021/09/01 20:59:07] root INFO:             weight : 1.0
    [2021/09/01 20:59:07] root INFO: Metric :
    [2021/09/01 20:59:07] root INFO:     Eval :
    [2021/09/01 20:59:07] root INFO:         Recallk :
    [2021/09/01 20:59:07] root INFO:             topk : [1, 5]
    [2021/09/01 20:59:07] root INFO:         mAP :
    [2021/09/01 20:59:07] root INFO:     Train :
    [2021/09/01 20:59:07] root INFO:         TopkAcc :
    [2021/09/01 20:59:07] root INFO:             topk : [1, 5]
    [2021/09/01 20:59:07] root INFO: Optimizer :
    [2021/09/01 20:59:07] root INFO:     lr :
    [2021/09/01 20:59:07] root INFO:         gamma : 0.5
    [2021/09/01 20:59:07] root INFO:         last_epoch : -1
    [2021/09/01 20:59:07] root INFO:         learning_rate : 0.01
    [2021/09/01 20:59:07] root INFO:         milestones : [40, 60, 80]
    [2021/09/01 20:59:07] root INFO:         name : MultiStepDecay
    [2021/09/01 20:59:07] root INFO:         verbose : False
    [2021/09/01 20:59:07] root INFO:     momentum : 0.9
    [2021/09/01 20:59:07] root INFO:     name : Momentum
    [2021/09/01 20:59:07] root INFO:     regularizer :
    [2021/09/01 20:59:07] root INFO:         coeff : 0.0005
    [2021/09/01 20:59:07] root INFO:         name : L2
    W0901 20:59:07.876523 16854 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
    W0901 20:59:07.881187 16854 device_context.cc:422] device: 0, cuDNN Version: 7.6.
    [2021/09/01 20:59:12] root INFO: unique_endpoints {''}
    [2021/09/01 20:59:12] root INFO: Found /home/aistudio/.paddleclas/weights/MobileNetV1_pretrained.pdparams
    [2021/09/01 20:59:13] root INFO: train with paddle 2.1.2 and device CUDAPlace(0)
    {'CELoss': {'weight': 1.0}}
    {'TripletLossV2': {'weight': 1.0, 'margin': 0.5}}
    [2021/09/01 20:59:13] root INFO: [Train][Epoch 1/100][Avg]
    [2021/09/01 20:59:13] root INFO: [Train][Epoch 2/100][Avg]
    [2021/09/01 20:59:13] root INFO: [Train][Epoch 3/100][Avg]
    [2021/09/01 20:59:14] root INFO: [Train][Epoch 4/100][Avg]
    [2021/09/01 20:59:14] root INFO: [Train][Epoch 5/100][Avg]
    [2021/09/01 20:59:14] root INFO: [Train][Epoch 6/100][Avg]
    [2021/09/01 20:59:15] root INFO: [Train][Epoch 7/100][Avg]
    [2021/09/01 20:59:15] root INFO: [Train][Epoch 8/100][Avg]
    [2021/09/01 20:59:15] root INFO: [Train][Epoch 9/100][Avg]
    [2021/09/01 20:59:15] root INFO: [Train][Epoch 10/100][Avg]

cuicheng01 commented 3 years ago

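Hi. I noticed that you are using a batch size of 2 together with TripletLoss, which is not a reasonable combination: the batch size should be at least 4. Also, if you train with a small batch size, TripletLoss is not recommended; you can simply use ArcMargin on its own.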

paddle-bot-old[bot] commented 2 years ago

Since this issue has not been updated for more than three months, it will be closed. If it is not solved or there is a follow-up question, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.