xywlpo commented 1 year ago

Search before asking

[X] I have searched the question and found no related answer.

Please ask your question

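Thanks for your work. In the latest v2.6 release I see a distillation method for ppyoloe-plus, but no distillation setup for the tiny model. If I want to use a ppyoloe-plus-large or ppyoloe-plus-m model to distill a ppyoloe-plus-tiny model on PaddleDetection 2.6, how should I modify the config files?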

nemonameless commented 1 year ago

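You can follow the X-distills-L config and only change depth/width, the pretrained weight links, and the number of epochs. The released ppyoloe-plus-t model was trained for 300 epochs with an auxiliary head. If you distill, you need to train for at least 300 epochs for the result to end up higher than the original.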
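The maintainer's suggestion amounts to starting from the X->L distillation setup and overriding a few keys. The sketch below models that as a dictionary merge. `derive_config` is a hypothetical helper, not a PaddleDetection API. The key names `depth_mult`, `width_mult`, `pretrain_weights` and `epoch` follow PaddleDetection's YAML configs, and the M and T `width_mult` values come from this thread; the X values, the T `depth_mult`, the starting epoch count and all weight paths are assumptions or placeholders.

```python
# Minimal sketch (hypothetical helper, not a PaddleDetection API): derive the
# M-teacher / T-student settings from an X->L distillation setup by overriding
# only the keys the maintainer mentions. Values marked as placeholders are assumed.

def derive_config(base, overrides):
    """Return a copy of `base` with `overrides` applied; reject unknown keys."""
    unknown = sorted(set(overrides) - set(base))
    if unknown:
        # A typo such as "width_mul" would otherwise be silently added.
        raise KeyError(f"unknown config keys: {unknown}")
    merged = dict(base)
    merged.update(overrides)
    return merged

# Placeholder view of the X->L setup (teacher X, student L).
x_to_l = {
    "teacher": {"depth_mult": 1.33, "width_mult": 1.25,        # assumed X values
                "pretrain_weights": "<X teacher weights>"},
    "student": {"depth_mult": 1.0, "width_mult": 1.0,
                "pretrain_weights": "<L student weights>",
                "epoch": 80},                                   # placeholder
}

m_teacher = derive_config(x_to_l["teacher"], {
    "depth_mult": 0.67, "width_mult": 0.75,                     # M, as in this thread
    "pretrain_weights": "<M teacher weights>",
})
t_student = derive_config(x_to_l["student"], {
    "depth_mult": 0.33, "width_mult": 0.375,                    # T (depth_mult assumed)
    "pretrain_weights": "<T student weights>",
    "epoch": 300,                                               # "at least 300 epochs"
})

print(m_teacher)
print(t_student)
```

Rejecting unknown keys is a deliberate choice: when overriding a YAML config by hand, a misspelled key would typically just be added next to the real one, which is easy to miss.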

xywlpo commented 1 year ago

Thanks for your reply. I modified the config files to distill t with m. In the student's distillation config file I kept only the following:

    _BASE_: [
      '../ppyoloe_plus_crn_t_auxhead_300e_coco.yml',
    ]
    for_distill: True
    architecture: PPYOLOE

Then the TrainReader is as follows:

    TrainReader:
      sample_transforms:
        - Decode: {}
        - Resize: {target_size: *eval_size, keep_ratio: True, interp: 2}
        - RandomDistort: {}
        - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
        - RandomCrop: {}
        - RandomFlip: {}
      batch_transforms:
        - BatchRandomResize: {target_size: [[384, 640]], random_size: True, random_interp: True, keep_ratio: False}
        - NormalizeImage: {mean: [0., 0., 0.], std: [1., 1., 1.], norm_type: none}
        - Permute: {}
        - PadGT: {}
      batch_size: 8
      shuffle: true
      drop_last: true
      use_shared_memory: true
      collate_batch: true
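Because `BatchRandomResize` has a single target size and `keep_ratio: False`, every training batch is 384 × 640. The sketch below, assuming PP-YOLOE's usual output strides of 32, 16 and 8, computes the feature-map shape each level should have; this is useful for matching a tensor shape in an error message to a pyramid level.

```python
# Sketch: spatial size of each PP-YOLOE output level for the 384 x 640 input above.
# Assumes the head's strides are 32, 16 and 8; channel counts are left symbolic.
BATCH_SIZE = 8
TARGET_H, TARGET_W = 384, 640
STRIDES = (32, 16, 8)

def level_shapes(h, w, strides):
    """Return (H, W) per stride; raise if the input is not evenly divisible."""
    shapes = []
    for s in strides:
        if h % s or w % s:
            raise ValueError(f"{h}x{w} is not divisible by stride {s}")
        shapes.append((h // s, w // s))
    return shapes

for stride, (fh, fw) in zip(STRIDES, level_shapes(TARGET_H, TARGET_W, STRIDES)):
    # C depends on the model's width_mult, so it is printed as a placeholder.
    print(f"stride {stride}: [{BATCH_SIZE}, C, {fh}, {fw}]")
```

The stride-32 entry, `[8, C, 12, 20]`, has the same batch and spatial size as the `[8, 144, 12, 20]` tensor in the traceback reported below, which places the mismatch at the deepest level.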

The distillation config is as follows:

    _BASE_: [
      '../../ppyoloe/ppyoloe_plus_crn_m_80e_coco.yml',
    ]
    depth_mult: 0.67
    width_mult: 0.75
    for_distill: True
    architecture: PPYOLOE

    PPYOLOE:
      backbone: CSPResNet
      neck: CustomCSPPAN
      yolo_head: PPYOLOEHead
      post_process: ~

    find_unused_parameters: True

    slim: Distill
    slim_method: PPYOLOEDistill
    distill_loss: DistillPPYOLOELoss

    DistillPPYOLOELoss: # M -> S
      loss_weight: {'logits': 4.0, 'feat': 1.0}
      logits_distill: True
      logits_loss_weight: {'class': 1.0, 'iou': 2.5, 'dfl': 0.5}
      logits_ld_distill: True
      logits_ld_params: {'weight': 20000, 'T': 10}
      feat_distill: True
      feat_distiller: 'fgd' # ['cwd', 'fgd', 'pkd', 'mgd', 'mimic']
      feat_distill_place: 'neck_feats'
      teacher_width_mult: 0.75 # M
      student_width_mult: 0.375 # T
      feat_out_channels: [768, 384, 192] # The actual channel will multiply width_mult
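The comment on `feat_out_channels` says the actual channel counts are multiplied by `width_mult`. A minimal sketch of that arithmetic for this config follows. It assumes plain truncation `int(c * mult)` (the exact rounding inside ppdet is not shown in this thread) and that the feature aligner is a 1×1 conv from student channels to teacher channels, which is what the filter shape in the traceback suggests.

```python
# Sketch of the channel arithmetic implied by the config comment
# "The actual channel will multiply width_mult". Rounding rule is assumed.
FEAT_OUT_CHANNELS = [768, 384, 192]   # from the config above, stride 32 -> 8
TEACHER_WIDTH = 0.75                  # teacher_width_mult (M)
STUDENT_WIDTH = 0.375                 # student_width_mult (T)

def scaled(channels, mult):
    """Scale each base channel count by the width multiplier."""
    return [int(c * mult) for c in channels]

teacher_ch = scaled(FEAT_OUT_CHANNELS, TEACHER_WIDTH)
student_ch = scaled(FEAT_OUT_CHANNELS, STUDENT_WIDTH)

# A 1x1 conv mapping student features onto teacher channels has a weight of
# shape [out_channels, in_channels, kH, kW] = [teacher, student, 1, 1].
align_shapes = [[t, s, 1, 1] for t, s in zip(teacher_ch, student_ch)]

print(teacher_ch)        # [576, 288, 144]
print(student_ch)        # [288, 144, 72]
print(align_shapes[0])   # [576, 288, 1, 1]
```

Under these assumptions the stride-32 aligner weight comes out as `[576, 288, 1, 1]`, the filter shape reported in the traceback, so the config expects 288 student channels at that level.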

m and t use the same TrainReader config. After starting training, I get the following error:

    [02/25 13:21:40] ppdet.utils.checkpoint INFO: Finish loading model weights: ./ppyoloe_plus_crn_t_auxhead_300e_coco.pdparams
    [02/25 13:21:40] ppdet.slim.distill_model INFO: Student model has loaded pretrain weights!
    [02/25 13:21:40] ppdet.utils.download INFO: Downloading ppyoloe_crn_m_obj365_pretrained.pdparams from https://bj.bcebos.com/v1/paddledet/models/pretrained/ppyoloe_crn_m_obj365_pretrained.pdparams
    100%|██████████| 104604/104604 [00:01<00:00, 78001.36KB/s]
    [02/25 13:21:42] ppdet.utils.checkpoint INFO: ['yolo_head.stem_cls.0.conv.bn._mean', 'yolo_head.stem_cls.0.conv.bn._variance', 'yolo_head.stem_cls.0.conv.bn.bias', 'yolo_head.stem_cls.0.conv.bn.weight', 'yolo_head.stem_cls.0.conv.conv.weight', 'yolo_head.stem_cls.1.conv.bn._mean', 'yolo_head.stem_cls.1.conv.bn._variance', 'yolo_head.stem_cls.1.conv.bn.bias', 'yolo_head.stem_cls.1.conv.bn.weight', 'yolo_head.stem_cls.1.conv.conv.weight', 'yolo_head.stem_cls.2.conv.bn._mean', 'yolo_head.stem_cls.2.conv.bn._variance', 'yolo_head.stem_cls.2.conv.bn.bias', 'yolo_head.stem_cls.2.conv.bn.weight', 'yolo_head.stem_cls.2.conv.conv.weight', 'yolo_head.stem_reg.0.conv.bn._mean', 'yolo_head.stem_reg.0.conv.bn._variance', 'yolo_head.stem_reg.0.conv.bn.bias', 'yolo_head.stem_reg.0.conv.bn.weight', 'yolo_head.stem_reg.0.conv.conv.weight', 'yolo_head.stem_reg.1.conv.bn._mean', 'yolo_head.stem_reg.1.conv.bn._variance', 'yolo_head.stem_reg.1.conv.bn.bias', 'yolo_head.stem_reg.1.conv.bn.weight', 'yolo_head.stem_reg.1.conv.conv.weight', 'yolo_head.stem_reg.2.conv.bn._mean', 'yolo_head.stem_reg.2.conv.bn._variance', 'yolo_head.stem_reg.2.conv.bn.bias', 'yolo_head.stem_reg.2.conv.bn.weight', 'yolo_head.stem_reg.2.conv.conv.weight'] in pretrained weight is not used in the model, and its will not be loaded
    [02/25 13:21:42] ppdet.utils.checkpoint INFO: The shape [365] in pretrained weight yolo_head.pred_cls.0.bias is unmatched with the shape [1] in model yolo_head.pred_cls.0.bias. And the weight yolo_head.pred_cls.0.bias will not be loaded
    [02/25 13:21:42] ppdet.utils.checkpoint INFO: The shape [365, 576, 3, 3] in pretrained weight yolo_head.pred_cls.0.weight is unmatched with the shape [1, 576, 3, 3] in model yolo_head.pred_cls.0.weight. And the weight yolo_head.pred_cls.0.weight will not be loaded
    [02/25 13:21:42] ppdet.utils.checkpoint INFO: The shape [365] in pretrained weight yolo_head.pred_cls.1.bias is unmatched with the shape [1] in model yolo_head.pred_cls.1.bias. And the weight yolo_head.pred_cls.1.bias will not be loaded
    [02/25 13:21:42] ppdet.utils.checkpoint INFO: The shape [365, 288, 3, 3] in pretrained weight yolo_head.pred_cls.1.weight is unmatched with the shape [1, 288, 3, 3] in model yolo_head.pred_cls.1.weight. And the weight yolo_head.pred_cls.1.weight will not be loaded
    [02/25 13:21:42] ppdet.utils.checkpoint INFO: The shape [365] in pretrained weight yolo_head.pred_cls.2.bias is unmatched with the shape [1] in model yolo_head.pred_cls.2.bias. And the weight yolo_head.pred_cls.2.bias will not be loaded
    [02/25 13:21:42] ppdet.utils.checkpoint INFO: The shape [365, 144, 3, 3] in pretrained weight yolo_head.pred_cls.2.weight is unmatched with the shape [1, 144, 3, 3] in model yolo_head.pred_cls.2.weight. And the weight yolo_head.pred_cls.2.weight will not be loaded
    [02/25 13:21:42] ppdet.utils.checkpoint INFO: Finish loading model weights: /root/.cache/paddle/weights/ppyoloe_crn_m_obj365_pretrained.pdparams
    [02/25 13:21:42] ppdet.slim.distill_model INFO: Teacher model has loaded pretrain weights!
    I0225 13:21:42.930284 553 tcp_utils.cc:181] The server starts to listen on IP_ANY:41628
    I0225 13:21:42.931370 553 tcp_utils.cc:130] Successfully connected to 10.233.69.204:41628
    loading annotations into memory...
    Done (t=8.52s)
    creating index...
    index created!
    [02/25 13:21:54] ppdet.data.source.coco WARNING: Found an invalid bbox in annotations: im_id: 163, area: 0.0 x1: -1.0, y1: -1.0, x2: -1.0, y2: -1.0.
    [02/25 13:24:11] ppdet.data.source.coco WARNING: Found an invalid bbox in annotations: im_id: 30874, area: 0.0 x1: 235.00021500000003, y1: 1.9999399999999998, x2: 235.00021500000003, y2: 2.99991.
    [02/25 13:24:22] ppdet.data.source.coco WARNING: Found an invalid bbox in annotations: im_id: 33323, area: 0.0 x1: 93.999984, y1: 20.999888, x2: 93.999984, y2: 21.999952.
    [02/25 13:28:29] ppdet.data.source.coco INFO: Load [89283 samples valid, 0 samples invalid] in file /root/mnt1/jn/general_human_detect/datasets/trainval/ppyoloe_plus_20230213_renamed/train.json.
    Traceback (most recent call last):
      File "tools/train.py", line 202, in <module>
        main()
      File "tools/train.py", line 198, in main
        run(FLAGS, cfg)
      File "tools/train.py", line 151, in run
        trainer.train(FLAGS.eval)
      File "/home/paddleflow/storage/mnt/fs-jiangnan19-mnt1/jn/general_human_detect/paddledet/PaddleDetection-release-2.6-distill/ppdet/engine/trainer.py", line 539, in train
        outputs = model(data)
      File "/root/mnt1/jn/conda/envs/paddleslim/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 948, in __call__
        return self.forward(*inputs, **kwargs)
      File "/root/mnt1/jn/conda/envs/paddleslim/lib/python3.8/site-packages/paddle/fluid/dygraph/parallel.py", line 774, in forward
        outputs = self._layers(*inputs, **kwargs)
      File "/root/mnt1/jn/conda/envs/paddleslim/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 948, in __call__
        return self.forward(*inputs, **kwargs)
      File "/home/paddleflow/storage/mnt/fs-jiangnan19-mnt1/jn/general_human_detect/paddledet/PaddleDetection-release-2.6-distill/ppdet/slim/distill_model.py", line 343, in forward
        logits_loss, feat_loss = self.distill_loss(self.teacher_model,
      File "/root/mnt1/jn/conda/envs/paddleslim/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 948, in __call__
        return self.forward(*inputs, **kwargs)
      File "/home/paddleflow/storage/mnt/fs-jiangnan19-mnt1/jn/general_human_detect/paddledet/PaddleDetection-release-2.6-distill/ppdet/slim/distill_loss.py", line 418, in forward
        loss_module(stu_feats[i], tea_feats[i], inputs))
      File "/root/mnt1/jn/conda/envs/paddleslim/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 948, in __call__
        return self.forward(*inputs, **kwargs)
      File "/home/paddleflow/storage/mnt/fs-jiangnan19-mnt1/jn/general_human_detect/paddledet/PaddleDetection-release-2.6-distill/ppdet/slim/distill_loss.py", line 655, in forward
        stu_feature = self.align(stu_feature)
      File "/root/mnt1/jn/conda/envs/paddleslim/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 948, in __call__
        return self.forward(*inputs, **kwargs)
      File "/root/mnt1/jn/conda/envs/paddleslim/lib/python3.8/site-packages/paddle/nn/layer/conv.py", line 712, in forward
        out = F.conv._conv_nd(
      File "/root/mnt1/jn/conda/envs/paddleslim/lib/python3.8/site-packages/paddle/nn/functional/conv.py", line 140, in _conv_nd
        pre_bias = _C_ops.conv2d(
    ValueError: (InvalidArgument) The number of input's channels should be equal to filter's channels * groups for Op(Conv). But received: the input's channels is 144, the input's shape is [8, 144, 12, 20]; the filter's channels is 288, the filter's shape is [576, 288, 1, 1]; the groups is 1, the data_format is NCHW. The error may come from wrong data_format setting.
      [Hint: Expected input_channels == filter_dims[1] * groups, but received input_channels:144 != filter_dims[1] * groups:288.] (at /paddle/paddle/phi/infermeta/binary.cc:529)

    I0225 13:28:39.694931 744 tcp_store.cc:257] receive shutdown event and so quit from MasterDaemon run loop
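The conv error can be checked by hand. The sketch below copies the two shapes from the `ValueError` and applies the rule stated in its hint (`input_channels == filter_dims[1] * groups`); it does not call Paddle. The last step divides the received channel count by the student's `width_mult`: 144 / 0.375 = 384, whereas the config assumes a base of 768 at this level (768 × 0.375 = 288). One possible reading, not verified against the ppdet source, is that the T student's neck does not output `[768, 384, 192] × width_mult` channels, so `feat_out_channels` may not describe it; comparing against the neck settings in the student's base config would confirm or rule this out.

```python
# Hand check of the conv error (values copied from the traceback; no Paddle needed).
received_input = [8, 144, 12, 20]   # NCHW student feature passed to self.align
filter_shape = [576, 288, 1, 1]     # align conv weight: [out, in/groups, kH, kW]
GROUPS = 1

FEAT_OUT_CHANNELS = [768, 384, 192]  # from the distillation config
STUDENT_WIDTH = 0.375                # student_width_mult (T)

def check_conv_input(input_shape, weight_shape, groups=1):
    """Apply the hint's rule: input_channels == filter_dims[1] * groups."""
    got = input_shape[1]
    expected = weight_shape[1] * groups
    return got == expected, got, expected

ok, got, expected = check_conv_input(received_input, filter_shape, GROUPS)
print(ok, got, expected)  # False 144 288

# Base channel count that would produce the received value at this width_mult,
# next to the base value the config assumes for the stride-32 level.
implied_base = got / STUDENT_WIDTH
print(implied_base, FEAT_OUT_CHANNELS[0])  # 384.0 768
```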

Where did I go wrong in my configuration? Many thanks!!
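Separately from the channel error, the weight-loading part of the log shows the teacher being initialised from `ppyoloe_crn_m_obj365_pretrained.pdparams`, with the six `yolo_head.pred_cls.*` tensors (365 classes) reported as "will not be loaded" because the model has 1 class. A small script can list such messages from a long log. This is a sketch with a regex written against the message format seen in this log, not a ppdet utility; other ppdet versions may word the message differently.

```python
import re

# Matches ppdet's "shape unmatched" message as it appears in the log above.
SHAPE_MISMATCH = re.compile(
    r"The shape \[([\d, ]+)\] in pretrained weight (\S+) is unmatched "
    r"with the shape \[([\d, ]+)\] in model"
)

def parse_shape(text):
    """Turn '365, 576, 3, 3' into [365, 576, 3, 3]."""
    return [int(part) for part in text.split(",")]

def skipped_params(log_text):
    """Return (name, pretrained_shape, model_shape) for every skipped weight."""
    return [
        (name, parse_shape(pre), parse_shape(model))
        for pre, name, model in SHAPE_MISMATCH.findall(log_text)
    ]

# Two messages copied from the log above.
log_excerpt = (
    "The shape [365] in pretrained weight yolo_head.pred_cls.0.bias is unmatched "
    "with the shape [1] in model yolo_head.pred_cls.0.bias. And the weight "
    "yolo_head.pred_cls.0.bias will not be loaded\n"
    "The shape [365, 576, 3, 3] in pretrained weight yolo_head.pred_cls.0.weight "
    "is unmatched with the shape [1, 576, 3, 3] in model yolo_head.pred_cls.0.weight. "
    "And the weight yolo_head.pred_cls.0.weight will not be loaded\n"
)

for name, pre, model in skipped_params(log_excerpt):
    print(f"{name}: pretrained {pre} vs model {model}, not loaded")
```

Running it over the full log would list every tensor that was not taken from the checkpoint, which makes it easier to see which parts of the teacher start from their initial values.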

PaddlePaddle / PaddleDetection

Model distillation for ppyoloe-plus-t #7832
