MR-hyj opened this issue 1 year ago
I prepared a custom COCO dataset and successfully trained a teacher model and a student model. Here is their performance:
```
# teacher
# saved to work_dirs/atss_r50_1x_TaiZhou/epoch_last.pth
bbox_mAP: 0.4180, bbox_mAP_50: 0.6940, bbox_mAP_75: 0.4230, bbox_mAP_s: 0.1610, bbox_mAP_m: 0.4310, bbox_mAP_l: 0.6460

# student
# saved to work_dirs/atss_r101_3x_ms_TaiZhou/epoch_last.pth
bbox_mAP: 0.3590, bbox_mAP_50: 0.6540, bbox_mAP_75: 0.3360, bbox_mAP_s: 0.1450, bbox_mAP_m: 0.4020, bbox_mAP_l: 0.5570
```
However, I then tried to distill with the above teacher and student:
```bash
CUDA_VISIBLE_DEVICES=4,5,6,7 bash tools/dist_train.sh work_configs/pgd_atss_r101_r50_1x_TaiZhou.py 4 --work-dir work_dirs/dist_pgd_atss_r101_r50_1x_TaiZhou
```
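To make sure the distillation config resolves the paths I think it does, a quick check like the following can print the key fields (a minimal sketch, assuming the mmcv 1.x `Config` API that this repo builds on):

```python
# Quick sanity check of the distillation config before launching training.
from mmcv import Config

cfg = Config.fromfile('work_configs/pgd_atss_r101_r50_1x_TaiZhou.py')

# Paths the distiller will actually use (expected values from the config below).
print(cfg.distiller.teacher_pretrained)  # work_dirs/atss_r101_3x_ms_TaiZhou/epoch_12.pth
print(cfg.student_cfg)                   # work_configs/detectors/atss_r50_distill_head_TaiZhou.py
print(cfg.teacher_cfg)                   # work_configs/detectors/atss_r101_3x_ms_TaiZhou.py
print(cfg.data.samples_per_gpu, cfg.data.train.ann_file)
```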
The student wasn't improving at all; its performance stayed at zero and did not increase as the epochs went on.
My best guess is that the config file is wrong:

```python
# work_configs/pgd_atss_r101_r50_1x_TaiZhou.py
_base_ = "base/1x_setting.py"

temperature = 0.8
alpha = 0.08
delta = 0.0008
beta = alpha * 0.5
gamma = alpha * 1.6

fp16 = dict(loss_scale=512.)

dataset_type = 'MyCocoDataset'
data_root = 'data/MyCoco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True
)
img_resize = (640, 640)
classes = (...)  # 8 classes

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Resize',
        # img_scale=(1333, 800),
        img_scale=img_resize,
        keep_ratio=True
    ),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        # img_scale=(1333, 800),
        img_scale=img_resize,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_train.json',
        img_prefix=data_root + 'train/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_val.json',
        img_prefix=data_root + 'val/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_val.json',
        img_prefix=data_root + 'val/',
        pipeline=test_pipeline)
)

distiller = dict(
    type='PredictionGuidedDistiller',
    teacher_pretrained='work_dirs/atss_r101_3x_ms_TaiZhou/epoch_12.pth',
    init_student=True,
    distill_cfg=[
        # this part was not edited
    ]
)

# I'm sure student_cfg and teacher_cfg use the aforementioned .pth weights
student_cfg = 'work_configs/detectors/atss_r50_distill_head_TaiZhou.py'
teacher_cfg = 'work_configs/detectors/atss_r101_3x_ms_TaiZhou.py'
```
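To rule out a silently failed weight load, I can also inspect the teacher checkpoint directly. A minimal sketch, assuming the usual MMDetection checkpoint layout with the weights nested under a `state_dict` entry:

```python
# Inspect the teacher checkpoint used by the distiller to confirm it actually
# contains detector weights; a student stuck at zero mAP can happen when the
# teacher weights never load.
import torch

ckpt = torch.load('work_dirs/atss_r101_3x_ms_TaiZhou/epoch_12.pth', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)  # MMDetection usually nests weights here

print('num tensors:', len(state_dict))
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))
```

If the key names don't look like `backbone.*` / `neck.*` / `bbox_head.*`, that would point at a loading problem rather than at the distillation losses themselves.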