enyac-group / AdaScale

code for AdaScale MLSys'19 https://proceedings.mlsys.org/book/275.pdf
Apache License 2.0

I have a question for training scalereg.. #1

Open tissuemother opened 4 years ago

tissuemother commented 4 years ago

Thanks for your great work. I have already run "testloss.py" successfully, and now I want to train your AdaScale scale regressor (scalereg). However, I get an error during training. It may be my own mistake, but I don't know how to fix it. :( Here are the command and its output:

$ python rfcn_scalereg_train.py --cfg cfgs/resnet_v1_101_scalereg.yaml

exp_config = edict(yaml.load(f))
('Called with argument:', Namespace(cfg='cfgs/resnet_v1_101_scalereg.yaml', frequent=100))
{'CLASS_AGNOSTIC': True, 'KAPPA': 0, 'MAX_REG_SCALE': 600, 'MIN_REG_SCALE': 128, 'MXNET_VERSION': 'mxnet', 'OPTIMAL': False, 'SCALES': [(600, 2000), (540, 2000), (480, 2000), (420, 2000), (360, 2000), (300, 2000), (240, 2000), (180, 2000), (128, 2000)], 'TEST': {'BATCH_IMAGES': 1, 'CXX_PROPOSAL': True, 'HAS_RPN': True, 'NMS': 0.3, 'RPN_MIN_SIZE': 0, 'RPN_NMS_THRESH': 0.7, 'RPN_POST_NMS_TOP_N': 300, 'RPN_PRE_NMS_TOP_N': 6000, 'SOFTNMS_THRESH': 0.6, 'USE_SOFTNMS': False, 'max_per_image': 300, 'reg_test_epoch': 0, 'test_epoch': 2}, 'TEST_SCALES': [(600, 1000)], 'TRAIN': {'ASPECT_GROUPING': True, 'BATCH_IMAGES': 1, 'BATCH_ROIS': -1, 'BATCH_ROIS_OHEM': 128, 'BBOX_MEANS': [0.0, 0.0, 0.0, 0.0], 'BBOX_NORMALIZATION_PRECOMPUTED': True, 'BBOX_REGRESSION_THRESH': 0.5, 'BBOX_STDS': [0.1, 0.1, 0.2, 0.2], 'BBOX_WEIGHTS': array([1., 1., 1., 1.]), 'BG_THRESH_HI': 0.5, 'BG_THRESH_LO': 0.0, 'CXX_PROPOSAL': True, 'ENABLE_OHEM': True, 'END2END': True, 'FG_FRACTION': 0.25, 'FG_THRESH': 0.5, 'FLIP': True, 'RESUME': False, 'RPN_BATCH_SIZE': 256, 'RPN_BBOX_WEIGHTS': [1.0, 1.0, 1.0, 1.0], 'RPN_CLOBBER_POSITIVES': False, 'RPN_FG_FRACTION': 0.5, 'RPN_MIN_SIZE': 0, 'RPN_NEGATIVE_OVERLAP': 0.3, 'RPN_NMS_THRESH': 0.7, 'RPN_POSITIVE_OVERLAP': 0.7, 'RPN_POSITIVE_WEIGHT': -1.0, 'RPN_POST_NMS_TOP_N': 300, 'RPN_PRE_NMS_TOP_N': 6000, 'SHUFFLE': True, 'begin_epoch': 0, 'end_epoch': 3, 'lr': 0.0001, 'lr_factor': 0.1, 'lr_step': '1,2', 'model_prefix': 'rfcn_vid_scalereg', 'momentum': 0.9, 'regressor_model_prefix': '', 'rfcn_model_prefix': '', 'warmup': False, 'warmup_lr': 0, 'warmup_step': 0, 'wd': 0.0005}, 'dataset': {'NUM_CLASSES': 31, 'dataset': 'ImageNetVID', 'dataset_path': 'data/imagenet/ILSVRC', 'image_set': 'DET_train_30classes+VID_train_15frames', 'proposal': 'rpn', 'root_path': 'data/imagenet', 'test_image_set': 'VID_val_frames'}, 'default': {'frequent': 100, 'kvstore': 'device'}, 'gpus': '0,1', 'network': {'ANCHOR_MEANS': [0.0, 0.0, 0.0, 0.0], 'ANCHOR_RATIOS': [0.5, 1, 2], 'ANCHOR_SCALES': [8, 16, 32], 'ANCHOR_STDS': [0.1, 0.1, 0.4, 0.4], 'FIXED_PARAMS': ['conv1', 'bn_conv1', 'res2', 'bn2', 'gamma', 'beta'], 'IMAGE_STRIDE': 0, 'NORMALIZE_RPN': True, 'NUM_ANCHORS': 9, 'PIXEL_MEANS': array([103.06, 115.9 , 123.15]), 'RCNN_FEAT_STRIDE': 16, 'RPN_FEAT_STRIDE': 16, 'UPDATE_PARAMS': ['grow', 'shrink', 'scale'], 'pretrained': './model/pretrained_model/rfcn_vid', 'pretrained_epoch': 4}, 'output_path': 'data/adascale_output/rfcn/imagenet_vid', 'symbol': 'resnet_v1_101_rfcn'}
num_images 53639
ImageNetVID_DET_train_30classes gt roidb loaded from data/imagenet/cache/ImageNetVID_DET_train_30classes_gt_roidb.pkl
num_images 57834
ImageNetVID_VID_train_15frames gt roidb loaded from data/imagenet/cache/ImageNetVID_VID_train_15frames_gt_roidb.pkl
filtered 1658 roidb entries: 111473 -> 109815
Traceback (most recent call last):
  File "rfcn_scalereg_train.py", line 18, in <module>
    train_scalereg.main()
  File "../../rfcn/train_scalereg.py", line 159, in main
    config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
  File "../../rfcn/train_scalereg.py", line 84, in train_scale_reg
    train_data = TrainScaleRegLoader(roidb, optimal_output_path, config, batch_size=input_batch_size, shuffle=config.TRAIN.SHUFFLE, ctx=ctx, has_rpn=config.TEST.HAS_RPN)
  File "../../rfcn/core/loader.py", line 621, in __init__
    assert os.path.exists(name), '{} does not exist'.format(name)
AssertionError: data/adascale_output/rfcn/imagenet_vid/resnet_v1_101_scalereg/DET_train_30classes_VID_train_15frames/resnet_v1_101_rfcn_optimal.pkl does not exist

I cannot find any "resnet_v1_101_rfcn_optimal.pkl" file. Could you tell me how to train the scale regressor in your work?

RudyChin commented 3 years ago

Hi,

First of all, thank you for your interest in our work, and I'm really sorry for the late reply! I didn't see this until now.

To generate the optimal resolution file, you need to run forward passes at all the resolutions with the following command:

python experiments/rfcn/rfcn_testloss.py --cfg experiments/rfcn/cfgs/rfcn_vid_demo.yaml

Before running it, set TEST_SCALES in https://github.com/cmu-enyac/AdaScale/blob/master/experiments/rfcn/cfgs/rfcn_vid_demo.yaml to the scales you are interested in, for example [[600,2000], [480,2000], [360,2000], [240,2000], [128,2000]]. If you follow the code, the command above goes to https://github.com/cmu-enyac/AdaScale/blob/master/rfcn/function/test_rcnn.py#L25, and the prediction results are stored by https://github.com/cmu-enyac/AdaScale/blob/master/rfcn/function/test_rcnn.py#L76.
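As a small illustration of the TEST_SCALES value mentioned above, here is a sketch that builds the list and prints it in a form that can be pasted into the YAML config. The helper name `make_test_scales` is hypothetical and not part of the AdaScale repo, and reading each pair as (target short side, cap on the long side) is an assumption based on how the pairs look in the config dump earlier in this thread.

```python
import json


def make_test_scales(short_sides, max_long_side=2000):
    """Pair each short-side target with one shared cap on the long side.

    Hypothetical helper for illustration; not part of the AdaScale code.
    """
    return [[int(s), int(max_long_side)] for s in short_sides]


# The five scales used as the example in the reply above.
scales = make_test_scales([600, 480, 360, 240, 128])

# A JSON list is also valid YAML flow syntax, so the printed line can be
# copied into the config file as-is.
print("TEST_SCALES: " + json.dumps(scales))
```

If the regressor is meant to be trained on the nine SCALES listed in resnet_v1_101_scalereg.yaml, it is probably worth passing those nine short sides here instead, though that is a guess rather than something confirmed in this thread.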

Once you've gathered the prediction results for all the scales, you can use https://github.com/cmu-enyac/AdaScale/blob/master/analyze_loss.py to generate the optimal.pkl file that the training script expects.
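To give a rough idea of what "generating the optimal file" means, here is a minimal sketch that picks, for each image, the scale with the smallest recorded loss and pickles the result. This is not the code in analyze_loss.py: the input layout (a dict mapping each short-side scale to a list of per-image losses), the function name `pick_optimal_scales`, and the output format are all assumptions made for illustration, so use the real script to produce the file that the loader reads.

```python
import os
import pickle
import tempfile


def pick_optimal_scales(loss_by_scale):
    """Return, for every image index, the scale with the lowest loss.

    loss_by_scale: dict {scale: [loss for image 0, loss for image 1, ...]}
    (hypothetical layout). Ties go to the smaller scale, because the scales
    are visited in ascending order and min() keeps the first minimum.
    """
    scales = sorted(loss_by_scale)
    num_images = len(loss_by_scale[scales[0]])
    # Every scale must have been evaluated on the same set of images.
    assert all(len(loss_by_scale[s]) == num_images for s in scales)
    return [
        min(scales, key=lambda s: loss_by_scale[s][i])
        for i in range(num_images)
    ]


# Made-up losses for three images at three scales.
losses = {
    600: [0.2, 0.9, 0.5],
    360: [0.4, 0.3, 0.5],
    128: [0.8, 0.6, 0.7],
}
optimal = pick_optimal_scales(losses)
print(optimal)

# Write the result to a temporary file and read it back.
out_path = os.path.join(tempfile.mkdtemp(), "example_optimal.pkl")
with open(out_path, "wb") as f:
    pickle.dump(optimal, f)
with open(out_path, "rb") as f:
    restored = pickle.load(f)
```

The point of the sketch is only the per-image argmin over scales; which loss is compared and how the result is stored are decided by the repo's own scripts.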

Hope this helps!

Rudy