dbolya / yolact

A simple, fully convolutional model for real-time instance segmentation.
MIT License

Transfer learning with larger images #356

Open AndreiBaraian opened 4 years ago

AndreiBaraian commented 4 years ago

I would like to do transfer learning on the pretrained yolact models, but with high-resolution images, like 1024x1024 or 2048x2048. I cannot train from scratch, since I have very little data, around 100 images. If I try to use one of the pretrained yolact models, I get this kind of error: [error screenshot attached]

First of all, is it possible to do transfer learning on a model that was trained with a different image resolution? (At least, that's how I did it with Mask RCNN.) And if it is possible, what would I need to adjust in order not to receive the above error?

abhigoku10 commented 4 years ago

@AndreiBaraian You can do transfer learning, but even if you give it high-resolution images, they will be resized to 550x550 / 700x700. I have faced an error where memory gets exhausted; I'm not sure about the error you're getting.

AndreiBaraian commented 4 years ago

@abhigoku10 does it get resized even if I change the max_size parameter? Because I've set that one to the desired (high) resolution, so I suppose it does not rescale it. But I might be wrong.

dbolya commented 4 years ago

Oh, definitely don't change max_size to 2048; that'll use way too much RAM (not to mention it'll be super slow). The error you got likely means you ran out of CPU RAM since the images were too big for your batch or something (idk, it's a C heap allocation error).

I'm currently working on tuning YOLACT to work better on bigger images, but for now I suggest you leave max_size as 550 (or use the im700 config), and then fine tune only the last layers: #334.
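
For reference, here is a minimal PyTorch-style sketch of one way to fine-tune only the last layers by freezing everything else. The module names (backbone, fpn, proto_net, prediction_layers) are taken from yolact.py but should be double-checked, and #334 describes the recommended procedure; this is only an illustration under those assumptions, not the exact method from that issue.

import torch
from yolact import Yolact

net = Yolact()
net.load_weights('weights/yolact_im700_54_800000.pth')

# Freeze every parameter first...
for p in net.parameters():
    p.requires_grad = False

# ...then unfreeze only the last layers (prediction heads and, optionally, the proto net).
for module in (net.prediction_layers, net.proto_net):
    for p in module.parameters():
        p.requires_grad = True

# Give the optimizer only the trainable parameters.
optimizer = torch.optim.SGD((p for p in net.parameters() if p.requires_grad),
                            lr=1e-4, momentum=0.9, weight_decay=5e-4)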

sdimantsd commented 4 years ago

@dbolya I'm curious, can you tell me what change you are working on? :-) In addition, are you working on a network with a non-square input?

abhigoku10 commented 4 years ago

@sdimantsd He had mentioned that he is working on code which takes non-square images as input for training; besides that, I'm not sure what other changes he is making.

dbolya commented 4 years ago

@sdimantsd, @abhigoku10 More efficient data augmentation to use less RAM. It's still using more RAM and CPU time than I'd like (on Cityscapes' 2k x 1k images), but I'm close to being able to release it.

PareshKamble commented 4 years ago

Hi @dbolya @abhigoku10, I prepared a dataset of 2000 training images and 850 validation images (1024x1024 each) containing player and ball classes. In config.py I added

player_ball_dataset = dataset_base.copy({
  'name': 'player_ball_dataset',

  'train_info': '/home/paresh/Documents/output/train/coco_instances.json',
  'train_images': '/home/paresh/Documents/output/train/images/',

  'valid_info': '/home/paresh/Documents/output/val/coco_instances.json',
  'valid_images': '/home/paresh/Documents/output/val/images/',

  'class_names': ('ball', 'player'),
  'label_map': { 1:  1,  2:  2}
})

and

yolact_resnet101_player_ball_config = yolact_im700_config.copy({
    'name': 'yolact_resnet101_player_ball_config',
    # Dataset stuff
    'dataset': player_ball_dataset,
    'num_classes': len(player_ball_dataset.class_names) + 1,

    # Image Size
    'max_size': 700,
})

In yolact.py, I replaced self.load_state_dict(state_dict) with

try:
    self.load_state_dict(state_dict)
except RuntimeError as e:
    print('Ignoring "' + str(e) + '"')

and also replaced p = pred_layer(pred_x) with p = pred_layer(pred_x.detach())

I fine tuned with this command: python train.py --config=yolact_resnet101_player_ball_config --resume=weights/yolact_im700_54_800000.pth --start_iter=-1 --batch_size=5

I left the process running for 12 hours (110,000 iterations) and then interrupted it with Ctrl+C.

Later, I tested the fine-tuned model with: python eval.py --trained_model=weights/yolact_resnet101_player_ball_config_2274_910000.pth --config=yolact_resnet101_player_ball_config --score_threshold=0.15 --top_k=15 --video_multiframe=1 --video=inp_vid.mp4:out_vid.mp4

However:

1) mAP (all) is not improving above 17 for box and 22 for mask.
2) The segmentation results are not as smooth as the ones I get with the original yolact_im700_54_800000.pth, even though they were expected to improve with the customisation. It looks as if the model was trained on 550px images, giving rough edges.
3) The players are identified as 'ball' and the ball is not detected at all.

I seem to have followed all the steps and solutions from previous issues. Still, I am not able to get good results. Is there something I am missing?

Any help would be highly appreciated!!!

PareshKamble commented 4 years ago

@dbolya @abhigoku10 Any suggestion / solution for the above problem?

abhigoku10 commented 4 years ago

@PareshKamble I just wanted a few more details:

  1. What is your training set size?
  2. What is the nature of the loss from the first epoch onwards?

PareshKamble commented 4 years ago

Hi @abhigoku10 1) I have 2000 training images and 850 validation images. 2) Loss reduces slightly and then more or less remains constant.

I realised a mistake in my json files: the train and val json files had the class ids exchanged. I have corrected them now and am fine-tuning again.

PareshKamble commented 4 years ago

Hi @dbolya @abhigoku10 Even after rectifying the class id issue, I am not able to get good segmentation of the players and the ball. The problem still persists: the mAP is not improving beyond 17 for box and 22 for mask. Kindly suggest a solution for this issue. Thanking you in anticipation!

abhigoku10 commented 4 years ago

@PareshKamble Can you please recheck your annotations? Have you annotated the objects cleanly? It is surprising, since yolact gives good output even with only ~500 frames. Can you share some results of the false or improper detections?
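
As a quick, hedged way to sanity-check the annotations (this uses pycocotools directly, outside of this repo; the paths are the ones from the config above and may need adjusting):

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from pycocotools.coco import COCO

ann_file = '/home/paresh/Documents/output/train/coco_instances.json'
img_dir  = '/home/paresh/Documents/output/train/images/'

coco = COCO(ann_file)
print('categories:', coco.loadCats(coco.getCatIds()))  # verify id <-> name mapping

# Overlay the polygons of the first image to eyeball the mask quality.
img_id   = coco.getImgIds()[0]
img_info = coco.loadImgs(img_id)[0]
anns     = coco.loadAnns(coco.getAnnIds(imgIds=img_id))

plt.imshow(mpimg.imread(img_dir + img_info['file_name']))
coco.showAnns(anns)
plt.show()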

PareshKamble commented 4 years ago

@abhigoku10 Here are a few results from the recently fine-tuned Yolact. I masked the background for better visibility, so only the segmented regions are shown. (Four screenshots from 2020-06-18 attached.)

PareshKamble commented 4 years ago

For the above model, I got the following mAP values before interrupting at 20,000 iterations:

Calculating mAP...

       |  all  |  .50  |  .55  |  .60  |  .65  |  .70  |  .75  |  .80  |  .85  |  .90  |  .95  |
-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
   box | 17.91 | 22.94 | 22.88 | 22.85 | 22.85 | 22.42 | 22.37 | 21.02 | 15.67 |  5.91 |  0.14 |
  mask | 20.62 | 23.23 | 23.23 | 22.94 | 22.94 | 22.94 | 22.91 | 22.82 | 22.39 | 18.99 |  3.77 |
-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+

Now I have increased the dataset to 4000 training and 1714 validation images and set lr=1e-5, and I am getting mAP values around those shown below from the beginning up to the current 18,000 iterations:

       |  all  |  .50  |  .55  |  .60  |  .65  |  .70  |  .75  |  .80  |  .85  |  .90  |  .95  |
-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
   box | 42.85 | 48.87 | 48.81 | 48.81 | 48.81 | 48.81 | 48.44 | 48.36 | 47.29 | 36.58 |  3.69 |
  mask | 44.10 | 48.87 | 48.87 | 48.81 | 48.81 | 48.81 | 48.81 | 48.43 | 47.87 | 43.54 |  8.22 |
-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+

abhigoku10 commented 4 years ago

@PareshKamble The learning rate is too low, so you need to allow for more iterations; let the model learn further.
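
If you do want a higher learning rate or a longer schedule, those can go into the same config copy. A hedged sketch below, assuming the standard YOLACT config keys 'lr', 'max_iter' and 'lr_steps' from data/config.py (the values are only illustrative):

yolact_resnet101_player_ball_config = yolact_im700_config.copy({
    'name': 'yolact_resnet101_player_ball_config',
    'dataset': player_ball_dataset,
    'num_classes': len(player_ball_dataset.class_names) + 1,
    'max_size': 700,

    # Illustrative values only; tune for your dataset.
    'lr': 1e-4,                              # higher than 1e-5, lower than the default 1e-3
    'max_iter': 400000,                      # allow more iterations
    'lr_steps': (200000, 300000, 360000),    # decay points for the lr schedule
})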

PareshKamble commented 4 years ago

@abhigoku10 do you mean I need to keep the lr as it is and train longer?

PareshKamble commented 4 years ago

Hi @dbolya @abhigoku10 I generated a bigger dataset of 4000 training images and 1714 validation images (1024x1024 each) containing player and ball classes.

The training json file contains class information like

"categories": [{"supercategory": "person", "id": 1, "name": "player"}, {"supercategory": "sports_ball", "id": 2, "name": "ball"}]}

whereas the validation json file contains class information like

"categories": [{"supercategory": "sports_ball", "id": 1, "name": "ball"}, {"supercategory": "person", "id": 2, "name": "player"}]}

Note: the class ids for training and validation are exchanged.
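
Since label_map maps the json category ids onto the network's class indices, the two files need to agree on those ids for the mapping in config.py to be right for both. A minimal check, using only the standard library (paths taken from the config below):

import json

def categories(path):
    with open(path) as f:
        return {c['id']: c['name'] for c in json.load(f)['categories']}

train_cats = categories('/home/paresh/Documents/output/train/coco_instances.json')
val_cats   = categories('/home/paresh/Documents/output/val/coco_instances.json')

print('train:', train_cats)   # e.g. {1: 'player', 2: 'ball'}
print('val:  ', val_cats)     # e.g. {1: 'ball', 2: 'player'}

# label_map should send each json category id to its (1-based) index in
# class_names, and that mapping has to hold for both files.
assert train_cats == val_cats, 'train/val category ids or names differ!'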

In config.py I added

player_ball_dataset = dataset_base.copy({
  'name': 'player_ball_dataset',

  'train_info': '/home/paresh/Documents/output/train/coco_instances.json',
  'train_images': '/home/paresh/Documents/output/train/images/',

  'valid_info': '/home/paresh/Documents/output/val/coco_instances.json',
  'valid_images': '/home/paresh/Documents/output/val/images/',

  'class_names': ('ball', 'player'),
  'label_map': { 1:  1,  2:  2}
})

and

yolact_resnet101_player_ball_config = yolact_im700_config.copy({
    'name': 'yolact_resnet101_player_ball_config',
    # Dataset stuff
    'dataset': player_ball_dataset,
    'num_classes': len(player_ball_dataset.class_names) + 1,

    # Image Size
    'max_size': 700,
})

In yolact.py, I replaced self.load_state_dict(state_dict) with

try:
    self.load_state_dict(state_dict)
except RuntimeError as e:
    print('Ignoring "' + str(e) + '"')

and also replaced p = pred_layer(pred_x) with p = pred_layer(pred_x.detach())

I fine tuned with this command: python train.py --config=yolact_resnet101_player_ball_config --resume=weights/yolact_im700_54_800000.pth --start_iter=0 --batch_size=5

I left the process running for ~24 hours (180,000 iterations) and then interrupted it with Ctrl+C.

Later, I tested the fine-tuned model with: python eval.py --trained_model=weights/yolact_resnet101_player_ball_224_180000.pth --config=yolact_resnet101_player_ball_config --score_threshold=0.15 --top_k=20 --video_multiframe=1 --video=inp_vid.mp4:out_vid.mp4

However:

1) mAP (all) is not improving above 18 for box and 22 for mask.
2) The segmentation results are still not as smooth as the ones I get with the original yolact_im700_54_800000.pth, even though they were expected to improve with the customisation. It looks as if the model was trained on 550px images, giving wave-like edges when tested on 1080p video frames.
3) The ball class is not detected at all.
4) Could the exchange of class ids between the training and validation json files have created an issue? However, I previously trained yolact_im700 with the same class ids and it did not improve the results.
5) I seem to have followed all the steps and solutions from previous issues.

Still, I am not able to get good results. Is there something I am missing?

Any help would be highly appreciated!!!

PareshKamble commented 4 years ago

@dbolya @abhigoku10 I generated a new dataset with player and ball classes (400 training and 66 validation images) annotated using CVAT polygons. However, now only the ball is being detected / segmented and not the players. Earlier, I was getting segmented masks of only the players. Can you please suggest where the issue might be? Thanking you in anticipation!

abhigoku10 commented 4 years ago

@PareshKamble From the previous comments, I feel you need to look into your annotation files for both the training and validation sets; it is very contradictory, since yolact/yolact++ are good with any kind of masks except rotated ones. Even if you provide 1080p images, the network resizes the data for training. The mAP not going above 18 is okay, but classes not being detected at all suggests there is an issue. Can you check by training with only one class first, and then with two classes?
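
For the one-class test, a hedged sketch of what the dataset/config copies could look like, following the pattern of the configs above. The *_player.json files are hypothetical annotation files filtered down to only the 'player' category (the loader may not skip ids missing from label_map, so filtering the json first is safer):

player_only_dataset = dataset_base.copy({
    'name': 'player_only_dataset',

    # Hypothetical annotation files containing only the 'player' category.
    'train_info': '/home/paresh/Documents/output/train/coco_instances_player.json',
    'train_images': '/home/paresh/Documents/output/train/images/',
    'valid_info': '/home/paresh/Documents/output/val/coco_instances_player.json',
    'valid_images': '/home/paresh/Documents/output/val/images/',

    'class_names': ('player',),
    'label_map': {1: 1},   # json category id for 'player' -> class index 1 (verify against the json)
})

yolact_resnet101_player_only_config = yolact_im700_config.copy({
    'name': 'yolact_resnet101_player_only_config',
    'dataset': player_only_dataset,
    'num_classes': len(player_only_dataset.class_names) + 1,
    'max_size': 700,
})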

PareshKamble commented 4 years ago

@abhigoku10 Initially, I too suspected the annotation files were the issue. Previously, I used a synthetic dataset as suggested here, and this time I generated the annotations using CVAT. If the annotation files from the first (synthetic) method had been the problem, the same issue would not have appeared with the ones created with CVAT. I shall try your suggestion of training one class first and then two. However, I suspect I missed something in the config file or somewhere else.

NVukobrat commented 3 years ago

Just bumping this one. Did you @PareshKamble, or anyone else on this thread, manage to determine what was causing the wave-edge effect on your samples?