facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

How to train the PointRend on a local dataset? #1017

Closed yijingru closed 4 years ago

yijingru commented 4 years ago

How to train the PointRend on a local dataset?

ppwwyyxx commented 4 years ago

https://detectron2.readthedocs.io/tutorials/datasets.html

Cimino023 commented 4 years ago

Everything stated on https://detectron2.readthedocs.io/tutorials/datasets.html is quite clear about how to register a dataset. The colab tutorial for training on a custom dataset is straightforward too. I have no problem training other detectron2 models on custom datasets, but PointRend does not want to train! I have checked the colab tutorial for PointRend too, yet there must be something I did not get, since no matter how hard I try, my colab attempt at training PointRend fails every time. I believe it has to do with the PointRend cfg settings. Would you be so kind as to provide us an example of those settings? Here's mine:

cfg = get_cfg()
point_rend.add_pointrend_config(cfg)
cfg.merge_from_file("detectron2_repo/projects/PointRend/configs/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco.yaml")
cfg.DATASETS.TRAIN = ("dataset_train",)

cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS =  "detectron2://PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_3c3198.pkl" 
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.00025

cfg.SOLVER.MAX_ITER = 300

cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2 #nr classes + 1
cfg.MODEL.POINT_HEAD.NUM_CLASSES = 2

Cimino023 commented 4 years ago

Update: while reading my own comment I saw this:

cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2 #nr classes + 1
cfg.MODEL.POINT_HEAD.NUM_CLASSES = 2

That is obviously the mistake. I was so focused on finding the problem elsewhere that I never noticed it.

I finally trained pointrend on a custom dataset :')
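
The comment above does not say what the mistake was. One plausible reading, and only a guess, is the `+ 1` in the `#nr classes + 1` comment: the Detectron2 documentation describes `MODEL.ROI_HEADS.NUM_CLASSES` as the number of foreground classes, with no extra slot for background. A minimal sketch of deriving both values from the class list follows. The helper name `set_num_classes` and the `SimpleNamespace` stand-in for the real `cfg` are assumptions made for illustration, so the snippet runs without Detectron2 installed.

```python
from types import SimpleNamespace


def set_num_classes(cfg, thing_classes):
    """Set both class-count fields from the class names, without adding 1."""
    num_classes = len(thing_classes)
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = num_classes
    cfg.MODEL.POINT_HEAD.NUM_CLASSES = num_classes
    return cfg


# Stand-in for a real config; the COCO default of 80 classes is used as the start value.
cfg = SimpleNamespace(
    MODEL=SimpleNamespace(
        ROI_HEADS=SimpleNamespace(NUM_CLASSES=80),
        POINT_HEAD=SimpleNamespace(NUM_CLASSES=80),
    )
)

# A dataset with a single (hypothetical) class gives NUM_CLASSES = 1, not 2.
set_num_classes(cfg, ["my_object"])
```

With a real config, the same helper would be called on the object returned by `get_cfg()` after `point_rend.add_pointrend_config(cfg)` and `cfg.merge_from_file(...)`, and `thing_classes` would come from the dataset metadata.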

Jain-Archit commented 3 years ago

Hi, can you please point out the mistake? While training the model with the PointRend cfg and weights, I am getting an error: RuntimeError: grid_sampler(): expected input and grid to have same dtype, but input has c10::Half and grid has float

The same cfg and dataset trained successfully using the COCO-Segmentation Mask R-CNN weights.

ppwwyyxx commented 3 years ago

If you're using AMP, PointRend was not tested against float16 and likely will need some extra work to support float16.
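
If AMP was switched on through the config, turning it back off is probably the simplest way to keep PointRend in float32. The sketch below assumes the `SOLVER.AMP.ENABLED` key that Detectron2's default trainer reads to decide on mixed-precision training. The helper name `disable_amp` and the `SimpleNamespace` stand-in are illustrative only. If AMP is enabled some other way (for example a hand-written `torch.cuda.amp.autocast` block), this flag will not help.

```python
from types import SimpleNamespace


def disable_amp(cfg):
    """Turn off automatic mixed precision so training stays in float32."""
    cfg.SOLVER.AMP.ENABLED = False
    return cfg


# Stand-in for a real Detectron2 config that had AMP switched on.
cfg = SimpleNamespace(SOLVER=SimpleNamespace(AMP=SimpleNamespace(ENABLED=True)))
disable_amp(cfg)
```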

Jain-Archit commented 3 years ago

Yup, that seems to be the issue. Thanks for the clarification.

Jain-Archit commented 3 years ago

Hi, I wanted to ask whether PointRend is currently supported only with the default trainer. I wrote a custom trainer for custom transformation of input data, which trains perfectly fine in detectron2. However, when I try to train PointRend with the custom trainer (extending DefaultTrainer), it throws an error about the input polygon mask (even though the mask format was set to bitmask in the cfg file). The PointRend model trains perfectly fine with the same code when using the default trainer. Does PointRend support custom trainers? If yes, what am I doing wrong?

Code for Custom Trainer

class customTrainer(DefaultTrainer):
    @classmethod
    def build_test_loader(cls, cfg, dataset_name):
        return build_detection_test_loader(cfg, dataset_name, mapper=DatasetMapper(cfg, False))

    @classmethod
    def build_train_loader(cls, cfg):
        return build_detection_train_loader(cfg, mapper=custom_datasetMapper)

Code for Cfg File

cfg = get_cfg()
point_rend.add_pointrend_config(cfg)

cfg.INPUT.MAX_SIZE_TRAIN = 1440
cfg.INPUT.MIN_SIZE_TRAIN = (800,)
cfg.INPUT.MASK_FORMAT = 'bitmask'
cfg.merge_from_file("detectron2_repo/projects/PointRend/configs/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco.yaml")
cfg.DATASETS.TRAIN = ("cpd_final",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = "detectron2://PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_edd263.pkl"

cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(cpd_metadata.thing_classes)
cfg.MODEL.POINT_HEAD.NUM_CLASSES = len(cpd_metadata.thing_classes)
cfg.OUTPUT_DIR = './cpd'

cfg.SOLVER.MAX_ITER = 4000
cfg.SOLVER.STEPS = (2000,)
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.0025
cfg.SOLVER.WARMUP_ITERS = 500

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

Code for Custom Training which throws error

trainer = customTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()

Code for Training which works fine

trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()

Error which is thrown when using custom trainer

ERROR [07/23 09:18:31 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 497, in run_step
    self._trainer.run_step()
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/detectron2/modeling/meta_arch/rcnn.py", line 163, in forward
    _, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/detectron2/modeling/roi_heads/roi_heads.py", line 735, in forward
    losses.update(self._forward_mask(features, proposals))
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/detectron2/modeling/roi_heads/roi_heads.py", line 838, in _forward_mask
    return self.mask_head(features, instances)
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/detectron2/projects/point_rend/mask_head.py", line 233, in forward
    point_coords, point_labels = self._sample_train_points(coarse_mask, instances)
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/detectron2/projects/point_rend/mask_head.py", line 285, in _sample_train_points
    point_labels = sample_point_labels(instances, point_coords_wrt_image)
  File "/opt/tljh/user/envs/pytorch/lib/python3.8/site-packages/detectron2/projects/point_rend/point_features.py", line 251, in sample_point_labels
    assert isinstance(
AssertionError: Point head works with GT in 'bitmask' format. Set INPUT.MASK_FORMAT to 'bitmask'.

yasindagasan commented 3 years ago

have you solved it? having the same issue with custom trainer

MCSitar commented 3 years ago

@Jain-Archit @yasindagasan The issue that leads to this error is not with custom trainers, but with custom mappers. DefaultTrainer(cfg) uses the default mapper DatasetMapper(cfg, is_train=True), and DatasetMapper in turn (among other things) calls utils.annotations_to_instances(annos, image_shape, mask_format=cfg.INPUT.MASK_FORMAT) to convert segmentation masks into either BitMasks or PolygonMasks instances (see detectron2/structures), depending on INPUT.MASK_FORMAT. PointRend requires BitMasks. If a custom trainer bypasses DatasetMapper and the custom mapper does not do this conversion itself, masks are passed as polygons, the AssertionError is thrown, and changing INPUT.MASK_FORMAT has no effect.

In my case, my custom mapper did not do anything important besides adding augmentations, so I was able to replace it with:

DatasetMapper(cfg, is_train=True, augmentations = custom_transform_list)

and was able to train a PointRend model using a custom trainer. If your custom mapper does do something important, you may be able to rewrite it so that it reformats your training data using utils.annotations_to_instances. Hope this is helpful!
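
For the case where the custom mapper does have to stay, a rough sketch of the rewrite suggested above is given here. It follows the pattern from the Detectron2 data loading tutorial and passes `mask_format` through to `utils.annotations_to_instances`. The names `make_bitmask_mapper` and `gt_mask_type` are hypothetical, and the sketch assumes a Detectron2 version that provides `T.AugInput` and `T.AugmentationList`. It has not been tested against the setups in this thread.

```python
import copy


def make_bitmask_mapper(cfg, augmentations):
    """Return a mapper (dict -> dict) that honours cfg.INPUT.MASK_FORMAT."""
    # Imports live inside the factory so that defining it does not
    # require Detectron2 to be importable.
    import numpy as np
    import torch
    import detectron2.data.transforms as T
    from detectron2.data import detection_utils as utils

    def mapper(dataset_dict):
        dataset_dict = copy.deepcopy(dataset_dict)  # do not modify the registered dict
        image = utils.read_image(dataset_dict["file_name"], format=cfg.INPUT.FORMAT)

        # Apply the augmentations and keep the transforms for the annotations.
        aug_input = T.AugInput(image)
        transforms = T.AugmentationList(augmentations)(aug_input)
        image = aug_input.image
        dataset_dict["image"] = torch.as_tensor(
            np.ascontiguousarray(image.transpose(2, 0, 1))
        )

        annos = [
            utils.transform_instance_annotations(obj, transforms, image.shape[:2])
            for obj in dataset_dict.pop("annotations")
            if obj.get("iscrowd", 0) == 0
        ]
        # The step a hand-written mapper can easily miss: pass mask_format through,
        # so "bitmask" yields BitMasks instead of the default PolygonMasks.
        instances = utils.annotations_to_instances(
            annos, image.shape[:2], mask_format=cfg.INPUT.MASK_FORMAT
        )
        dataset_dict["instances"] = utils.filter_empty_instances(instances)
        return dataset_dict

    return mapper


def gt_mask_type(sample):
    """Name of the class holding GT masks in a mapped sample, or None if absent."""
    masks = getattr(sample.get("instances"), "gt_masks", None)
    return None if masks is None else type(masks).__name__
```

Running `gt_mask_type` on one mapped sample before training is a cheap check: for PointRend it should report `BitMasks`, while `PolygonMasks` suggests the mapper is ignoring `INPUT.MASK_FORMAT`.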

anirbankonar123 commented 2 years ago

I have the same issue: DefaultTrainer with PointRend works well, but I ran into problems when using augmentation. Can you give the full solution? You suggested:

DatasetMapper(cfg, is_train=True, augmentations = custom_transform_list)

What I have is:

def custom_mapper_pointrend(dataset_dict):
    transform_list = [T.Resize((800,800)),
                      T.RandomFlip(prob=0.5, horizontal=False, vertical=True),
                      T.RandomFlip(prob=0.5, horizontal=True, vertical=False),
                      ]
    mapper = DatasetMapper(cfg, is_train=True, augmentations=transform_list)
    return mapper

class CustomTrainerPointrend(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        return build_detection_train_loader(cfg, mapper=custom_mapper_pointrend)

trainer = CustomTrainerPointrend(cfg)

The rest of the code is the same. Now I am getting this error:

w, h = d["width"], d["height"]

TypeError: 'DatasetMapper' object is not subscriptable

MCSitar commented 2 years ago

@anirbankonar123 From the code that you provided, it seems as though your custom mapper only adds augmentations (this was also the case for my project!). If this is the case, you don't need to define a custom mapper at all - you can just define a custom transform list globally and pass it in args for build_detection_train_loader within your custom trainer. See below for my example:

custom_transform_list = [T.Resize((800,800)),
                         T.RandomFlip(prob=0.25, horizontal=False, vertical=True),
                         T.RandomFlip(prob=0.25, horizontal=True, vertical=False)]

class MyPointRendTrainer(DefaultTrainer):

    @classmethod
    def build_train_loader(cls, cfg):
        return build_detection_train_loader(cfg, mapper=
                                            DatasetMapper(cfg, is_train=True, recompute_boxes = True,
                                                          augmentations = custom_transform_list
                                                          ),
                                            )
    #Any other custom trainer methods here

Also, a possible source of your error: I believe that a custom mapper function should return a formatted dataset dict rather than another DatasetMapper instance - see the Dataloader tutorial (https://detectron2.readthedocs.io/en/latest/tutorials/data_loading.html).
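
The difference between returning the mapper and returning its result can be shown without Detectron2. In this small sketch, `StandInMapper` is a hypothetical stand-in for `DatasetMapper` (a callable that takes a dataset dict and returns a dict), and the two `*_custom_mapper` functions are illustrative names. The broken version reproduces the same kind of `TypeError` as reported above.

```python
class StandInMapper:
    """Hypothetical stand-in for DatasetMapper: a callable mapping dict -> dict."""

    def __call__(self, dataset_dict):
        mapped = dict(dataset_dict)
        mapped["mapped"] = True
        return mapped


# Build the mapper once and reuse it for every sample.
shared_mapper = StandInMapper()


def broken_custom_mapper(dataset_dict):
    mapper = StandInMapper()
    return mapper  # bug: returns the mapper object instead of a mapped sample


def fixed_custom_mapper(dataset_dict):
    return shared_mapper(dataset_dict)  # call the mapper on the sample


sample = {"file_name": "img.jpg", "width": 800, "height": 800}

# The data loader indexes into whatever the mapper returned.
try:
    d = broken_custom_mapper(sample)
    w, h = d["width"], d["height"]
except TypeError as err:
    broken_error = str(err)

d = fixed_custom_mapper(sample)
size = (d["width"], d["height"])
```

In the real code, the equivalent fix would be either to return `mapper(dataset_dict)` from the custom function or to pass the `DatasetMapper` instance directly as `mapper=`, as in the trainer above.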

anirbankonar123 commented 2 years ago

Thanks for the answer. It got resolved soon after with code similar to what you have shown, and it is working OK now.

One more question: what FPS does PointRend from Detectron2 achieve on real-time video segmentation? Do we have a figure?

Thanks

MCSitar commented 2 years ago

Not sure about Detectron2 PointRend video FPS; I am not aware of any public real-time D2 video segmentation implementations, although somebody has probably(?) done it before. PixelLib seems to be optimized for real-time video segmentation with PointRend and might be worth checking out if Detectron2 processing speed is lacking...
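
Since no FPS figure is available in this thread, and throughput depends on the GPU, input resolution and number of instances, measuring on your own hardware is likely the most reliable option. A rough sketch of such a measurement follows. `measure_fps` is a hypothetical helper, and `predict` stands for any callable that processes one frame, for instance a Detectron2 `DefaultPredictor` built from your PointRend config (an assumption about your setup). The dummy predictor only keeps the snippet runnable on its own.

```python
import time


def measure_fps(predict, frames, warmup=2):
    """Return frames per second for `predict`, ignoring the first `warmup` frames."""
    frames = list(frames)
    for frame in frames[:warmup]:
        predict(frame)  # warm-up calls are not timed
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up iterations")
    start = time.perf_counter()
    for frame in timed:
        predict(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / max(elapsed, 1e-9)  # guard against a zero reading


# Dummy predictor and frames, used only to exercise the helper.
calls = []


def dummy_predict(frame):
    calls.append(frame)
    return sum(range(1000))


fps = measure_fps(dummy_predict, range(10))
```

For a GPU model, the callable should wait for the result before returning (for example by moving the output to the CPU), otherwise the measured FPS can be too optimistic.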

anirbankonar123 commented 2 years ago

Thanks, that's true. Do we have sample custom code to train a PixelLib PointRend model? The sample shown on their site with the Nature dataset does not seem to work properly.

MCSitar commented 2 years ago

@anirbankonar123 You are right that the PixelLib custom training demo is broken (it seems like a combination of dependency and Nature dataset integrity issues). The PixelLib PointRend video segmentation tutorial does still seem to work for me, and issues with the custom Mask-RCNN training demo might not be relevant because training custom PointRend models in PixelLib does not seem to be possible at all at present. The PointRend models implemented for PixelLib video segmentation are pre-trained Detectron2 PointRend models, so using custom-trained Detectron2 models in PixelLib seems viable. My very speculative idea for how you might be able to do this:

  1. Train and validate PointRend model(s) in Detectron2, save best final model .pkl weight files and config .yaml files.
  2. Fork PixelLib and modify its torchbackend __init__ (currently only supports loading COCO classes and default Detectron2 pretrained models) to fit your new model(s) and custom dataset classes.
  3. Identify and fix any PixelLib dependency issues that affect your PointRend video segmentation solution. Given the significant problems in the custom training demo, there is a good chance that some are present.
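
The config-saving part of step 1 can be sketched as follows, assuming `cfg` is a Detectron2 `CfgNode`, whose `dump()` method returns the config as a YAML string. `save_config` and the stand-in class are hypothetical names, and the stand-in exists only so the snippet runs without Detectron2. Steps 2 and 3 depend on PixelLib internals and are not sketched here.

```python
import os
import tempfile


def save_config(cfg, output_dir, filename="config.yaml"):
    """Write the full config next to the trained weights and return the path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "w") as f:
        f.write(cfg.dump())  # dump() returns the config as a YAML string
    return path


class StandInCfg:
    """Stand-in for a real config; the YAML content is a placeholder."""

    def dump(self):
        return "MODEL:\n  MASK_ON: true\n"


# Write to a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_config(StandInCfg(), tmp_dir)
    with open(saved_path) as f:
        saved_text = f.read()
```

With a real training run, `output_dir` would typically be `cfg.OUTPUT_DIR`, so the YAML file ends up beside the saved weights.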

Unfortunately, this is a more complicated solution (possibly more so than just using Detectron2) than a quick look at the demos and documentation would suggest. Best of luck with your project!