chongruo / detectron2-ResNeSt

A fork of Detectron2 with ResNeSt backbone
https://arxiv.org/abs/2004.08955
Apache License 2.0
385 stars, 73 forks

How to fine-tune on custom data? #30

Open fcakyon opened 4 years ago

fcakyon commented 4 years ago

Hello, how do I fine-tune one of your pretrained models on my custom data? Is it similar to fine-tuning a detectron2 model?

jhonygiraldo commented 4 years ago

Since this repo is based on detectron2, I think you have to follow this tutorial: https://detectron2.readthedocs.io/tutorials/datasets.html
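
As a rough illustration of what that tutorial asks for, here is a minimal sketch of registering a custom dataset in Detectron2's standard dict format. The input `samples` layout and the helper names `build_records` and `register_custom_dataset` are hypothetical; `DatasetCatalog.register`, `MetadataCatalog.get(...).set(...)`, and `BoxMode.XYXY_ABS` are the Detectron2 calls the tutorial describes. Detectron2 is imported lazily so the record-building part can be checked without it.

```python
from typing import Dict, List, Sequence


def build_records(samples: List[Dict]) -> List[Dict]:
    """Turn simple per-image samples into Detectron2-style dataset dicts.

    Hypothetical input layout, one dict per image:
      {"file": str, "height": int, "width": int,
       "objects": [{"box": [x0, y0, x1, y1],
                    "polygon": [x, y, x, y, ...],
                    "label": int}]}
    """
    records = []
    for image_id, sample in enumerate(samples):
        annotations = []
        for obj in sample["objects"]:
            annotations.append({
                "bbox": list(obj["box"]),                # absolute pixel coordinates
                "segmentation": [list(obj["polygon"])],  # one polygon per object
                "category_id": obj["label"],
            })
        records.append({
            "file_name": sample["file"],
            "image_id": image_id,
            "height": sample["height"],
            "width": sample["width"],
            "annotations": annotations,
        })
    return records


def register_custom_dataset(name: str, samples: List[Dict],
                            class_names: Sequence[str]) -> None:
    """Register the dataset under `name` (requires detectron2)."""
    from detectron2.data import DatasetCatalog, MetadataCatalog
    from detectron2.structures import BoxMode

    def loader() -> List[Dict]:
        records = build_records(samples)
        for record in records:
            for ann in record["annotations"]:
                ann["bbox_mode"] = BoxMode.XYXY_ABS  # boxes above are XYXY
        return records

    DatasetCatalog.register(name, loader)
    MetadataCatalog.get(name).set(thing_classes=list(class_names))
```

After calling `register_custom_dataset("my_train", samples, ["my_class"])`, the name `"my_train"` can be used in `cfg.DATASETS.TRAIN`.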

jhonygiraldo commented 4 years ago

I tried the detectron2 tutorial with this repository using cfg.merge_from_file("configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"), and it works.

However, when I try cfg.merge_from_file("configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_200_FPN_dcn_syncBN_all_tricks_3x.yaml") (the model contributed by this work), it fails with the following error message, and I am not sure why:

ERROR [07/16 08:35:35 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/content/detectron2_repo/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/content/detectron2_repo/detectron2/engine/train_loop.py", line 209, in run_step
    data = next(self._data_loader_iter)
  File "/content/detectron2_repo/detectron2/data/common.py", line 140, in __iter__
    for d in self.dataset:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
NotImplementedError: Caught NotImplementedError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/detectron2_repo/detectron2/data/common.py", line 41, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/content/detectron2_repo/detectron2/utils/serialize.py", line 23, in __call__
    return self._obj(*args, **kwargs)
  File "/content/detectron2_repo/detectron2/data/dataset_mapper.py", line 138, in __call__
    instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
  File "/content/detectron2_repo/detectron2/structures/masks.py", line 203, in get_bounding_boxes
    raise NotImplementedError
NotImplementedError

[07/16 08:35:35 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)

NotImplementedError                       Traceback (most recent call last)

in ()
     20 trainer = DefaultTrainer(cfg)
     21 trainer.resume_or_load(resume=False)
---> 22 trainer.train()

7 frames

/usr/local/lib/python3.6/dist-packages/torch/_utils.py in reraise(self)
    393             # (https://bugs.python.org/issue2651), so we work around it.
    394             msg = KeyErrorMessage(msg)
--> 395         raise self.exc_type(msg)

NotImplementedError: Caught NotImplementedError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/detectron2_repo/detectron2/data/common.py", line 41, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/content/detectron2_repo/detectron2/utils/serialize.py", line 23, in __call__
    return self._obj(*args, **kwargs)
  File "/content/detectron2_repo/detectron2/data/dataset_mapper.py", line 138, in __call__
    instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
  File "/content/detectron2_repo/detectron2/structures/masks.py", line 203, in get_bounding_boxes
    raise NotImplementedError
NotImplementedError

chongruo commented 4 years ago

We didn't modify the dataloader.

jhonygiraldo commented 4 years ago

It is a memory problem. I tried "mask_cascade_rcnn_ResNeSt_50_FPN_syncBN_1x" and it worked, so I guess "mask_cascade_rcnn_ResNeSt_200_FPN_dcn_syncBN_all_tricks_3x" is a heavy model that did not fit on my machine. As I said before, to train on custom data you just have to follow the detectron2 tutorial: https://detectron2.readthedocs.io/tutorials/datasets.html. @chongruo, I have the code in a Colab notebook; let me know if you would like me to share it. Regards.
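
For readers without access to that notebook, the fine-tuning setup described here might look roughly like the sketch below. It is not the notebook's code. The dataset name, the solver values, and the helper names `finetune_overrides` and `train` are assumptions chosen for illustration; `get_cfg`, `merge_from_file`, `merge_from_list`, and `DefaultTrainer` are standard Detectron2 calls, and the trainer lines mirror the snippet visible in the traceback above. The config path has to point at whichever YAML file in this repository you want to train.

```python
import os
from typing import Any, List, Optional


def finetune_overrides(train_name: str, num_classes: int,
                       weights: Optional[str] = None) -> List[Any]:
    """Build a flat [key, value, ...] list for cfg.merge_from_list.

    The numeric values are illustrative small-dataset settings, not
    recommendations from this repository.
    """
    opts: List[Any] = [
        "DATASETS.TRAIN", (train_name,),   # name used at registration time
        "DATASETS.TEST", (),
        "DATALOADER.NUM_WORKERS", 2,
        "SOLVER.IMS_PER_BATCH", 2,         # a small batch helps on limited GPU memory
        "SOLVER.BASE_LR", 0.00025,
        "SOLVER.MAX_ITER", 300,
        "MODEL.ROI_HEADS.NUM_CLASSES", num_classes,
    ]
    if weights is not None:
        # Path or URL of a pretrained checkpoint (placeholder, supplied by you).
        opts += ["MODEL.WEIGHTS", weights]
    return opts


def train(config_file: str, overrides: List[Any]) -> None:
    """Load a config from this repo, apply overrides, and train (requires detectron2)."""
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultTrainer

    cfg = get_cfg()
    cfg.merge_from_file(config_file)
    cfg.merge_from_list(overrides)
    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()
```

A call would then be `train(path_to_yaml, finetune_overrides("my_train", num_classes=1))`, where `path_to_yaml` is the ResNeSt config file and `"my_train"` is a dataset you have already registered.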

jaideep11061982 commented 2 years ago

@jhonygiraldo, could you share it?

jaideep11061982 commented 2 years ago

I can't find any raise statement throwing NotImplementedError anywhere. In the latest version, do they have such a statement?
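
For context, the traceback earlier in the thread ends at `instances.gt_masks.get_bounding_boxes()`, a call that derives boxes from masks. The thread does not say which mask class raised the error in that version. The following is only a pure-Python sketch of the underlying idea for a single binary mask, not detectron2's implementation; the function name `mask_to_box` is hypothetical.

```python
from typing import List, Optional, Tuple


def mask_to_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return the tight (x0, y0, x1, y1) box around nonzero pixels.

    x1 and y1 are exclusive. Returns None for an empty mask.
    """
    # Rows that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Columns that contain at least one foreground pixel.
    width = len(mask[0])
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
print(mask_to_box(mask))  # (1, 1, 3, 3)
```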