facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0
29.32k stars, 7.32k forks

Detectron2 keypoints Rcnn Training using VertexAI custom Training Job #5196

RiccardoMaistri closed this issue 5 months ago

RiccardoMaistri commented 5 months ago

I am having an issue starting a Keypoint R-CNN training run with the detectron2 framework, using a Vertex AI custom training job.

I forked the detectron2-train-docker-image provided by Vertex and added support for Keypoint R-CNN. The additions touch a few files and the keypoint-related detectron2 cfg options.

What puzzles me is that everything works fine when I run the code locally. The dataset contains two images with three keypoints each. The cfg options I added are simply:

MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS=3

TEST.KEYPOINT_OKS_SIGMAS

together with keypoint_names and keypoint_flip_map in the dataset Metadata.
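
For illustration, the settings listed above could be applied together roughly as follows. This is only a sketch, not the code from the fork: the helper name `apply_keypoint_settings`, the keypoint names, the flip pair, and the sigma values are all hypothetical, and the `SimpleNamespace` objects merely stand in for a detectron2 config and a dataset Metadata object so the snippet runs without detectron2 installed.

```python
from types import SimpleNamespace


def apply_keypoint_settings(cfg, metadata, keypoint_names, keypoint_flip_map, oks_sigmas):
    """Set the keypoint options from the issue after checking that they agree.

    Hypothetical helper: `cfg` and `metadata` only need to expose the
    attributes assigned below.
    """
    num_keypoints = len(keypoint_names)
    if len(oks_sigmas) != num_keypoints:
        raise ValueError(
            f"{len(oks_sigmas)} OKS sigmas given for {num_keypoints} keypoints"
        )
    known = set(keypoint_names)
    for left, right in keypoint_flip_map:
        if left not in known or right not in known:
            raise ValueError(f"flip pair ({left}, {right}) uses an unknown keypoint")

    # The two cfg options mentioned in the issue.
    cfg.MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS = num_keypoints
    cfg.TEST.KEYPOINT_OKS_SIGMAS = list(oks_sigmas)
    # The two Metadata fields mentioned in the issue.
    metadata.keypoint_names = list(keypoint_names)
    metadata.keypoint_flip_map = [tuple(pair) for pair in keypoint_flip_map]
    return cfg, metadata


# Stand-ins for a real config and Metadata object (assumption for this sketch).
demo_cfg = SimpleNamespace(
    MODEL=SimpleNamespace(ROI_KEYPOINT_HEAD=SimpleNamespace(NUM_KEYPOINTS=17)),
    TEST=SimpleNamespace(KEYPOINT_OKS_SIGMAS=[]),
)
demo_metadata = SimpleNamespace()

apply_keypoint_settings(
    demo_cfg,
    demo_metadata,
    keypoint_names=["nose", "left_eye", "right_eye"],  # hypothetical names
    keypoint_flip_map=[("left_eye", "right_eye")],     # hypothetical pair
    oks_sigmas=[0.05, 0.05, 0.05],                     # hypothetical values
)
```

With detectron2 installed, `cfg` would typically come from `detectron2.config.get_cfg()` and `metadata` from `detectron2.data.MetadataCatalog.get(...)` for the registered dataset name.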

If I run it through the Docker container deployment, I get this traceback:


   File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
     "__main__", mod_spec)
   File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
     exec(code, run_globals)
   File "/home/appuser/trainer/task.py", line 295, in <module>
     args=(args,),
   File "/home/appuser/detectron2_repo/detectron2/engine/launch.py", line 82, in launch
     main_func(*args)
   File "/home/appuser/trainer/task.py", line 279, in main
     trainer.train()
   File "/home/appuser/detectron2_repo/detectron2/engine/defaults.py", line 484, in train
     super().train(self.start_iter, self.max_iter)
   File "/home/appuser/detectron2_repo/detectron2/engine/train_loop.py", line 149, in train
     self.run_step()
   File "/home/appuser/detectron2_repo/detectron2/engine/defaults.py", line 494, in run_step
     self._trainer.run_step()
   File "/home/appuser/detectron2_repo/detectron2/engine/train_loop.py", line 267, in run_step
     data = next(self._data_loader_iter)
   File "/home/appuser/detectron2_repo/detectron2/data/common.py", line 234, in __iter__
     for d in self.dataset:
   File "/home/appuser/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
     data = self._next_data()
   File "/home/appuser/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
     return self._process_data(data)
   File "/home/appuser/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
     data.reraise()
   File "/home/appuser/.local/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
     raise exception
 ValueError: Caught ValueError in DataLoader worker process 1.
 Original Traceback (most recent call last):
   File "/home/appuser/.local/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
     data = fetcher.fetch(index)
   File "/home/appuser/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
     data.append(next(self.dataset_iter))
   File "/home/appuser/detectron2_repo/detectron2/data/common.py", line 201, in __iter__
     yield self.dataset[idx]
   File "/home/appuser/detectron2_repo/detectron2/data/common.py", line 90, in __getitem__
     data = self._map_func(self._dataset[cur_idx])
   File "/home/appuser/detectron2_repo/detectron2/utils/serialize.py", line 26, in __call__
     return self._obj(*args, **kwargs)
   File "/home/appuser/detectron2_repo/detectron2/data/dataset_mapper.py", line 189, in __call__
     self._transform_annotations(dataset_dict, transforms, image_shape)
   File "/home/appuser/detectron2_repo/detectron2/data/dataset_mapper.py", line 128, in _transform_annotations
     for obj in dataset_dict.pop("annotations")
   File "/home/appuser/detectron2_repo/detectron2/data/dataset_mapper.py", line 129, in <listcomp>
     if obj.get("iscrowd", 0) == 0
   File "/home/appuser/detectron2_repo/detectron2/data/detection_utils.py", line 314, in transform_instance_annotations
     annotation["keypoints"], transforms, image_size, keypoint_hflip_indices
   File "/home/appuser/detectron2_repo/detectron2/data/detection_utils.py", line 360, in transform_keypoint_annotations
     "contains {} points!".format(len(keypoints), 
 ValueError: Keypoint data has 3 points, but metadata contains 15 points!
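
The final message reports 3 keypoints in the annotation data but 15 in the metadata, which suggests the two disagree inside the container. A minimal sketch of a pre-training check is shown here, assuming the dataset follows Detectron2's standard dataset-dict layout, where each annotation stores `keypoints` as a flat list of `x, y, v` triplets. The function name `find_keypoint_mismatches` and the sample records are hypothetical.

```python
def find_keypoint_mismatches(dataset_dicts, keypoint_names):
    """Return (file_name, annotation_index, keypoints_found) for every
    annotation whose keypoint count differs from len(keypoint_names)."""
    expected = len(keypoint_names)
    mismatches = []
    for record in dataset_dicts:
        for ann_idx, ann in enumerate(record.get("annotations", [])):
            flat = ann.get("keypoints")
            if flat is None:
                continue  # annotation carries no keypoints
            # Each keypoint is assumed to occupy three values: x, y, visibility.
            found, remainder = divmod(len(flat), 3)
            if remainder != 0 or found != expected:
                mismatches.append((record.get("file_name"), ann_idx, found))
    return mismatches


# Hypothetical record with three keypoints, as in the two-image dataset above.
sample_records = [
    {"file_name": "img_0.jpg", "annotations": [{"keypoints": [0.0] * 9}]},
]

ok = find_keypoint_mismatches(sample_records, ["kp_a", "kp_b", "kp_c"])
bad = find_keypoint_mismatches(sample_records, [f"kp_{i}" for i in range(15)])
```

Running a check like this inside the container, against the `keypoint_names` the training job actually sees, would show whether the metadata there differs from the local run.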

Specifications

github-actions[bot] commented 5 months ago

You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling the issue template. The following information is missing: "Instructions To Reproduce the Issue and Full Logs";