I am having an issue starting a Keypoint R-CNN training run with the detectron2 framework, using a Vertex AI custom training job.
I forked detectron2-train-docker-image and added support for Keypoint R-CNN. The additions touch a few files and the detectron2 cfg entries related to keypoints.
What puzzles me is that when I run the code locally, everything works fine.
The dataset contains two images with three keypoints each.
The cfg entries I added are simply:
["MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS"] + ["3"]
["TEST.KEYPOINT_OKS_SIGMAS"] + [str(sigmas)]
together with keypoint_names and keypoint_flip_map in the dataset Metadata.
If I run it through the Docker container deployment, I get this traceback:
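For illustration, the two overrides above can be read as key/value pairs appended to a flat options list. The sketch below is plain Python and does not call detectron2. The `sigmas` values are hypothetical placeholders, and parsing the values back with `ast.literal_eval` is only an assumption about how such string overrides are typically interpreted.

```python
import ast

# Hypothetical values: one OKS sigma per keypoint (three keypoints here).
sigmas = [0.05, 0.05, 0.05]

# Flat key/value list built the same way as the two lines in the post.
opts = []
opts += ["MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS"] + ["3"]
opts += ["TEST.KEYPOINT_OKS_SIGMAS"] + [str(sigmas)]

# Pair each key with its value and parse the value back from its string form.
overrides = {key: ast.literal_eval(value) for key, value in zip(opts[0::2], opts[1::2])}

# The declared number of keypoints should match the number of sigmas.
assert overrides["MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS"] == len(overrides["TEST.KEYPOINT_OKS_SIGMAS"])
```

A check like the final assertion, run both locally and inside the container, is one cheap way to confirm that both environments see the same keypoint count.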
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/appuser/trainer/task.py", line 295, in <module>
args=(args,),
File "/home/appuser/detectron2_repo/detectron2/engine/launch.py", line 82, in launch
main_func(*args)
File "/home/appuser/trainer/task.py", line 279, in main
trainer.train()
File "/home/appuser/detectron2_repo/detectron2/engine/defaults.py", line 484, in train
super().train(self.start_iter, self.max_iter)
File "/home/appuser/detectron2_repo/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/home/appuser/detectron2_repo/detectron2/engine/defaults.py", line 494, in run_step
self._trainer.run_step()
File "/home/appuser/detectron2_repo/detectron2/engine/train_loop.py", line 267, in run_step
data = next(self._data_loader_iter)
File "/home/appuser/detectron2_repo/detectron2/data/common.py", line 234, in __iter__
for d in self.dataset:
File "/home/appuser/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/home/appuser/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/home/appuser/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/home/appuser/.local/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/home/appuser/.local/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/appuser/.local/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
data.append(next(self.dataset_iter))
File "/home/appuser/detectron2_repo/detectron2/data/common.py", line 201, in __iter__
yield self.dataset[idx]
File "/home/appuser/detectron2_repo/detectron2/data/common.py", line 90, in __getitem__
data = self._map_func(self._dataset[cur_idx])
File "/home/appuser/detectron2_repo/detectron2/utils/serialize.py", line 26, in __call__
return self._obj(*args, **kwargs)
File "/home/appuser/detectron2_repo/detectron2/data/dataset_mapper.py", line 189, in __call__
self._transform_annotations(dataset_dict, transforms, image_shape)
File "/home/appuser/detectron2_repo/detectron2/data/dataset_mapper.py", line 128, in _transform_annotations
for obj in dataset_dict.pop("annotations")
File "/home/appuser/detectron2_repo/detectron2/data/dataset_mapper.py", line 129, in <listcomp>
if obj.get("iscrowd", 0) == 0
File "/home/appuser/detectron2_repo/detectron2/data/detection_utils.py", line 314, in transform_instance_annotations
annotation["keypoints"], transforms, image_size, keypoint_hflip_indices
File "/home/appuser/detectron2_repo/detectron2/data/detection_utils.py", line 360, in transform_keypoint_annotations
"contains {} points!".format(len(keypoints),
ValueError: Keypoint data has 3 points, but metadata contains 15 points!
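The final line of the traceback reports a mismatch between the number of keypoints in an annotation (3) and the number of keypoints implied by the dataset metadata (15). The sketch below mimics that kind of consistency check in plain Python, to show how a metadata entry with 15 keypoint names would trigger the same message. It is not detectron2's source: the function names `hflip_indices` and `check_keypoints` and the example keypoint names are hypothetical.

```python
def hflip_indices(keypoint_names, keypoint_flip_map):
    """Return, for each keypoint, the index of its horizontally flipped partner."""
    flip = dict(keypoint_flip_map)
    # Make the map symmetric so ("left", "right") also maps "right" to "left".
    flip.update({right: left for left, right in keypoint_flip_map})
    flipped = [flip.get(name, name) for name in keypoint_names]
    return [keypoint_names.index(name) for name in flipped]


def check_keypoints(num_annotated, keypoint_names, keypoint_flip_map):
    """Raise if the annotation and the metadata disagree on the keypoint count."""
    indices = hflip_indices(keypoint_names, keypoint_flip_map)
    if num_annotated != len(indices):
        raise ValueError(
            "Keypoint data has {} points, but metadata contains {} points!".format(
                num_annotated, len(indices)
            )
        )
    return indices


# Metadata that matches the dataset: three names, one left/right pair.
ok = check_keypoints(3, ["center", "left", "right"], [("left", "right")])

# Metadata with 15 names against 3 annotated keypoints reproduces the message.
stale_names = ["kp{}".format(i) for i in range(15)]
try:
    check_keypoints(3, stale_names, [])
    message = None
except ValueError as err:
    message = str(err)
```

Under this reading, printing the length of keypoint_names from the dataset Metadata inside the container, just before training starts, would show whether the container sees the 3 names set locally or some other 15-entry list.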
Specifications