Closed: isagastiberri closed this issue 5 years ago.
`data_dict['sem_seg']` should be a `torch.Tensor`, not a numpy array. I'll fix the docs.
Loading every `sem_seg` into memory up front is a bad idea memory-wise. You can also use `sem_seg_file_name`.
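For reference, a minimal sketch of both options, with hypothetical paths and sizes:

```python
import numpy as np
import torch

# Option A (hypothetical paths): point the dataset dict at the label image with
# "sem_seg_file_name" and let the default DatasetMapper read it from disk lazily.
record_lazy = {
    "file_name": "images/0001.jpg",          # RGB image
    "image_id": 1,
    "height": 1024,
    "width": 2048,
    "sem_seg_file_name": "labels/0001.png",  # per-pixel class ids
}

# Option B: if you put "sem_seg" into the dict yourself, it must be a
# torch.Tensor of integer class ids, not a numpy array.
label_map = np.zeros((1024, 2048), dtype=np.uint8)  # placeholder label image
record_eager = {
    "file_name": "images/0001.jpg",
    "image_id": 1,
    "height": 1024,
    "width": 2048,
    "sem_seg": torch.as_tensor(label_map, dtype=torch.long),
}
```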
Thanks! I used sem_seg_file_name to load the semantic segmentation and it works now. I wasn't sure whether the semantic segmentation could be loaded from images or had to come from a JSON, which is why I loaded sem_seg directly, but I see it works with images. Thank you for your help!
Hello @isagastiberri, did you run into the following problem when training on Mapillary? I used the code you listed above.
Config './configs/COCO-PanopticSegmentation/panoptic_fpn_R_50_1x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
after trainer
'roi_heads.box_predictor.cls_score.weight' has shape (81, 1024) in the checkpoint but (68, 1024) in the model! Skipped.
'roi_heads.box_predictor.cls_score.bias' has shape (81,) in the checkpoint but (68,) in the model! Skipped.
'roi_heads.box_predictor.bbox_pred.weight' has shape (320, 1024) in the checkpoint but (268, 1024) in the model! Skipped.
'roi_heads.box_predictor.bbox_pred.bias' has shape (320,) in the checkpoint but (268,) in the model! Skipped.
'roi_heads.mask_head.predictor.weight' has shape (80, 256, 1, 1) in the checkpoint but (67, 256, 1, 1) in the model! Skipped.
'roi_heads.mask_head.predictor.bias' has shape (80,) in the checkpoint but (67,) in the model! Skipped.
after load
/opt/conda/conda-bld/pytorch_1570910687230/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:104: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [156,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1570910687230/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:104: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [157,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1570910687230/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:104: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [158,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1570910687230/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:104: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [159,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
File "/home/lin/PycharmProjects/detectron2/tools/train_mapillary_panoptic.py", line 49, in <module>
trainer.train()
File "/home/lin/PycharmProjects/detectron2/detectron2/engine/defaults.py", line 329, in train
super().train(self.start_iter, self.max_iter)
File "/home/lin/PycharmProjects/detectron2/detectron2/engine/train_loop.py", line 132, in train
self.run_step()
File "/home/lin/PycharmProjects/detectron2/detectron2/engine/train_loop.py", line 212, in run_step
loss_dict = self.model(data)
File "/home/lin/Software/anaconda3/envs/psnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/lin/PycharmProjects/detectron2/detectron2/modeling/meta_arch/panoptic_fpn.py", line 96, in forward
proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
File "/home/lin/Software/anaconda3/envs/psnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/lin/PycharmProjects/detectron2/detectron2/modeling/proposal_generator/rpn.py", line 143, in forward
anchors = self.anchor_generator(features)
File "/home/lin/Software/anaconda3/envs/psnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/lin/PycharmProjects/detectron2/detectron2/modeling/anchor_generator.py", line 181, in forward
anchors_over_all_feature_maps = self.grid_anchors(grid_sizes)
File "/home/lin/PycharmProjects/detectron2/detectron2/modeling/anchor_generator.py", line 124, in grid_anchors
shift_x, shift_y = _create_grid_offsets(size, stride, base_anchors.device)
File "/home/lin/PycharmProjects/detectron2/detectron2/modeling/anchor_generator.py", line 43, in _create_grid_offsets
shifts_x = torch.arange(0, grid_width * stride, step=stride, dtype=torch.float32, device=device)
RuntimeError: tabulate: failed to synchronize: cudaErrorAssert: device-side assert triggered
@EEEGUI yes! I have the same problem. I was checking everything in my own code first, but I was about to open a new issue because I don't think that assertion should break the code.
The assertion `t >= 0 && t < n_classes` is not correct, as t = n_classes should be the background class.
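For reference, a minimal sketch of how out-of-range ids can be remapped before training, assuming such pixels should simply be ignored rather than treated as a dedicated background index (the class count and ignore value below are assumptions; detectron2's `MODEL.SEM_SEG_HEAD.IGNORE_VALUE` defaults to 255):

```python
import numpy as np

NUM_SEM_SEG_CLASSES = 66   # assumed class count for this Mapillary setup
IGNORE_VALUE = 255         # detectron2's default MODEL.SEM_SEG_HEAD.IGNORE_VALUE

def sanitize_sem_seg(label_map: np.ndarray) -> np.ndarray:
    """Remap any pixel id outside [0, NUM_SEM_SEG_CLASSES) to the ignore value.

    The device-side assert `t >= 0 && t < n_classes` comes from the NLL loss
    kernel and fires when a ground-truth pixel id reaches n_classes (unless it
    equals the ignore index), so such ids must be remapped before the loss.
    """
    label_map = label_map.copy()
    out_of_range = (label_map < 0) | (label_map >= NUM_SEM_SEG_CLASSES)
    label_map[out_of_range] = IGNORE_VALUE
    return label_map
```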
How To Reproduce the Issue
What changes you made (`git diff`) or what code you wrote:
I wrote my own code for registering the dataset, following the balloon example and the docs. Here is how I register it:

```python
def register_mapillary(root_dir):
    config_path = os.path.join(root_dir, 'config.json')
    data_name = 'mapillary-panoptic-'
    # read in config file
```
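The rest of `register_mapillary` is cut off above. For context, a rough sketch of what such a registration function could look like, assuming the standard Mapillary Vistas `config.json` layout and a hypothetical `get_mapillary_dicts` loader:

```python
import json
import os

from detectron2.data import DatasetCatalog, MetadataCatalog


def get_mapillary_dicts(root_dir, split):
    """Hypothetical loader: returns a list of detectron2 dataset dicts for `split`."""
    raise NotImplementedError  # dataset-specific parsing goes here


def register_mapillary(root_dir):
    # read in config file: Mapillary Vistas ships its class definitions in config.json
    config_path = os.path.join(root_dir, 'config.json')
    with open(config_path) as f:
        labels = json.load(f)['labels']
    class_names = [label['readable'] for label in labels]

    data_name = 'mapillary-panoptic-'
    for split in ('training', 'validation'):
        name = data_name + split
        DatasetCatalog.register(name, lambda s=split: get_mapillary_dicts(root_dir, s))
        MetadataCatalog.get(name).set(stuff_classes=class_names)
```

With the datasets registered this way, `cfg.DATASETS.TRAIN = ("mapillary-panoptic-training",)` in the script below resolves to the registered loader.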
```python
root_dir = './datasets/mapillary-vistas-panoptic/'
register_mapillary(root_dir)

cfg = get_cfg()
cfg.merge_from_file("./configs/COCO-PanopticSegmentation/panoptic_fpn_R_50_1x.yaml")
cfg.DATASETS.TRAIN = ("mapillary-panoptic-training",)
cfg.DATASETS.TEST = ()  # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 1
cfg.MODEL.WEIGHTS = "models/model_final_panoptic.pkl"  # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 300  # 300 iterations seems good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 66  # 66 classes for Mapillary Vistas (the balloon example used 1)

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
print('after trainer')
trainer.resume_or_load(resume=False)
print('after load')
trainer.train()
print('training finished')
```
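One thing worth noting about the snippet above: it only sets `ROI_HEADS.NUM_CLASSES`. The semantic-segmentation branch of Panoptic FPN reads its own class count and ignore label from the config, so those may need to be set as well. A sketch of the extra lines, with assumed values, continuing the `cfg` from the script above:

```python
# Continuing the cfg from the script above (values are assumptions, not the
# thread author's settings): the semantic-segmentation head of Panoptic FPN
# has its own class count and ignore label, independent of ROI_HEADS.NUM_CLASSES.
cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES = 66    # set to the number of classes in your sem_seg labels
cfg.MODEL.SEM_SEG_HEAD.IGNORE_VALUE = 255  # pixels with this id are excluded from the sem_seg loss
```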
Traceback (most recent call last):
File "tools/train_panoptic.py", line 35, in <module>
trainer.train()
File "/workspace/detectron2/detectron2/engine/defaults.py", line 350, in train
super().train(self.start_iter, self.max_iter)
File "/workspace/detectron2/detectron2/engine/train_loop.py", line 132, in train
self.run_step()
File "/workspace/detectron2/detectron2/engine/train_loop.py", line 212, in run_step
loss_dict = self.model(data)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in call
result = self.forward(*input, **kwargs)
File "/workspace/detectron2/detectron2/modeling/meta_arch/panoptic_fpn.py", line 83, in forward
gt_sem_seg = [x["sem_seg"].to(self.device) for x in batched_inputs]
File "/workspace/detectron2/detectron2/modeling/meta_arch/panoptic_fpn.py", line 83, in
gt_sem_seg = [x["sem_seg"].to(self.device) for x in batched_inputs]
AttributeError: 'numpy.ndarray' object has no attribute 'to'
PyTorch built with: