YoungXIAO13 / PoseFromShape

(BMVC 2019) PyTorch implementation of Paper "Pose from Shape: Deep Pose Estimation for Arbitrary 3D Objects"
http://imagine.enpc.fr/~xiaoy/PoseFromShape/
MIT License

Different number of classes (bins) for angles error #8

Closed (ajuric closed this issue 5 years ago)

ajuric commented 5 years ago

When I try a different number of classes (bins) for the angles, I get the following error:

/opt/conda/conda-bld/pytorch_1550802451070/work/aten/src/THCUNN/ClassNLLCriterion.cu:105: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [14,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "training.py", line 278, in <module>
    main()
  File "training.py", line 230, in main
    criterion_azi, criterion_ele, criterion_inp, criterion_reg, optimizer)
  File "training.py", line 187, in train
    loss_reg = criterion_reg(out[3], out[4], out[5], label.float())
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/code/auxiliary/loss.py", line 42, in forward
    return delta_loss(pred_azi, pred_ele, pred_rol, target, self.__bin__)
  File "/code/auxiliary/loss.py", line 28, in delta_loss
    delta_azi = pred_azi[torch.arange(pred_azi.size(0)), target_label[:, 0]].tanh() / 2
RuntimeError: CUDA error: device-side assert triggered

The error is raised in the delta_loss function, which is called from the forward method of the DeltaLoss class, on this line:

delta_azi = pred_azi[torch.arange(pred_azi.size(0)), target_label[:, 0]].tanh() / 2

(link to line: https://github.com/YoungXIAO13/PoseFromShape/blob/master/auxiliary/loss.py#L29)

I tried debugging, but wasn't successful. I mostly use TF, so maybe I'm missing something very common in PyTorch?

When I use the default number of classes (azimuth=24, elevation=12, in-plane=24), training runs normally and completes.

ajuric commented 5 years ago

I found out the problem.

180 is not divisible by 24, and the elevation angle's range is set to 180: https://github.com/YoungXIAO13/PoseFromShape/blob/master/training.py#L124

So the solution is to choose these two numbers (range and number of classes) so that the range is evenly divisible by the number of classes.
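To see why a non-divisible pair can produce labels outside the valid range, here is a minimal sketch. It assumes the class index is computed by integer-dividing the angle by an integer bin width; the helper name bin_labels and that binning rule are illustrative assumptions, not code taken from the repository.

```python
def bin_labels(angles_deg, angle_range, n_classes):
    """Map angles in [0, angle_range) to class indices.

    Assumption: the bin width is an integer obtained by floor division,
    so it is truncated when angle_range is not divisible by n_classes.
    """
    bin_size = angle_range // n_classes
    return [int(a) // bin_size for a in angles_deg]


angles = [0, 90, 179]

# 180 is divisible by 12: bin width 15, all labels stay below 12.
labels_ok = bin_labels(angles, angle_range=180, n_classes=12)

# 180 is not divisible by 24: bin width is truncated to 7,
# so large angles map to labels >= 24, which the loss cannot index.
labels_bad = bin_labels(angles, angle_range=180, n_classes=24)

print(labels_ok, max(labels_ok) < 12)
print(labels_bad, max(labels_bad) < 24)
```

Under this assumption, the second call yields a label of 25 for an angle of 179 degrees, which violates the `t >= 0 && t < n_classes` assertion shown in the traceback.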

Anyway, the PyTorch error should have been more informative ...
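One way to get a clearer message is to validate the labels before they reach the loss. The sketch below is a hypothetical helper (check_class_labels is not part of PyTorch or of this repository) that applies the same condition as the CUDA assertion, `t >= 0 && t < n_classes`, and reports the first offending value. It works on plain Python integers; a label tensor could be passed as `tensor.tolist()`.

```python
def check_class_labels(labels, n_classes, name="label"):
    """Raise a readable error if any class index is outside [0, n_classes).

    Hypothetical helper for debugging; `labels` is any iterable of ints.
    """
    labels = list(labels)
    for i, t in enumerate(labels):
        if not 0 <= t < n_classes:
            raise ValueError(
                f"{name}[{i}] = {t} is outside the valid range [0, {n_classes})"
            )
    return labels


# Valid labels pass through unchanged.
check_class_labels([0, 6, 11], n_classes=12, name="elevation")

# An out-of-range label is reported with its position and value.
try:
    check_class_labels([0, 12, 25], n_classes=24, name="elevation")
except ValueError as err:
    message = str(err)
    print(message)
```

Running the failing step on the CPU, or launching the script with the environment variable CUDA_LAUNCH_BLOCKING=1, also tends to give a traceback that points at the actual failing operation.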

YoungXIAO13 commented 5 years ago

Thanks for pointing this out. I've pushed a new commit that defines all xxx_classes as a function of bin_size, which sets the classification bin size for all Euler angles. You now only need to change bin_size to run an ablation study.
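A rough sketch of what deriving the class counts from a single bin_size could look like is given below. It is not the code from the commit: the function name classes_from_bin_size and the ANGLE_RANGES table are illustrative, and the 360-degree ranges for azimuth and in-plane rotation are assumptions inferred from the default class counts (24, 12, 24); only the elevation range of 180 is stated in this thread.

```python
# Assumed angle ranges in degrees; only the elevation range (180)
# is confirmed in this thread.
ANGLE_RANGES = {"azi": 360, "ele": 180, "inp": 360}


def classes_from_bin_size(bin_size):
    """Derive the number of classes per Euler angle from one bin size."""
    counts = {}
    for name, angle_range in ANGLE_RANGES.items():
        if angle_range % bin_size != 0:
            raise ValueError(
                f"bin_size {bin_size} does not evenly divide the "
                f"{name} range of {angle_range} degrees"
            )
        counts[name] = angle_range // bin_size
    return counts


# A bin size of 15 degrees reproduces the default class counts.
default_counts = classes_from_bin_size(15)
print(default_counts)
```

Deriving every count from one value keeps the range and the number of classes consistent, so a non-divisible combination is rejected up front instead of surfacing later as a device-side assert.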