NVlabs / PoseCNN-PyTorch

PyTorch implementation of the PoseCNN framework

Multiple Reproduction Issues #13

Open fplk opened 3 years ago

fplk commented 3 years ago

Hello, I have a few problems reproducing your results, which are probably due to my inexperience with pose estimation:

i) Your code seems to work, and I can reproduce decent results on the demo folder. However, when I plug in some YCBV images instead and adapt the intrinsics to the UW camera values [1066.778, 0.0, 312.9869, 0.0, 1067.487, 241.3109, 0.0, 0.0, 1.0], the resulting images show poses that are significantly off. Should it work to simply plug different images into the demo script, and if so, do you have an idea why it fails here and what I can do to fix it?

Here is an example of the clearly incorrect poses (attached image: 000001-color.png_render).
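For reference, this is how I build the intrinsics matrix from those values. It is only a minimal sketch; the commented-out assignment at the end is my assumption about where the demo script might pick the matrix up, not something taken from the code.

```python
import numpy as np

# UW camera intrinsics for YCB-Video, from the values listed above (row-major 3x3).
K = np.array([[1066.778,    0.0,     312.9869],
              [   0.0,   1067.487,   241.3109],
              [   0.0,      0.0,       1.0   ]], dtype=np.float64)

fx, fy = K[0, 0], K[1, 1]   # focal lengths in pixels
cx, cy = K[0, 2], K[1, 2]   # principal point

# Assumption: the demo's dataset object may expose the matrix under a name
# like _intrinsic_matrix; this is a guess, not confirmed from the repo.
# dataset._intrinsic_matrix = K
```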

ii) There seems to be a second issue with the refinement: it makes the objects almost vanish. Are the YCBV depth maps being misinterpreted, is this caused by the division by 1000 when loading the depths, or is it something else?

(Attached image: 000001-color.png_render_refined)
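To sanity-check the scaling myself, I load the depth roughly like this. It is a minimal sketch with placeholder paths; the factor_depth is read from the per-frame meta .mat file, which, if I read the YCBV meta files correctly, is 10000 rather than 1000, so a hard-coded division by 1000 would be off by 10x.

```python
import numpy as np
import scipy.io
from PIL import Image

# Minimal sketch: load a YCB-Video depth map and convert it to meters.
# File names are placeholders for one frame of the dataset.
depth_raw = np.array(Image.open('000001-depth.png'), dtype=np.float32)  # raw uint16 values

meta = scipy.io.loadmat('000001-meta.mat')
factor_depth = float(meta['factor_depth'])  # per-frame scale factor from the meta file
depth_m = depth_raw / factor_depth          # depth in meters

# A plausible indoor scene should land roughly in the 0.5-2.5 m range.
print(depth_m.min(), depth_m.max())
```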

iii) Since both obviously fail for data that is not the demo data, I concluded that I would have to train the model myself. Is that really necessary, and if so, how can I make sure I don't run out of memory? I have two 1080 Ti cards but still ran into "CUDA out of memory". I run this in NVIDIA Docker with the parameters --gpus all --ipc=host. Here is the full stack trace (a sketch of what I would try is below it):

RuntimeError: CUDA out of memory. Tried to allocate 621.75 MiB (GPU 0; 10.91 GiB total capacity; 4.89 GiB already allocated; 190.25 MiB free; 794.07 MiB cached) (malloc at /pytorch/aten/src/THC/THCCachingAllocator.cpp:231)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f6c896e3fe1 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f6c896e3dfa in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x13cf9c5 (0x7f6c07a509c5 in /usr/local/lib/python3.6/dist-packages/torch/lib/libcaffe2_gpu.so)
frame #3: <unknown function> + 0x13d077a (0x7f6c07a5177a in /usr/local/lib/python3.6/dist-packages/torch/lib/libcaffe2_gpu.so)
frame #4: at::native::empty_cuda(c10::ArrayRef<long>, at::TensorOptions const&) + 0x443 (0x7f6c08be3a43 in /usr/local/lib/python3.6/dist-packages/torch/lib/libcaffe2_gpu.so)
frame #5: at::CUDAFloatType::empty(c10::ArrayRef<long>, at::TensorOptions const&) const + 0x161 (0x7f6c0796a531 in /usr/local/lib/python3.6/dist-packages/torch/lib/libcaffe2_gpu.so)
frame #6: torch::autograd::VariableType::empty(c10::ArrayRef<long>, at::TensorOptions const&) const + 0x179 (0x7f6bfc6bcdf9 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch.so.1)
frame #7: at::native::zeros(c10::ArrayRef<long>, at::TensorOptions const&) + 0x40 (0x7f6bfd981af0 in /usr/local/lib/python3.6/dist-packages/torch/lib/libcaffe2.so)
frame #8: <unknown function> + 0x10f7f4 (0x7f6bd03567f4 in /usr/local/lib/python3.6/dist-packages/posecnn-0.0.0-py3.6-linux-x86_64.egg/posecnn_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #9: pml_cuda_forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, float) + 0x395 (0x7f6bd0356c52 in /usr/local/lib/python3.6/dist-packages/posecnn-0.0.0-py3.6-linux-x86_64.egg/posecnn_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #10: pml_forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, float) + 0x1ce (0x7f6bd0332e3e in /usr/local/lib/python3.6/dist-packages/posecnn-0.0.0-py3.6-linux-x86_64.egg/posecnn_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #11: <unknown function> + 0xfaec5 (0x7f6bd0341ec5 in /usr/local/lib/python3.6/dist-packages/posecnn-0.0.0-py3.6-linux-x86_64.egg/posecnn_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #12: <unknown function> + 0xf7a87 (0x7f6bd033ea87 in /usr/local/lib/python3.6/dist-packages/posecnn-0.0.0-py3.6-linux-x86_64.egg/posecnn_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #13: python3() [0x50a5a5]
<omitting python frames>
frame #15: python3() [0x507cd4]
frame #16: python3() [0x58931b]
frame #18: THPFunction_apply(_object*, _object*) + 0x581 (0x7f6c99d004d1 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #19: python3() [0x50a22f]
frame #22: python3() [0x595221]
frame #25: python3() [0x507cd4]
frame #27: python3() [0x595221]
frame #28: python3() [0x54a621]
frame #30: python3() [0x50a533]
frame #33: python3() [0x595221]
frame #36: python3() [0x507cd4]
frame #38: python3() [0x595221]
frame #39: python3() [0x54a621]
frame #42: python3() [0x507cd4]
frame #43: python3() [0x5893da]
frame #46: python3() [0x5096c8]
frame #47: python3() [0x50a3fd]
frame #49: python3() [0x5096c8]
frame #50: python3() [0x50a3fd]
frame #53: python3() [0x595221]
frame #55: python3() [0x5e1b32]
frame #56: python3() [0x631f94]
frame #57: <unknown function> + 0x76db (0x7f6c9ee1f6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #58: clone + 0x3f (0x7f6c9f15871f in /lib/x86_64-linux-gnu/libc.so.6)
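The failing frames are in pml_forward, which I take to be the point matching loss, so the per-GPU batch is probably what has to shrink. Here is a minimal sketch of what I would try, assuming the config exposes a TRAIN.IMS_PER_BATCH key (the key name is my assumption, borrowed from similar detection-style configs); the snippet only rewrites a copy of the YAML and logs allocator state, it is not the repo's training code.

```python
import torch
import yaml

# 1) Lower the images-per-batch in a copy of the experiment config before training.
#    TRAIN.IMS_PER_BATCH is an assumed key name; the actual key in
#    experiments/cfgs/ycb_object.yml may differ.
with open('experiments/cfgs/ycb_object.yml') as f:
    cfg = yaml.safe_load(f)
cfg.setdefault('TRAIN', {})['IMS_PER_BATCH'] = 1  # smaller batch -> less GPU memory
with open('experiments/cfgs/ycb_object_small_batch.yml', 'w') as f:
    yaml.safe_dump(cfg, f)

# 2) Log allocator state around the step that fails, to see which GPU fills up.
for gpu in range(torch.cuda.device_count()):
    alloc_mib = torch.cuda.memory_allocated(gpu) / 2**20
    print(f'GPU {gpu}: {alloc_mib:.0f} MiB currently allocated')
```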

iv) I also noticed that the YCBV evaluation breaks for me after slightly over 1000 steps (I skipped training and relied on your checkpoints). It says:

[1083/40000], batch time 9.58
./experiments/scripts/ycb_object_test.sh: line 13:   562 Killed                  ./tools/test_net.py --gpu $1 --network posecnn --pretrained output/ycb_object/ycb_object_train/vgg16_ycb_object_epoch_$2.checkpoint.pth --dataset ycb_object_test --cfg experiments/cfgs/ycb_object.yml

I ran it via

mkdir -p output/ycb_object/ycb_object_train/
cp data/checkpoints/ycb_object/vgg16_ycb_object_epoch_16.checkpoint.pth output/ycb_object/ycb_object_train/
./experiments/scripts/ycb_object_test.sh 0 16
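Since the process is reported as Killed rather than raising a Python exception, my guess is that host RAM (or a Docker memory limit) is exhausted and the kernel OOM killer terminates it. Below is a minimal monitoring sketch I would run alongside the test to confirm that; psutil is an extra dependency, not part of the repo, and the PID 562 is simply the one from the Killed line above.

```python
import time
import psutil

# Minimal sketch: watch the resident memory of the test process while it runs,
# to check whether host RAM (not GPU memory) is what gets exhausted before the
# kernel OOM killer steps in.
def watch(pid, interval=5.0):
    proc = psutil.Process(pid)
    while proc.is_running():
        rss_gib = proc.memory_info().rss / 2**30
        print(f'RSS: {rss_gib:.2f} GiB')
        time.sleep(interval)

# Usage: watch(562)  # PID of tools/test_net.py, e.g. from the "Killed" line
```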

I would greatly appreciate any help. Ultimately, any solution that lets me properly reproduce your results on arbitrary RGB-D image pairs with known intrinsics, or at least on YCB, would solve my main issue. Thank you very much in advance.