chensong1995 / HybridPose

HybridPose: 6D Object Pose Estimation under Hybrid Representation (CVPR 2020)
MIT License
412 stars, 64 forks

using torch 1.13, cuda 11.7, torchvision 0.14 and Python 3.10 leads to negative eigenvalues #87

Closed by monajalal 9 months ago

monajalal commented 10 months ago

When I use the md5sum-verified ape weights (checkpoint 199), I get these warnings (and NaNs). Do you also get these warnings, negative eigenvalues, and NaNs?

(hybridpose) mona@mona-ThinkStation-P7:~/HybridPose$ LD_LIBRARY_PATH=lib/regressor:$LD_LIBRARY_PATH python src/train_core.py --load_dir /home/mona/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199 --object_name ape

Screenshot from 2023-10-24 09-20-22

Screenshot from 2023-10-24 09-25-26

and finally: Screenshot from 2023-10-24 09-34-12

(hybridpose) mona@mona-ThinkStation-P7:~/HybridPose$ conda list
# packages in environment at /home/mona/anaconda3/envs/hybridpose:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
astropy                   5.3.4                    pypi_0    pypi
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2023.08.22           h06a4308_0  
certifi                   2023.7.22                pypi_0    pypi
charset-normalizer        3.3.0                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
cloudpickle               3.0.0                    pypi_0    pypi
contourpy                 1.1.1                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
dask                      2023.10.0                pypi_0    pypi
fonttools                 4.43.1                   pypi_0    pypi
fsspec                    2023.9.2                 pypi_0    pypi
idna                      3.4                      pypi_0    pypi
imageio                   2.31.5                   pypi_0    pypi
importlib-metadata        6.8.0                    pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
lazy-loader               0.3                      pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.4.4                h6a678d5_0  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.41.5               h5eee18b_0  
locket                    1.0.0                    pypi_0    pypi
matplotlib                3.8.0                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
networkx                  3.1                      pypi_0    pypi
numpy                     1.26.1                   pypi_0    pypi
opencv-python             4.8.1.78                 pypi_0    pypi
openssl                   3.0.11               h7f8727e_2  
packaging                 23.2                     pypi_0    pypi
partd                     1.4.1                    pypi_0    pypi
pillow                    10.1.0                   pypi_0    pypi
pip                       23.2.1          py310h06a4308_0  
platformdirs              3.11.0                   pypi_0    pypi
pooch                     1.7.0                    pypi_0    pypi
pyamg                     5.0.1                    pypi_0    pypi
pyerfa                    2.0.1                    pypi_0    pypi
pyparsing                 3.1.1                    pypi_0    pypi
python                    3.10.13              h955ad1f_0  
python-dateutil           2.8.2                    pypi_0    pypi
pywavelets                1.4.1                    pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h5eee18b_0  
requests                  2.31.0                   pypi_0    pypi
scikit-image              0.22.0                   pypi_0    pypi
scikit-learn              1.3.1                    pypi_0    pypi
scipy                     1.11.3                   pypi_0    pypi
setuptools                68.0.0          py310h06a4308_0  
simpleitk                 2.3.0                    pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0  
threadpoolctl             3.2.0                    pypi_0    pypi
tifffile                  2023.9.26                pypi_0    pypi
tk                        8.6.12               h1ccaba5_0  
toolz                     0.12.0                   pypi_0    pypi
torch                     1.13.0+cu117             pypi_0    pypi
torchtext                 0.14.0                   pypi_0    pypi
torchvision               0.14.0+cu117             pypi_0    pypi
tqdm                      4.66.1                   pypi_0    pypi
typing-extensions         4.8.0                    pypi_0    pypi
tzdata                    2023c                h04d1e81_0  
urllib3                   2.0.6                    pypi_0    pypi
wheel                     0.41.2          py310h06a4308_0  
xz                        5.4.2                h5eee18b_0  
zipp                      3.17.0                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_0  
(hybridpose) mona@mona-ThinkStation-P7:~/HybridPose$ python
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.13.0+cu117'
>>> import torchvision
>>> torchvision.__version__
'0.14.0+cu117'
>>> torch.version.cuda
'11.7'
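
The same version information can also be collected non-interactively, which makes it easier to paste into a report. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration, and the CUDA version still has to be read from `torch.version.cuda` as in the session above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages whose versions are discussed in this issue.
    for name, version in installed_versions(["torch", "torchvision", "numpy"]).items():
        print(f"{name}: {version or 'not installed'}")
```

A missing package is reported as "not installed" instead of raising, so the script can be run in any environment.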

Screenshot from 2023-10-24 09-32-29

Please note that after test_set_ape.npy is saved in the output folder, the evaluate script reports 0 for the ADD(-S) metric.

(hybridpose) mona@mona-ThinkStation-P7:~/HybridPose$ python src/evaluate.py
ADD(-S) score of initial prediction is: 0.0
ADD(-S) score of final prediction is: 0.0
(hybridpose) mona@mona-ThinkStation-P7:~/HybridPose$ ls output/linemod/test_set_ape.npy 
-rw-rw-r-- 1 mona mona 176K Oct 24 09:27 output/linemod/test_set_ape.npy

This suggests that running train_core.py from the trained weights is not working as expected. Please let me know if you have a solution.
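
One possible reading of the 0.0 score, not a confirmed diagnosis: if the predicted poses contain NaNs, any distance computed from them is NaN, and a NaN never passes a "less than threshold" test. The sketch below illustrates this with the common ADD criterion (mean distance between model points under the two poses, accepted below 10% of the object diameter). It is a plain-Python illustration and is not taken from src/evaluate.py; all function names are hypothetical, and the symmetric ADD-S variant is not covered.

```python
import math


def transform(points, rotation, translation):
    """Apply x -> R x + t to each 3D point (R as 3 rows, t as 3 numbers)."""
    return [
        tuple(sum(r[k] * p[k] for k in range(3)) + t for r, t in zip(rotation, translation))
        for p in points
    ]


def add_distance(points, pose_pred, pose_gt):
    """Mean distance between the model points under the two poses (ADD)."""
    pred = transform(points, *pose_pred)
    gt = transform(points, *pose_gt)
    return sum(math.dist(a, b) for a, b in zip(pred, gt)) / len(points)


def is_correct(distance, diameter, fraction=0.1):
    """A pose counts as correct when the distance is below a fraction of the diameter."""
    return distance < fraction * diameter


if __name__ == "__main__":
    identity = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
    gt = (identity, (0.0, 0.0, 1.0))
    nan_pose = (identity, (float("nan"), 0.0, 1.0))
    print(is_correct(add_distance(points, gt, gt), diameter=0.1))        # True
    print(is_correct(add_distance(points, nan_pose, gt), diameter=0.1))  # False
```

If every test pose contained a NaN, every comparison would fail in this way and the fraction of correct poses would be 0.0.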

chensong1995 commented 10 months ago

Hi Mona,

Thanks for your question! Can you run three iterations of trainer.test (here) and see if the visualizations look good to you?

monajalal commented 10 months ago

Hi Chen,

I didn't quite follow your instructions. Could you please elaborate a bit more and show the command I need to run? Thanks, Mona

chensong1995 commented 10 months ago

Hi Mona,

Right before this line, add trainer.test(0).

Right after this line, add pdb.set_trace().

Run

LD_LIBRARY_PATH=lib/regressor:$LD_LIBRARY_PATH python src/train_core.py --save_dir /home/mona/HybridPose/saved_weights/linemod/ape --load_dir /home/mona/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199 --object_name ape

When you hit the breakpoint for the fourth time, use Ctrl+D to quit the program. Go to /home/mona/HybridPose/saved_weights/linemod/ape/image and inspect the visualizations.
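
As an alternative to counting breakpoint hits by hand, a loop can be capped at a fixed number of iterations. This is only a sketch of that general technique, not part of the HybridPose code: `first_batches` and `fake_loader` are hypothetical names, and the loop shape mirrors the usual `for i_batch, batch in enumerate(loader)` pattern.

```python
from itertools import islice


def first_batches(loader, limit=3):
    """Yield (index, batch) for at most `limit` batches from any iterable."""
    return enumerate(islice(loader, limit))


if __name__ == "__main__":
    # `fake_loader` is a hypothetical stand-in for a real data loader.
    fake_loader = (f"batch-{i}" for i in range(1000))
    for i_batch, batch in first_batches(fake_loader):
        print(i_batch, batch)
```

Because `islice` stops pulling items after `limit`, the remaining batches are never loaded, so the run ends quickly without needing Ctrl+D.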

I hope this helps! Let me know if you have further concerns.

monajalal commented 10 months ago

Thank you for your response. To clarify how to hit the breakpoint four times: the first time I run the command, it drops into the pdb interactive prompt. I then enter continue, but it keeps running for a long time. Is this intended, and do you expect to see something like this? I am still waiting to enter continue a second time.

image

(hybridpose) mona@ada:~/HybridPose$ LD_LIBRARY_PATH=lib/regressor:$LD_LIBRARY_PATH python src/train_core.py --save_dir /home/mona/HybridPose/saved_weights/linemod/ape --load_dir /home/mona/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199 --object_name ape
number of model parameters: 12959563
Successfully loaded model from /home/mona/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199
Testing...
/home/mona/anaconda3/envs/hybridpose/lib/python3.10/site-packages/torch/nn/functional.py:1967: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/home/mona/HybridPose/lib/ransac_voting_gpu_layer/ransac_voting_gpu.py:546: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead. (Triggered internally at ../aten/src/ATen/native/IndexingUtils.h:27.)
  direct = vertex[bi].masked_select(torch.unsqueeze(torch.unsqueeze(cur_mask, 2), 3))  # [tn,vn,2]
Loss: 0.3184
> /home/mona/HybridPose/trainers/coretrainer.py(574)generate_data()
-> for i_batch, batch in enumerate(val_loader):
(Pdb)

monajalal commented 10 months ago

An update: I never got to enter continue at the (Pdb) prompt two more times. After the first time I entered it, the program ran through to the end.

image

Do you know how to achieve what you suggested?

monajalal commented 10 months ago

That said, when I browsed to the folder you mentioned for the currently saved weights, I found these images. Do you think they are acceptable?

(hybridpose) mona@ada:~/HybridPose$ nautilus /home/mona/HybridPose/saved_weights/linemod/ape/image/0.001

Screenshot from 2023-11-03 18-07-05 Screenshot from 2023-11-03 18-07-09 Screenshot from 2023-11-03 18-07-14 Screenshot from 2023-11-03 18-07-18 Screenshot from 2023-11-03 18-07-25 Screenshot from 2023-11-03 18-07-29 Screenshot from 2023-11-03 18-07-35

chensong1995 commented 10 months ago

Hi Mona,

The images you showed are from the downloaded weight archive. Ideally, the results from your own run should look very similar to these images.

The reason why the code takes so long to run is that the breakpoint you have is in generate_data() instead of test(). After running the code, the filenames of the newly generated visualizations should have the prefix 0_ because we are setting epoch to 0 when calling test(0).
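
To pick out the newly generated files, the image folder can be filtered by the `0_` prefix. The sketch below is a hypothetical helper, not part of the repository; it assumes that a ground-truth image carries a `_gt` suffix before the extension, as in the file names `0_2_pts.jpg` and `0_2_pts_gt.jpg` that appear in this thread.

```python
from pathlib import Path


def pair_visualizations(image_dir, epoch=0):
    """Map each predicted image name for `epoch` to its `_gt` counterpart (or None)."""
    image_dir = Path(image_dir)
    pairs = {}
    for path in sorted(image_dir.glob(f"{epoch}_*.jpg")):
        if path.stem.endswith("_gt"):
            continue  # ground-truth files are looked up from their predictions
        gt_path = path.with_name(f"{path.stem}_gt{path.suffix}")
        pairs[path.name] = gt_path.name if gt_path.exists() else None
    return pairs


if __name__ == "__main__":
    # Folder taken from the nautilus command earlier in this thread.
    folder = "/home/mona/HybridPose/saved_weights/linemod/ape/image/0.001"
    for predicted, ground_truth in pair_visualizations(folder).items():
        print(predicted, "<->", ground_truth)
```

Files from the downloaded archive that belong to other epochs do not start with `0_` and are therefore left out of the result.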

monajalal commented 10 months ago

Thanks a lot for the clarification. After entering continue four times, I do not see the points (pts) on the objects. As you can see, 0_2_pts.jpg has no points, while the ground truth 0_2_pts_gt.jpg has points.

Screenshot from 2023-11-06 08-25-23 Screenshot from 2023-11-06 08-27-24 Screenshot from 2023-11-06 08-29-22 Screenshot from 2023-11-06 08-30-04 Screenshot from 2023-11-06 08-30-13 Screenshot from 2023-11-06 08-30-24 Screenshot from 2023-11-06 08-30-37

chensong1995 commented 10 months ago

Hi Mona,

Thanks for the follow-up! It looks to me that the keypoint voting procedure is causing the issue. To verify, you can take a look at the _vote_ images. My expectation is that the predicted votes are very similar to the ground-truth ones, which would mean the network output is fine and the failure happens in the voting step. This is probably due to an unsuccessful compilation of the RANSAC voting layer.
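
A quick first check is whether any compiled shared-object files exist for the layer at all. The sketch below is a rough heuristic with a hypothetical helper name; the directory is taken from the warning path in the log earlier in this thread. Finding a `.so` file does not show that it was built against the installed torch and CUDA versions, so a rebuild may still be needed.

```python
from pathlib import Path


def find_shared_objects(root):
    """Return compiled shared-object files (*.so) found anywhere under `root`."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.so") if p.is_file())


if __name__ == "__main__":
    # Run from the repository root; adjust the path if the layout differs.
    layer_dir = Path("lib/ransac_voting_gpu_layer")
    found = find_shared_objects(layer_dir)
    if not found:
        print(f"no .so files under {layer_dir}; the extension may not have been built")
    for path in found:
        print(path, path.stat().st_size, "bytes")
```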

I hope this helps! Let me know if you have further concerns.