Banconxuan / RTM3D

The official PyTorch Implementation of RTM3D and KM3D for Monocular 3D Object Detection
MIT License

RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. #48

Closed: morrolinux closed this issue 2 years ago

morrolinux commented 2 years ago

Hi, I was able to run the training fine until the other day, when I had to re-install everything. I followed the instructions as always, but when I run:

python ./src/main.py --data_dir ./kitti_format --exp_id KM3D_dla34 --arch dla_34 --batch_size 4 --master_batch_size 2 --lr 1.25e-4 --gpus 0 --num_epochs 120

I get:

RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `bitwise_not()` operator instead.

Full log:

Traceback (most recent call last):
  File "./src/main.py", line 111, in <module>
    main(opt)
  File "./src/main.py", line 73, in main
    log_dict_train, _ = trainer.train(epoch, train_loader)
  File "/home/morro/RTM3D_MORRO/src/lib/trains/base_trainer.py", line 162, in train
    return self.run_epoch('train', epoch, data_loader,unlabel_loader1,unlabel_loader2,unlabel_set,iter_num,uncert)
  File "/home/morro/RTM3D_MORRO/src/lib/trains/base_trainer.py", line 97, in run_epoch
    output, loss, loss_stats = model_with_loss(batch,phase=phase)
  File "/home/morro/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/morro/RTM3D_MORRO/src/lib/trains/base_trainer.py", line 33, in forward
    loss, loss_stats = self.loss(outputs, batch,phase)
  File "/home/morro/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/morro/RTM3D_MORRO/src/lib/trains/car_pose.py", line 52, in forward
    coor_loss, prob_loss, box_score = self.position_loss(output, batch,phase)
  File "/home/morro/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/morro/RTM3D_MORRO/src/lib/models/losses.py", line 444, in forward
    dim_mask_score_mask = 1 - (dim_mask_score_mask > 0)
  File "/home/morro/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/tensor.py", line 325, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `bitwise_not()` operator instead.
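For context on the error itself, here is a minimal sketch. It assumes the traceback came from a PyTorch newer than the pinned 1.0.0, which the resolution at the end of this thread suggests. Starting with PyTorch 1.2, comparison operators such as `>` return `torch.bool` tensors instead of `uint8`, and `1 - bool_tensor` raises exactly this RuntimeError. If you would rather patch line 444 of losses.py than downgrade, the error message recommends `~`. The tensor values below are hypothetical. Whether the code after line 444 expects a bool mask or a float mask is an assumption you should check against the rest of `forward`.

```python
import torch

# Hypothetical values standing in for dim_mask_score_mask in losses.py
dim_mask_score_mask = torch.tensor([0.0, 0.3, -1.2, 2.0])

# Original line (fails on PyTorch >= 1.2 because the comparison yields torch.bool):
#   dim_mask_score_mask = 1 - (dim_mask_score_mask > 0)

# Invert the boolean mask with logical NOT, as the error message suggests
mask = ~(dim_mask_score_mask > 0)

# If downstream code needs numeric 0/1 values, cast explicitly
mask_float = mask.float()
print(mask_float)  # tensor([1., 0., 1., 0.])
```

The explicit `.float()` cast keeps the intent visible. It avoids relying on implicit type promotion between bool and numeric tensors, which has changed across PyTorch releases.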

The code was working fine until the re-install, so I'm guessing the cause is some newer library version (but not one of the obvious ones, since their versions are pinned as per the instructions):

cuda100                   1.0                           0    pytorch
pytorch                   1.0.0           py3.6_cuda10.0.130_cudnn7.4.1_1  [cuda100]  pytorch
torchvision               0.2.1                      py_2    pytorch

Does anyone have a clue what's going on here?

morrolinux commented 2 years ago

Here's my full environment:

# packages in environment at /home/morro/anaconda3/envs/KM3D:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             4.5                       1_gnu  
absl-py                   0.14.1                   pypi_0    pypi
albumentations            1.1.0                    pypi_0    pypi
blas                      1.0                         mkl  
ca-certificates           2021.7.5             h06a4308_1  
cachetools                4.2.4                    pypi_0    pypi
certifi                   2021.5.30        py36h06a4308_0  
cffi                      1.14.5           py36h261ae71_0  
charset-normalizer        2.0.6                    pypi_0    pypi
coco                      0.0.0                    pypi_0    pypi
cuda100                   1.0                           0    pytorch
cycler                    0.10.0                   pypi_0    pypi
cython                    0.29.23                  pypi_0    pypi
dataclasses               0.8                      pypi_0    pypi
dcnv2                     0.1                       dev_0    <develop>
decorator                 4.4.2                    pypi_0    pypi
easydict                  1.9                      pypi_0    pypi
fire                      0.4.0                    pypi_0    pypi
freetype                  2.10.4               h5ab3b9f_0  
google-auth               1.35.0                   pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
grpcio                    1.41.0                   pypi_0    pypi
idna                      3.2                      pypi_0    pypi
imageio                   2.9.0                    pypi_0    pypi
importlib-metadata        4.8.1                    pypi_0    pypi
intel-openmp              2021.2.0           h06a4308_610  
iou3d                     0.0.0                    pypi_0    pypi
joblib                    1.0.1                    pypi_0    pypi
jpeg                      9b                   h024ee3a_2  
kiwisolver                1.3.1                    pypi_0    pypi
lcms2                     2.12                 h3be6417_0  
ld_impl_linux-64          2.35.1               h7274673_9  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.3.0               h5101ec6_17  
libgomp                   9.3.0               h5101ec6_17  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.3.0               hd4cf53a_17  
libtiff                   4.2.0                h85742a9_0  
libwebp-base              1.2.0                h27cfd23_0  
llvmlite                  0.36.0                   pypi_0    pypi
lz4-c                     1.9.3                h2531618_0  
markdown                  3.3.4                    pypi_0    pypi
matplotlib                3.3.4                    pypi_0    pypi
mkl                       2020.2                      256    anaconda
mkl-service               2.3.0            py36he8ac12f_0  
mkl_fft                   1.3.0            py36h54f3939_0  
mkl_random                1.1.1            py36h0573a6f_0  
ncurses                   6.2                  he6710b0_1  
networkx                  2.5.1                    pypi_0    pypi
ninja                     1.10.2               hff7bd54_1  
numba                     0.53.1                   pypi_0    pypi
numpy                     1.19.2           py36h54aff64_0  
numpy-base                1.19.2           py36hfa32c7d_0  
oauthlib                  3.1.1                    pypi_0    pypi
olefile                   0.46                     py36_0  
opencv-python             4.0.0.21                 pypi_0    pypi
opencv-python-headless    4.5.3.56                 pypi_0    pypi
openssl                   1.1.1l               h7f8727e_0  
pillow                    8.2.0            py36he98fc37_0  
pip                       21.1.3           py36h06a4308_0  
progress                  1.5                      pypi_0    pypi
protobuf                  3.17.3                   pypi_0    pypi
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pycocotools               2.0.2                    pypi_0    pypi
pycparser                 2.20                       py_2  
pyparsing                 2.4.7                    pypi_0    pypi
python                    3.6.13               h12debd9_1  
python-dateutil           2.8.1                    pypi_0    pypi
pytorch                   1.0.0           py3.6_cuda10.0.130_cudnn7.4.1_1  [cuda100]  pytorch
pywavelets                1.1.1                    pypi_0    pypi
pyyaml                    5.4.1                    pypi_0    pypi
qudida                    0.0.4                    pypi_0    pypi
readline                  8.1                  h27cfd23_0  
requests                  2.26.0                   pypi_0    pypi
requests-oauthlib         1.3.0                    pypi_0    pypi
rsa                       4.7.2                    pypi_0    pypi
scikit-image              0.17.2                   pypi_0    pypi
scikit-learn              0.24.2                   pypi_0    pypi
scipy                     1.5.4                    pypi_0    pypi
setuptools                52.0.0           py36h06a4308_0  
shapely                   1.7.1                    pypi_0    pypi
six                       1.16.0             pyhd3eb1b0_0  
sqlite                    3.36.0               hc218d9a_0  
tensorboard               2.6.0                    pypi_0    pypi
tensorboard-data-server   0.6.1                    pypi_0    pypi
tensorboard-plugin-wit    1.8.0                    pypi_0    pypi
tensorboardx              2.4                      pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
threadpoolctl             3.0.0                    pypi_0    pypi
tifffile                  2020.9.3                 pypi_0    pypi
tk                        8.6.10               hbc83047_0  
torchvision               0.2.1                      py_2    pytorch
typing-extensions         3.10.0.2                 pypi_0    pypi
urllib3                   1.26.7                   pypi_0    pypi
werkzeug                  2.0.1                    pypi_0    pypi
wheel                     0.36.2             pyhd3eb1b0_0  
xz                        5.2.5                h7b6447c_0  
zipp                      3.6.0                    pypi_0    pypi
zlib                      1.2.11               h7b6447c_3  
zstd                      1.4.9                haebb681_0  

Perhaps you could share yours so I can double-check the version of each seemingly relevant package?

morrolinux commented 2 years ago

Apparently I had messed up the environment because of a conda/pip conflict: conda was always picking up the PyTorch installation I had made with pip, which was the wrong version.

Running pip uninstall torch torchvision (on pip the PyTorch package is named torch, not pytorch), then removing those two packages and reinstalling them with conda (conda install pytorch==1.0.0 torchvision==0.2.1 cuda100 -c pytorch), fixed it.
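A quick check like the following can confirm which PyTorch build an environment actually imports, which helps catch the pip/conda mix-up described above. This is a sketch. The expected version string is taken from the conda install command in this thread. The idea that a path outside the conda environment (for example a pip `--user` install under ~/.local) points to a stray pip copy is a common cause, not one confirmed here.

```python
import torch

EXPECTED_VERSION = "1.0.0"  # version used in this thread's conda install command


def torch_build_info():
    """Return the version, import location and CUDA version of the imported PyTorch."""
    return {
        "version": torch.__version__,
        "location": torch.__file__,  # shows which site-packages directory won
        "cuda": torch.version.cuda,  # None for CPU-only builds
    }


info = torch_build_info()
for key, value in info.items():
    print(f"{key}: {value}")

if not info["version"].startswith(EXPECTED_VERSION):
    print(f"Warning: expected PyTorch {EXPECTED_VERSION}, got {info['version']}")
```

Run it with the environment's own interpreter (python, not a system-wide one). The printed location should sit inside that conda environment's site-packages.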

Also keep in mind that you should always rebuild DCNv2 and iou3d when switching PyTorch versions; otherwise you'll get runtime linking errors caused by undefined symbols.