RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. #48

morrolinux closed this issue 2 years ago

morrolinux commented 2 years ago

Hi, I was able to run the training fine until the other day, when I had to re-install everything. I followed the instructions as always, but when I run:

python ./src/ --data_dir ./kitti_format --exp_id KM3D_dla34 --arch dla_34 --batch_size 4 --master_batch_size 2 --lr 1.25e-4 --gpus 0 --num_epochs 120

I get:

RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `bitwise_not()` operator instead.

Full log:

Traceback (most recent call last):
  File "./src/", line 111, in <module>
  File "./src/", line 73, in main
    log_dict_train, _ = trainer.train(epoch, train_loader)
  File "/home/morro/RTM3D_MORRO/src/lib/trains/", line 162, in train
    return self.run_epoch('train', epoch, data_loader,unlabel_loader1,unlabel_loader2,unlabel_set,iter_num,uncert)
  File "/home/morro/RTM3D_MORRO/src/lib/trains/", line 97, in run_epoch
    output, loss, loss_stats = model_with_loss(batch,phase=phase)
  File "/home/morro/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/morro/RTM3D_MORRO/src/lib/trains/", line 33, in forward
    loss, loss_stats = self.loss(outputs, batch,phase)
  File "/home/morro/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/morro/RTM3D_MORRO/src/lib/trains/", line 52, in forward
    coor_loss, prob_loss, box_score = self.position_loss(output, batch,phase)
  File "/home/morro/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/morro/RTM3D_MORRO/src/lib/models/", line 444, in forward
    dim_mask_score_mask = 1 - (dim_mask_score_mask > 0)
  File "/home/morro/anaconda3/envs/CenterNet/lib/python3.6/site-packages/torch/", line 325, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `bitwise_not()` operator instead.

The code was working fine before the re-install, so I'm guessing this is caused by a newer version of some library (though not the obvious ones, since their versions are pinned as per the instructions):

cuda100                   1.0                           0    pytorch
pytorch                   1.0.0           py3.6_cuda10.0.130_cudnn7.4.1_1  [cuda100]  pytorch
torchvision               0.2.1                      py_2    pytorch

Does anyone have any clue what's going on here?
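
For context on the failing line (`dim_mask_score_mask = 1 - (dim_mask_score_mask > 0)`): as far as I know, comparisons such as `x > 0` return a uint8 tensor on PyTorch 1.0.0, so `1 - mask` works there, while from PyTorch 1.2 onward they return a bool tensor and the subtraction raises the error above. Below is a minimal sketch of an inversion that does not depend on the comparison's dtype. The function names are hypothetical and not part of the repository, and whether a float or a bool mask is appropriate depends on how the result is used downstream.

```python
import torch


def invert_positive_mask(scores):
    """Return a float mask that is 1.0 where scores <= 0 and 0.0 elsewhere.

    Hypothetical helper: casting to float first means the subtraction
    works whether `scores > 0` yields a uint8 or a bool tensor.
    """
    positive = (scores > 0).float()
    return 1.0 - positive


def invert_positive_mask_bool(scores):
    """Same inversion expressed as a comparison, with no subtraction at all."""
    return scores <= 0


if __name__ == "__main__":
    example = torch.tensor([-1.0, 0.0, 2.0])
    print(invert_positive_mask(example))
    print(invert_positive_mask_bool(example))
```

The `~` operator suggested by the error message is only a logical NOT on bool tensors; on a uint8 mask it is a bitwise NOT, so it is not a drop-in replacement on older PyTorch versions.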

morrolinux commented 2 years ago

Here's my full environment:

Perhaps you can share yours so I can double-check the version of each seemingly relevant package?

morrolinux commented 2 years ago

Apparently I messed up the environment through a conda/pip conflict: the conda environment was still picking up the PyTorch installation I had made with pip, which was the wrong version.

Running pip uninstall pytorch torchvision, then removing and reinstalling those two packages with conda (conda install pytorch==1.0.0 torchvision==0.2.1 cuda100 -c pytorch), fixed it.

Also keep in mind that you should always rebuild DCNv2 and iou3d when switching PyTorch versions, or you'll get runtime linking errors due to unknown symbols.
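
To spot this kind of conda/pip mix-up, it can help to check which installation Python actually imports rather than trusting the conda package list. The following is a minimal sketch; `describe_package` is a hypothetical helper, not something from the repository. Run it inside the activated environment: if the reported version differs from the one conda lists, or the path points outside the environment, a second installation is probably shadowing the conda one. Note that pip lists PyTorch under the distribution name `torch`.

```python
import importlib
import importlib.util


def describe_package(name):
    """Return (version, path) for an importable package, or None if it is absent.

    Hypothetical helper: the version comes from the module's own
    `__version__` attribute and falls back to "unknown" if it has none.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    module = importlib.import_module(name)
    return getattr(module, "__version__", "unknown"), spec.origin


if __name__ == "__main__":
    # Print what the interpreter really loads for the two pinned packages.
    for package in ("torch", "torchvision"):
        print(package, describe_package(package))
```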