yxx623 closed this issue 2 years ago
That sounds like a data setup problem. Can you provide the commands you ran, starting from the virtual environment, and where you ran them from (e.g. the root of the repository, or outside the repository)?
Specifically, can you tell me which torch version you had (torch 1.7 seems to have something wrong with it, but 1.8 is okay; see https://github.com/alexklwong/calibrated-backprojection-network/issues/7) and which dataset setup, training, or inference bash script you ran?
If this has to do with training, can you also list all the training settings you used, the number of GPUs, etc.?
Thanks for your prompt reply. The commands I ran are the same as the ones you provided, and I ran them from the root of the repository. I built the virtual environment just like yours; the torch version is 1.3.0. I have tried running the pretrained model on the KITTI validation set and test set, and training the model on the KITTI dataset. All of them produced the same error. The layout of the KITTI validation set is shown in the figure, and each folder has 1000 files. I haven't changed any settings, and used 2 GPUs. Thanks!
Strange, I've just cloned a fresh copy of the repo, created the virtual environment, ran python setup/setup_dataset_kitti.py
and then ran bash bash/kitti/run_kbnet_kitti_validation.sh
, but I didn't see the error.
In general, the only spot that uses inverse is the backprojection step https://github.com/alexklwong/calibrated-backprojection-network/blob/master/src/networks.py#L498 but this shouldn't throw an error because the intrinsics matrix is invertible. That's why I think it is a data loading issue.
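For reference, here is a minimal sketch of what a backprojection step like that does (this is illustrative, not the repository's exact code; `backproject` and the tensor shapes are assumptions): each pixel (u, v) is lifted to a 3D ray through the inverse intrinsics K⁻¹, then scaled by depth. The `torch.inverse` call can only fail if K is singular, which a valid pinhole intrinsics matrix never is.

```python
import torch

def backproject(depth, intrinsics):
    # depth: [B, 1, H, W], intrinsics: [B, 3, 3]
    B, _, H, W = depth.shape
    # Pixel coordinate grid: v indexes rows (H), u indexes columns (W)
    v, u = torch.meshgrid(torch.arange(H, dtype=depth.dtype),
                          torch.arange(W, dtype=depth.dtype))
    ones = torch.ones_like(u)
    # Homogeneous pixel coordinates [u, v, 1], flattened to [3, H*W]
    pix = torch.stack([u, v, ones], dim=0).reshape(3, -1)
    pix = pix.unsqueeze(0).expand(B, -1, -1)          # [B, 3, H*W]
    # This is the only inverse: it fails only if K is singular
    K_inv = torch.inverse(intrinsics)                 # [B, 3, 3]
    rays = K_inv @ pix                                # camera rays, [B, 3, H*W]
    points = rays * depth.reshape(B, 1, -1)           # scale rays by depth
    return points.reshape(B, 3, H, W)
```

So if this line throws "singular U", the intrinsics tensor reaching it is almost certainly malformed (e.g. zeros from a failed data load) rather than a real camera matrix.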
Can you provide the full stack trace just in case?
I am not sure how the inference script will behave if you are using 2 GPUs, since the batch size is one regardless; in general you just need one GPU, so export CUDA_VISIBLE_DEVICES=0
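For example, assuming a bash shell (the script path follows the repo layout mentioned above):

```shell
# Make only GPU 0 visible to this shell and everything launched from it;
# the inference script uses batch size 1, so a single GPU suffices.
export CUDA_VISIBLE_DEVICES=0
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

Then run `bash bash/kitti/run_kbnet_kitti_validation.sh` from the same shell.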
Training should work just fine with multiple GPUs, this user also used two GPUs https://github.com/alexklwong/calibrated-backprojection-network/issues/5#issuecomment-989638885
Hi all, I am facing this problem too: torch 1.3.0, single GPU, during training (train_kbnet_kitti.sh).
Perhaps meeting over Google Meet or Zoom would be easier to troubleshoot this. Would you mind sending me an email at alexw@cs.ucla.edu so that I may schedule 30 minutes to troubleshoot?
Hi @alexklwong, thank you so much for your quick support. I used clues from #7 and it worked for me. What I did:
install cuda==11.1
pip install torch==1.8.2+cu111 torchvision==0.9.2+cu111 torchaudio==0.8.2 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
pip install tensorboard==2.3.0
pip install opencv-python scipy scikit-learn scikit-image matplotlib gdown numpy gast Pillow pyyaml
This setting works for python 3.7 in ubuntu 20.04. (GPU: RTX 3090)
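As a quick sanity check after those install steps (the expected version strings below follow the torch 1.8.2+cu111 build installed above; adjust them for a different build):

```python
import torch

# Confirm the installed torch build, its bundled CUDA toolkit version,
# and that the GPU is actually visible to torch.
print(torch.__version__)          # should contain '1.8.2' for the build above
print(torch.version.cuda)         # should be '11.1' for the +cu111 wheel
print(torch.cuda.is_available())  # True on a machine where the RTX 3090 is visible
```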
I think it's better to update README.md file. Previously, I was following the exact instructions there.
Ah I see, I think it might be because of the CUDA version required by the new RTX 30 series. The ones we tested on were GTX 1080s. I'll add instructions for those using newer GPUs.
Hi Alex, thank you for your excellent work. I have some problems when running the pretrained model and training the model. I haven't changed the code, but the following error was reported: RuntimeError: inverse_cuda: For batch 0: U( , ) is zero, singular U. (The values in parentheses are different each time I run it.) Have you met this error before, and how can I solve it? Thanks in advance.
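For anyone hitting this later: the error class can be reproduced directly, since `torch.inverse` raises a RuntimeError of this kind whenever its input is singular (on GPU the message is the "inverse_cuda ... singular U" variant quoted above). A small sketch with illustrative matrix values:

```python
import torch

# A singular matrix (zero row) makes torch.inverse raise a RuntimeError,
# which is the same failure mode as the "inverse_cuda ... singular U" report.
K_bad = torch.tensor([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 0.0]])   # singular: last row is all zeros
try:
    torch.inverse(K_bad)
except RuntimeError as e:
    print("inverse failed:", e)

# A proper pinhole intrinsics matrix has nonzero focal lengths and a unit
# bottom-right entry, so it is always invertible:
K_good = torch.tensor([[700.0, 0.0, 600.0],
                       [0.0, 700.0, 200.0],
                       [0.0, 0.0, 1.0]])
print(torch.inverse(K_good)[0, 0])        # ≈ 0.0014, i.e. 1/700
```

So when this error appears with KBNet, the intrinsics tensor reaching the backprojection step is degenerate, which usually points to a dataset-setup or environment issue rather than the model itself.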