Lu-Feng / DHE-VPR

Official repository for the AAAI 2024 paper "Deep Homography Estimation for Visual Place Recognition".
MIT License

CUDA error: out of memory #2

Closed (Frost24K closed this issue 5 months ago)

Frost24K commented 6 months ago

Hello, I'm currently trying to replicate the remarkable work you've shared, but I've run into a problem. Following the training pipeline outlined in the repository and the provided requirements, I tried to train DHE-VPR on a 4090 with 24 GB of graphics memory, the same capacity as the 3090. After training for 1 epoch, the program crashed and reported the following error:

Traceback (most recent call last):
  File "/home/xx/Documents/codespace/DHE-VPR/train_dhe.py", line 151, in <module>
    REIloss = homography_project.reprojection_error_ofinliers(model, queries_fw, positives_fw, weights=random_weights)
  File "/home/xx/Documents/codespace/DHE-VPR/homography_project.py", line 102, in reprojection_error_ofinliers
    reproject_error[i] = match_batch_tensor(query, pred, theta, trainflag=True, img_size=(384,384))
  File "/home/xx/Documents/codespace/DHE-VPR/homography_project.py", line 31, in match_batch_tensor
    max1 = torch.argmax(M, dim=1) #(N,l)
RuntimeError: CUDA error: out of memory

I believe this issue is related to memory allocation on the CUDA device. Your expertise and guidance in resolving this problem would be immensely valuable to me.

Thank you for your time.

Lu-Feng commented 6 months ago

Hello, thanks for your interest in our work. Running train_dhe.py takes about 12 GB of GPU memory. Is it possible that other programs were already using some GPU memory while you were running it? You can also reduce the batch_size and try again.
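Both suggestions can be checked from Python. The following is only a minimal sketch and is not part of the DHE-VPR code: `gpu_memory_report`, `run_with_batch_backoff` and the `train_step` callable are hypothetical names introduced for illustration. It assumes PyTorch may or may not be installed, reports free and total memory on the current CUDA device via `torch.cuda.mem_get_info()`, and retries a step with a halved batch size whenever the error message contains "out of memory".

```python
try:
    import torch  # optional: only needed for the GPU memory report
except ImportError:
    torch = None


def gpu_memory_report():
    """Return (free_GiB, total_GiB) for the current CUDA device, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 2**30, total_bytes / 2**30


def run_with_batch_backoff(train_step, batch_size, min_batch_size=1):
    """Call train_step(batch_size), halving batch_size after each out-of-memory error.

    train_step is a hypothetical callable standing in for one training iteration.
    Returns the batch size that finally succeeded.
    """
    while True:
        try:
            train_step(batch_size)
            return batch_size
        except RuntimeError as err:
            is_oom = "out of memory" in str(err).lower()
            if not is_oom or batch_size <= min_batch_size:
                raise  # unrelated error, or nothing left to shrink
        # The exception has been released here, so cached blocks can be freed.
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()
        batch_size = max(min_batch_size, batch_size // 2)


if __name__ == "__main__":
    # Prints None on a machine without CUDA, else (free_GiB, total_GiB).
    print(gpu_memory_report())
```

If the reported free memory is well below the roughly 12 GB mentioned above before training starts, another process is probably holding GPU memory. The backoff helper is a convenience for finding a workable batch size; setting a smaller batch_size directly, as suggested, has the same effect.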