SxJyJay / MSMDFusion

[CVPR 2023] MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection
Apache License 2.0

GPU memory for evaluation #36

Closed by proggheli 9 months ago

proggheli commented 9 months ago

How much GPU memory is needed to evaluate the model?

I'm running: python3 tools/test.py configs/MSMDFusion_nusc_voxel_LC.py checkpoint/MSMDFusion.pth --eval bbox

But I get this error message: RuntimeError: CUDA out of memory. Tried to allocate 1.21 GiB (GPU 0; 7.77 GiB total capacity; 4.40 GiB already allocated; 1.21 GiB free; 5.05 GiB reserved in total by PyTorch)
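
For readers trying to interpret the numbers in this message, here is a minimal sketch that pulls them apart. The function name parse_oom and the derived fields cached_unused and outside_pytorch are hypothetical names chosen for illustration, and the pattern only handles the GiB-only wording quoted above; other PyTorch versions phrase the message differently.

```python
import re

# Matches the OOM wording quoted in this issue (all figures in GiB).
# Assumption: other PyTorch versions may word the message differently.
_OOM_RE = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) GiB \(GPU (?P<gpu>\d+); "
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free; "
    r"(?P<reserved>[\d.]+) GiB reserved in total by PyTorch\)"
)


def parse_oom(message):
    """Extract the figures from a CUDA OOM message and derive two gaps."""
    match = _OOM_RE.search(message)
    if match is None:
        raise ValueError("not a recognised CUDA OOM message")
    info = {k: float(v) for k, v in match.groupdict().items() if k != "gpu"}
    info["gpu"] = int(match.group("gpu"))
    # Reserved by PyTorch's caching allocator but not backing a live tensor.
    info["cached_unused"] = round(info["reserved"] - info["allocated"], 2)
    # Not reserved by this process's allocator and not free either
    # (for example the CUDA context, other processes, or a desktop session).
    info["outside_pytorch"] = round(
        info["total"] - info["reserved"] - info["free"], 2
    )
    return info


if __name__ == "__main__":
    msg = (
        "RuntimeError: CUDA out of memory. Tried to allocate 1.21 GiB "
        "(GPU 0; 7.77 GiB total capacity; 4.40 GiB already allocated; "
        "1.21 GiB free; 5.05 GiB reserved in total by PyTorch)"
    )
    for key, value in parse_oom(msg).items():
        print(f"{key}: {value}")
```

For the message above this gives 0.65 GiB cached but unused and 1.51 GiB held outside PyTorch's allocator, so only about 4.40 GiB of the 7.77 GiB card was actually backing tensors when the 1.21 GiB request failed.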

sun-yue2002 commented 9 months ago

I ran into this problem too, on 8 RTX 4090 GPUs, which is not normal. I solved it by uncommenting this line: https://github.com/SxJyJay/MSMDFusion/blob/7b5b2741e693ba8007c95e3e8951e4e67fbc47ed/mmdet3d/models/middle_encoders/sparse_multimodal_encoder_painting.py#L432

Hope this helps.

proggheli commented 9 months ago

Thanks for the quick response. I tried it, but it did not change anything. Is there anything else I can try? I am currently exploring torch.cuda.empty_cache() and torch.utils.checkpoint.checkpoint(). Do you think either of these will work? And if so, in which file(s) should I add them?
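
A note on the two calls mentioned here. torch.cuda.empty_cache() only returns cached blocks that no tensor is using to the driver; it does not free memory held by live tensors. torch.utils.checkpoint.checkpoint() saves memory by recomputing activations during the backward pass, so it mainly matters for training and is unlikely to help if evaluation already runs without gradients (an assumption about tools/test.py, not something confirmed in this thread). The sketch below is one way to check how much empty_cache() could release on a given machine. The helper names cuda_memory_report and release_cache_and_compare are hypothetical, and the import guard is only there so the file loads on a machine without PyTorch.

```python
try:
    import torch
except ImportError:  # assumption: allow loading this sketch without PyTorch
    torch = None

GIB = 1024 ** 3


def cuda_memory_report(device=0):
    """Return current memory figures in GiB, or None if CUDA is unavailable."""
    if torch is None or not torch.cuda.is_available():
        return None
    free, total = torch.cuda.mem_get_info(device)
    return {
        "allocated": torch.cuda.memory_allocated(device) / GIB,
        "reserved": torch.cuda.memory_reserved(device) / GIB,
        "free": free / GIB,
        "total": total / GIB,
    }


def release_cache_and_compare(device=0):
    """Report memory before and after torch.cuda.empty_cache()."""
    before = cuda_memory_report(device)
    if before is None:
        return None
    # Releases only cached blocks that no live tensor occupies.
    torch.cuda.empty_cache()
    after = cuda_memory_report(device)
    return before, after


if __name__ == "__main__":
    print(release_cache_and_compare())
```

If "reserved" barely drops between the two reports, the memory is held by live tensors and empty_cache() will not resolve the out-of-memory error.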

sun-yue2002 commented 9 months ago

Maybe changing this to 1 will help: https://github.com/SxJyJay/MSMDFusion/blame/7b5b2741e693ba8007c95e3e8951e4e67fbc47ed/configs/MSMDFusion_nusc_voxel_LC.py#L104 And how much memory does your GPU have?

proggheli commented 9 months ago

That was the first thing I tried :), but it did not have any effect either. I have 8 GB of GPU memory. Will that be enough to evaluate?

sun-yue2002 commented 9 months ago

I think your GPU is the problem. Maybe you can use a server like AutoDL, or try another dataset like nuScenes-mini.

proggheli commented 9 months ago

Yes, I think so too. I have another question regarding the checkpoint files for the pretrained model. Right now there is a .pth checkpoint file; is there a JSON version as well, or should I just convert it?

I'm wondering because CenterPoint wants the checkpoint detection file to be in JSON format.

sun-yue2002 commented 9 months ago

Are you working on the virtual point generation part? If so, I haven't done that myself, so sorry that I'm unable to answer your question.

Libraaer commented 2 months ago

Hello, have you trained this model? I'm using an A800 here, but it still does not have enough memory for training. If you have trained it, could you tell me why I might be running out of memory? Thank you.