drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"

OutOfMemoryError when running eval.py #27

Closed: vvuonghn closed this issue 1 year ago

vvuonghn commented 1 year ago

Hi @drprojects, thank you for your research, it is very useful. I completed training the model on the S3DIS data and got a checkpoint, but when I run the evaluation command the log shows the error below. I am using an RTX 4090 Ti and my GPU memory (~24 GB) is free at that time.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.62 GiB (GPU 0; 23.62 GiB total capacity; 15.77 GiB already allocated; 7.08 GiB free; 15.78 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

vvuonghn commented 1 year ago

Oh, I moved to a server with more GPU memory and I can run eval.py there. Testing with the public checkpoints, the GPU memory consumption is ~48 GB. Could the model run eval.py on a 24 GB GPU?

mbendjilali commented 1 year ago

Hello, I've also been trying to test the evaluation script on DALES, and I also ran into an OutOfMemoryError, this time in the fourth cell of the demo_dales.ipynb file:

OutOfMemoryError: CUDA out of memory. Tried to allocate 22.00 MiB (GPU 0; 10.75 GiB total capacity; 10.44 GiB already allocated; 18.50 MiB free; 10.54 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried to set max_split_size_mb to 128, but it didn't work. I am using the spt-2_dales.ckpt checkpoint provided in the README.
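(For reference, a minimal sketch of how the allocator option is usually set: max_split_size_mb is passed through the PYTORCH_CUDA_ALLOC_CONF environment variable in the shell before launching the notebook or evaluation run, not through a Python-level setting. The experiment name and checkpoint path below are assumptions to adapt to your setup.)

```bash
# Set the allocator option in the environment before launching the run;
# PyTorch reads PYTORCH_CUDA_ALLOC_CONF at startup.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Placeholder launch command: adapt the experiment name and checkpoint path.
python src/eval.py experiment=dales ckpt_path=/path/to/spt-2_dales.ckpt
```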

drprojects commented 1 year ago

Hi @vvuonghn thanks for your interest in the project.

Can you please clarify:

- whether you are evaluating on the same GPU you used for training;
- which dataset and config you are using;
- whether you have modified the source code?

Also, please provide the full traceback so we can see what caused the error. Make sure you set CUDA_LAUNCH_BLOCKING=1 before running; that way the traceback will be more accurate.

vvuonghn commented 1 year ago

Hi @drprojects, I am using the same GPU for training and inference (a single NVIDIA 4090 Ti with 24 GB). The training process completed and reported scores, but when I run eval.py this error appears.

I am using S3DIS. My machine has one GPU; I tried setting export CUDA_VISIBLE_DEVICES=0 and the error still happens. I did not modify the source code, I just added the data and trained the model. I attach the log files for the train and eval processes below.

train.log eval.log

drprojects commented 1 year ago

CUDA_LAUNCH_BLOCKING=1 is a debug environment variable that makes CUDA kernel launches synchronous, so that the proper stack trace is reported when an error is triggered. You should not use it in production, only while debugging. In other words, when you debug code running on CUDA, set CUDA_LAUNCH_BLOCKING=1 if you want access to the proper traceback; otherwise the message you get may be unhelpful for locating the source of the error.
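A minimal sketch of what that looks like in practice, assuming evaluation is launched through src/eval.py with a Hydra experiment override (the experiment name and checkpoint path are placeholders to adapt to your run):

```bash
# Re-run the failing evaluation with synchronous CUDA kernel launches so
# the traceback points at the actual failing call. Debugging only: this
# slows execution down.
CUDA_LAUNCH_BLOCKING=1 python src/eval.py experiment=s3dis ckpt_path=/path/to/your_checkpoint.ckpt
```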

I had a look at the logs you shared. I notice you are using the s3dis_11g config at training time but not at evaluation time. This is probably the reason for your memory error. If you look at the difference between these two configs, you will notice that they prepare the data differently.

The provided pretrained weights were trained with the s3dis config and not s3dis_11g. We do not provide pretrained weights for s3dis_11g, so you will have to train your own and evaluate them using the s3dis_11g config (and not s3dis, as you did).
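In practice, this means passing the same experiment config at evaluation time as at training time. A hedged sketch, assuming the same src/eval.py entry point as above (the checkpoint path is a placeholder):

```bash
# A checkpoint trained with the s3dis_11g config should be evaluated with
# the s3dis_11g config as well, not with the default s3dis one.
python src/eval.py experiment=s3dis_11g ckpt_path=/path/to/your_s3dis_11g_checkpoint.ckpt
```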

Also, using s3dis_11g on a 24 GB GPU is a waste of memory. I recommend you use the s3dis config as the default instead and modify some settings to fit into a 24 GB GPU. Have a look at the README for a list of settings you can play with to mitigate CUDA errors. I do not have a 24 GB GPU at hand to help you with that, but I recommend using datamodule.xy_tiling=2 or 3 for a start, and playing with datamodule.sample_graph_k and datamodule.sample_graph_r from there. Have a look at the configs and code to get a grasp of what each does.
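As a sketch of how such overrides are passed on the command line, assuming the Hydra-style src/train.py entry point (the values shown are illustrative starting points, not tuned recommendations):

```bash
# Start from the default s3dis config and reduce peak GPU memory with
# command-line overrides; xy_tiling splits large clouds into tiles that
# are processed one at a time (see the datamodule configs for details).
python src/train.py experiment=s3dis datamodule.xy_tiling=3

# The graph-sampling parameters mentioned above can be overridden the same
# way, e.g. datamodule.sample_graph_k=... and datamodule.sample_graph_r=...
```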