cuda out of memory when test

drprojects / superpoint_transformer

Official PyTorch implementation of Superpoint Transformer introduced in [ICCV'23] "Efficient 3D Semantic Segmentation with Superpoint Transformer" and SuperCluster introduced in [3DV'24 Oral] "Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering"

MIT License

601 stars, 75 forks

cuda out of memory when test #21

Closed mooncakehub closed 1 year ago

mooncakehub commented 1 year ago

Hi Damien, when I run testing, I run out of CUDA memory. However, I can train with the s3dis-11g config. How much CUDA memory is needed for testing?

Besides, I trained s3dis-11g for 500 epochs, but wandb shows the run state as "crashed".

[screenshot]

mooncakehub commented 1 year ago

[screenshot]

I don't know whether this means the run completed or hit a bug.

drprojects commented 1 year ago

Hi,

All default configs should work on a 32G device, both for training and evaluation.

What is your GPU model?
How much memory does your GPU have?
Did you make sure you stopped all other GPU processes before running the code? E.g., if you are running on your local workstation, your GUI or your browser could be consuming GPU memory too. Use nvidia-smi to make sure no other process is running on your device.
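As a minimal sketch of the nvidia-smi check suggested above, the helpers below parse the CSV output of `nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv,noheader,nounits` (memory values in MiB) and list the GPUs whose memory is already in use before training or testing starts. The function names `parse_gpu_memory` and `busy_gpus` are hypothetical, not part of superpoint_transformer, and the zero-MiB threshold is an assumption: some drivers report a small baseline usage even on an idle card.

```python
import subprocess

# Standard nvidia-smi query flags; memory values are reported in MiB.
NVIDIA_SMI_QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV output (noheader, nounits) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split off the index first and the two memory fields last,
        # so a GPU name containing a comma would not break parsing.
        index, rest = line.split(",", 1)
        name, used, total = rest.rsplit(",", 2)
        gpus.append({
            "index": int(index),
            "name": name.strip(),
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return gpus


def busy_gpus(gpus, threshold_mib=0):
    """Return the indices of GPUs using more than threshold_mib of memory."""
    return [gpu["index"] for gpu in gpus if gpu["used_mib"] > threshold_mib]


def query_gpus():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(NVIDIA_SMI_QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

On a machine with an NVIDIA driver, `busy_gpus(query_gpus())` returns the indices of devices that other processes are already using; running plain `nvidia-smi` additionally shows which processes hold that memory.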

Regarding the crash of your s3dis-11g training, I cannot interpret the results from your screenshot. Please share the full traceback log. You can find it under the Logs section of your wandb experiment's dashboard.

mooncakehub commented 1 year ago

Thanks for your reply, these are my logs: [screenshot]

mooncakehub commented 1 year ago

My device is an NVIDIA 2080 with 12G memory, so I can only run s3dis-11g. I also want to know whether the test results will be printed on the screen. When I finished running s3dis-11g, it showed me the test results on the screen, like this: [screenshot]

drprojects commented 1 year ago

> thanks for your reply, this is my logs

I do not see any error message, and it seems the training reached 500 epochs as expected. Is this the "crashed" run you were mentioning? If so, when did it crash? What do the logs in your device's CLI show?

> My device is an NVIDIA 2080 with 12G memory, so I can only run s3dis-11g. I also want to know whether the test results will be printed on the screen. When I finished running s3dis-11g, it showed me the test results on the screen, like this:

Do you have an RTX 2080 or an RTX 2080 Ti? The latter has 11G of memory, but the former only has 8G.

To be honest, I am not sure I understand this issue. Your messages seem to indicate that you can run the s3dis-11g training without problems, and the above screenshot shows that the final computation of test scores works too. So, what seems to be the problem?

drprojects commented 1 year ago

If this is just about the "crashed" state you see on wandb, it is possible that something minor went wrong on your device when synchronizing with the wandb server at the end of the experiment. But based on your logs and screenshots, the experiment seems to have run successfully. Unless the local logs in your device's CLI say otherwise, I think you can ignore the "crashed" state from wandb.

mooncakehub commented 1 year ago

Thank you for your reply, you really helped me a lot. I have an RTX 2080 Ti, sorry for the confusion. Besides, I want to know where I can change which GPU is used, e.g. changing the default device 0 to 2.

drprojects commented 1 year ago

This is usually done by setting the CUDA_VISIBLE_DEVICES environment variable before launching your script, e.g.:

export CUDA_VISIBLE_DEVICES=2
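One consequence of this mechanism is that CUDA renumbers the visible devices: with CUDA_VISIBLE_DEVICES=2, the process sees a single GPU, exposed as `cuda:0`, which is physical GPU 2. So the code's default device 0 does not need to change. The minimal sketch below, assuming the variable holds a plain comma-separated list of integer indices (CUDA also accepts GPU UUIDs, which this ignores), shows that mapping and builds an environment for launching a child process on one GPU. Both helper names are hypothetical. The variable must be set before CUDA is initialized in the process; changing `os.environ` after the first CUDA call has no effect.

```python
import os


def visible_device_map(env_value):
    """Map in-process device indices (cuda:0, cuda:1, ...) to physical GPU indices.

    Assumes env_value is a comma-separated list of integers, e.g. "2" or "1,3".
    """
    physical = [int(token) for token in env_value.split(",") if token.strip()]
    return {logical: phys for logical, phys in enumerate(physical)}


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one physical GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env
```

For example, `subprocess.run(["python", "your_script.py"], env=env_for_gpu(2))` (where `your_script.py` is a placeholder) launches a child process that only sees physical GPU 2, as `cuda:0`. Passing the variable through the child's environment keeps the parent process unaffected.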

drprojects commented 1 year ago

I consider this issue solved, closing it.