Closed mooncakehub closed 1 year ago
I don't know if this represents completion or a bug
Hi,
All default configs should work on a 32G device, both for training and evaluation.
Please run nvidia-smi to make sure no other process is running on your device.
Regarding the crash of your s3dis-11g training, I cannot interpret the results from your screenshot. Please share the full traceback log. You may find it under the Logs section of your wandb experiment's dashboard.
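For anyone who wants to script the nvidia-smi check, here is a minimal sketch. It assumes nvidia-smi is on your PATH; the helper names `parse_compute_apps` and `list_gpu_processes` are made up for illustration and are not part of this repository.

```python
import shutil
import subprocess

# Ask nvidia-smi for one CSV row per compute process: pid, name, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, name, used_mib' rows into (pid, name, used_mib) tuples."""
    rows = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip blank or malformed rows
        name = ", ".join(parts[1:-1])  # tolerate commas inside the name
        mem = parts[-1]
        used_mib = int(mem) if mem.isdigit() else None  # e.g. "[N/A]"
        rows.append((int(parts[0]), name, used_mib))
    return rows


def list_gpu_processes():
    """Return the processes currently using a GPU, or [] if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for pid, name, used_mib in list_gpu_processes():
        print(f"pid={pid} name={name} used_mib={used_mib}")
```

If this prints nothing before you launch your run, no other compute process is holding GPU memory.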
Thanks for your reply, here are my logs.
My device is an NVIDIA 2080 with 12G of memory, so I can only run s3dis-11g. I also want to know whether the test results will be printed on the screen. When I finished running s3dis-11g, it showed me the test results on the screen, like this:
I do not see any error message and it seems the training reached 500 epochs as expected. Is this the 'crashed' run you were mentioning? If so, when did it crash? What are the logs on your device's CLI?
Do you have an RTX 2080 or an RTX 2080 Ti? The latter has 11G of memory but the former only has 8G...
To be honest, I am not sure I understand this issue. Your messages seem to indicate you can run the training of s3dis-11g without problem, and the above screenshot shows the final computation of test scores works too. So, what seems to be the problem?
If this is just about the 'crashed' state you see on wandb, it is possible that something minor went wrong on your device when synchronizing with the wandb server at the end of the experiment. But based on your logs and screenshots, it seems the experiment did run successfully. Unless the local logs on your device's CLI say otherwise, I think you can ignore the 'crashed' state from wandb.
Thank you for your reply, you really helped me a lot. I have an RTX 2080 Ti, sorry for the confusion. Besides, I want to know where I can change the GPU used by the code, for example changing the default 0 to 2.
This is usually done by setting your CUDA_VISIBLE_DEVICES environment variable, i.e.:
CUDA_VISIBLE_DEVICES=2
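If you would rather set it from Python than from the shell, here is a minimal sketch. The helper name `select_gpu` is made up for illustration and is not part of this repository. The variable must be set before any CUDA library is initialized in the process.

```python
import os


def select_gpu(index: int) -> str:
    """Expose only the physical GPU `index` to the current process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Call this first, then import torch (or any other CUDA-backed library).
select_gpu(2)
```

Note that once the variable is set, the selected GPU is the only one visible to the process and is therefore indexed as device 0 inside it.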
I consider this issue solved, closing it.
Hi Damien, when I ran the test, I ran out of CUDA memory. However, I can train s3dis-11g. I want to know how much CUDA memory is needed for testing.
Besides, I trained s3dis-11g for 500 epochs, and wandb shows the run in a 'crashed' state.