tedi14 opened this issue 3 months ago
You can try this command:

    CUDA_VISIBLE_DEVICES=0 nnUNetv2_train 800 3d_fullres 4 --disable_checkpointing -tr nnUNetTrainer_1epoch

If you want to use multiple GPUs for training, you can do it like this:

    nnUNetv2_train DATASET_NAME_OR_ID 2d 0 [--npz] -num_gpus X

See the section "Using multiple GPUs for training" in how_to_use_nnunet.md.
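For example, a minimal sketch assuming your dataset 800, the 3d_fullres configuration, fold 0 and two GPUs (adjust the configuration, fold and GPU ids to your setup):

    CUDA_VISIBLE_DEVICES=0,1 nnUNetv2_train 800 3d_fullres 0 -num_gpus 2

The number of GPUs you expose via CUDA_VISIBLE_DEVICES should match the value passed to -num_gpus.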
Hi, thanks for the reply. It tells me that CUDA_VISIBLE_DEVICES isn't a recognised command. Have I not downloaded something?
Yeah, CMD cannot parse that VAR=value prefix syntax. You can try running the command in Git Bash instead.
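If you want to stay on Windows, the environment variable has to be set in a separate step before the training command (a sketch, reusing the command from above):

    rem Windows CMD
    set CUDA_VISIBLE_DEVICES=0
    nnUNetv2_train 800 3d_fullres 4 --disable_checkpointing -tr nnUNetTrainer_1epoch

    # PowerShell
    $env:CUDA_VISIBLE_DEVICES = "0"
    nnUNetv2_train 800 3d_fullres 4 --disable_checkpointing -tr nnUNetTrainer_1epoch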
I still seem to get a similar issue
Try setting the batch size to 1?
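As far as I know there is no command line flag for the batch size in nnU-Net v2; it is read from the plans file in your preprocessed folder. A sketch of where to look, assuming dataset 800 and the default nnUNetPlans (the folder suffix after Dataset800_ depends on your dataset name):

    grep -n "batch_size" $nnUNet_preprocessed/Dataset800_*/nnUNetPlans.json

Edit the value under the configuration you are training (e.g. 3d_fullres) and rerun the training command.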
It doesn't change anything, the error is still the same.
Can you post a complete screenshot, starting from where you enter the command? Can you also try using the default trainer?
And here it is with the regular trainer:
I noticed this error: "Not enough memory resources are available to process this command." How much memory do you have available when running this command? Were you able to observe any memory spikes or instances where the memory was fully utilized?
Hi, sorry for the late reply. There is 96 GB of memory available, wouldn't that be enough?
Have you tried running it on Ubuntu? Windows often has issues like this, which I frequently encounter as well. It might be due to compatibility problems. Someone else has run into a similar issue: https://github.com/MIC-DKFZ/nnUNet/issues/1652
Hi @tedi14,
You need to check the GPU utilisation. Please use either nvidia-smi or some option available in Windows to see how much of the GPU is being used and what is using it. If it's a memory lock, you need to release it and then it should work - probably after a restart ;)
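For example, nvidia-smi refreshing every second lets you watch utilisation and memory while the training command starts up:

    nvidia-smi -l 1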
Cheers, Lalith
For the second issue, if it is not solved yet:
We encountered exactly the same problem today, on a system with 96 GB of RAM. It turned out to be a data unpacking issue (or a plan-and-preprocess issue). We are not sure about the root cause, but it was solved after removing all the .npy files (not the *_seg.npy files) under the nnUNet_preprocessed folder.
The training script then redid the dataset unpacking and it worked. So I would recommend: clear up enough disk space, delete your dataset under the preprocessed folder, and retry plan-and-preprocess with a smaller -np number.
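A sketch of what that clean-up and retry could look like, assuming dataset 800 and the standard nnUNet_preprocessed environment variable (the folder suffix after Dataset800_ depends on your dataset name):

    find $nnUNet_preprocessed/Dataset800_*/ -name "*.npy" ! -name "*_seg.npy" -delete
    nnUNetv2_plan_and_preprocess -d 800 -np 2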
Hey @tedi14, is this issue still persisting? If it has been resolved, it would be great if you could post what the problem was and close this issue.
Hello, I am a student working on a project trying to understand how nnUNet works with TotalSegmentator. I was wondering how I am supposed to type in a command so that the training happens on the GPUs. I also wanted to ask how I can see the contouring it has made after the training? I am quite new to all this, so any help and guidance will be very appreciated :) I asked on the TotalSegmentator GitHub but I am still not sure, so I thought it would be worth asking here.
Training always seems to go to 100% CPU usage even if I add cuda:0 into the training command. How can I make it train on multiple GPUs instead of the CPU? I am also not sure how to get RTStruct contours from my own pretrained_weights.
It keeps on giving this type of error:
Also not sure what's happening here: