MIC-DKFZ / nnDetection

nnDetection is a self-configuring framework for 3D (volumetric) medical object detection which can be applied to new data sets without manual intervention. It includes guides for 12 data sets that were used to develop and evaluate the performance of the proposed method.
Apache License 2.0

Information regarding training on multiple or single GPUs #193

Closed DSRajesh closed 9 months ago

DSRajesh commented 1 year ago

Hello,

We need to train a 3D object detection model (on approximately 20,000 3D images) and would like to use nnDetection for this. We wanted to know whether it is possible to do either of the following:

  1. Use two GeForce RTX 4090 GPUs, or
  2. Use a single RTX 6000 GPU.

Choice 1 has more CUDA cores but less memory, while choice 2 has more memory but fewer cores.

It would help us to know whether both approaches are feasible and which of the two is preferable. If there are other, better options, we would be glad to hear about them as well.

Thanking You

Rajesh

mibaumgartner commented 1 year ago

Hi @DSRajesh ,

unfortunately, I cannot test these GPUs myself, so I can only give you a very rough estimate and the pros and cons of the respective cards, which are mostly based on the available VRAM (assuming the RTX 6000 is based on the same AD102 chip as the RTX 4090, since Nvidia used a somewhat confusing naming scheme there).

1) General remark when using nnDetection with high-VRAM GPUs: I would highly recommend performing the planning on a smaller GPU with ~11GB, since that gives the best starting point for any further tuning. Running the planning on larger GPUs will increase the patch size, which is only beneficial if very large objects are present in the dataset (an alternative is to run the planning stage with an increased batch size and a bigger model).
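
A minimal sketch of what this could look like, assuming the smaller ~11GB card is GPU index 1 on your machine and that `nndet_prep` is used for the preprocessing/planning step (task number 000 is just a placeholder):

```bash
# Restrict the planning stage to the smaller card so the automatically
# derived patch size stays at the recommended baseline.
# GPU index 1 and task 000 are assumptions; adjust to your setup.
CUDA_VISIBLE_DEVICES=1 nndet_prep 000
```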

2) Especially for large data sets I would recommend to: 2.1 bump the batch size from 4 to 8 or even 12; 2.2 increase the number of channels in the models by ~50%; 2.3 if large objects exceed the current patch size, increase it. Points 2.1 and 2.2 will bump the VRAM usage to approximately 32GB, which would favour the RTX 6000 with 48GB VRAM and would not work on the 4090s. Training time increases with these changes to 3-4 days per fold.

3) nnDetection can only use one GPU per training run. Having two GPUs would allow you to train two folds in parallel, which can speed up the final round of experiments (considering that by default 5 folds need to be trained).
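
As a rough sketch, running two folds in parallel, one per GPU, could look like the following (the `-o exp.fold=...` override follows the Hydra-style CLI described in the README; please double-check the exact syntax for your version):

```bash
# Train fold 0 on GPU 0 and fold 1 on GPU 1 in parallel.
# Task 000 is a placeholder for your dataset; the fold override key is
# assumed to be exp.fold.
CUDA_VISIBLE_DEVICES=0 nndet_train 000 -o exp.fold=0 &
CUDA_VISIBLE_DEVICES=1 nndet_train 000 -o exp.fold=1 &
wait
```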

So in the end it boils down to this decision: the RTX 6000 allows training significantly larger models, which might be beneficial for large datasets but also take a very long time to train. The dual-GPU setup lets you run experiments in parallel, which can shorten experimentation time, but is somewhat limited when scaling to very large models.

Hope this helps :) Best, Michael

DSRajesh commented 1 year ago

Thank you for clarifying that nnDetection does not support multi-GPU training of a single fold.

Given the task of detecting small objects such as lung nodules in lung CT, across more than 10,000 scans, what kind of GPU might work best?

(A) High VRAM and moderate compute speed: the older A6000 (48GB, 10752 CUDA cores, 1.45GHz clock)

or

(B) High speed and lower VRAM: RTX 4090 (24GB, 16384 CUDA cores, 2.2GHz clock)

What are the subtleties involved in the nnDetection parameters when training on a very large dataset, e.g. batch size?

Would we push the memory limits of an RTX 4090?

Would similar frameworks, such as the RetinaNet detector in MONAI, support multi-GPU training or multiple workstations for detection, e.g. with federated learning?

Thanking You

Rajesh

mibaumgartner commented 1 year ago

From a practical standpoint, two 4090s will likely give you the best value since they are faster and there are two of them. Assuming the nodules tend to be on the smaller side, bumping the batch size and model size will give you a small performance boost but might already be limited by 24GB of VRAM (this setup usually consumes around 32GB; using batch size 6 instead of 8 might save enough memory to train on 24GB). If you are planning to use the GPUs for a longer time and multiple development iterations, the 4090s are probably better since your turnaround time will be shorter. If the intention is to train the best model once (and for a very long time), the A6000 is the better choice due to its VRAM.

Technically it is even possible to train on multiple GPUs with nnDetection, but the setup is not well tested and there are several caveats (e.g. the online evaluation is computed per GPU and averaged, instead of collecting predictions and averaging over all batches; the sweep needs to be run separately with a decreased batch size, etc.). I'm not sure whether the MONAI implementation covers these problems, since their evaluation is based on nnDetection/COCO (this would be DDP training, not federated training). As such, we do not actively provide support for any problems that might arise from using this feature.

github-actions[bot] commented 9 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 9 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.