Open jxchen01 opened 2 years ago
More than a bug, this should be considered a request for enhancement. It would require reviewing the torchio code and making it picklable, so that it can be sent to the subprocesses created by pytorch-lightning.
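A quick way to check this in isolation (`patches_queue` here is a placeholder for any `tio.Queue` instance):

```python
import pickle

# Round-trip through pickle, which is what multiprocessing "spawn"
# effectively does when handing an object to a worker process.
pickle.loads(pickle.dumps(patches_queue))
```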
Have you tried using another multi-GPU strategy with pytorch-lightning, like "ddp" or "deepspeed"? I was able to start a training in both cases. The only drawback is that the dataloaders/queues are duplicated for each of the processes created this way (one per GPU).
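For example, something along these lines (a minimal sketch, not tested on the original setup; the Trainer arguments assume a recent pytorch-lightning release):

```python
import pytorch_lightning as pl

# Select the DDP strategy explicitly. With "ddp", pytorch-lightning launches
# one separate process per GPU, so the dataloaders are built inside each
# process instead of being pickled and sent to spawned workers.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,           # one training process per GPU
    strategy="ddp",      # or strategy="deepspeed" if deepspeed is installed
)
```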
I'm having the same issue, when using:

```python
sampler = tio.data.LabelSampler(patch_size=96, label_name="Label", label_probabilities={0: 0.4, 1: 0.6})
```

and

```python
train_patches_queue = tio.Queue(training_set, max_length=40, samples_per_volume=5, sampler=sampler, num_workers=8)
val_patches_queue = tio.Queue(validation_set, max_length=40, samples_per_volume=5, sampler=sampler, num_workers=8)
```

in my DataLoaders:

```python
batch_size = 2
train_loader = torch.utils.data.DataLoader(train_patches_queue, batch_size=batch_size, num_workers=8)
val_loader = torch.utils.data.DataLoader(val_patches_queue, batch_size=batch_size, num_workers=8)
```

I'm using a single GPU with 6 GB of memory.
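As a side note, if I read the TorchIO documentation correctly, the DataLoader that wraps a Queue is expected to use num_workers=0, because the Queue already spawns its own loading workers, i.e. something like:

```python
# Keep the outer DataLoader single-process; patch loading is already
# parallelized by the Queue's num_workers=8 above.
train_loader = torch.utils.data.DataLoader(train_patches_queue, batch_size=batch_size, num_workers=0)
val_loader = torch.utils.data.DataLoader(val_patches_queue, batch_size=batch_size, num_workers=0)
```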
Is there an existing issue for this?
Bug summary
I am trying to use Pytorch-lightning to handle model training (it makes multi-GPU training and other training tricks easy to use) while using TorchIO for data loading, but I always get errors.
Code for reproduction
Actual outcome
This is just pseudocode.
Error messages
Expected outcome
I hope to use the TorchIO dataloader in a multi-GPU training script.
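For illustration, something along these lines is what I am aiming for (a self-contained sketch with placeholder names and synthetic data, not my actual code; the Trainer arguments assume a recent pytorch-lightning release):

```python
import torch
import torchio as tio
import pytorch_lightning as pl


class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv3d(1, 1, kernel_size=3, padding=1)

    def training_step(self, batch, batch_idx):
        x = batch["image"][tio.DATA]  # patch batches come out as nested dicts
        return torch.nn.functional.mse_loss(self.net(x), x)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


class PatchesDataModule(pl.LightningDataModule):
    def __init__(self, subjects, patch_size=96, batch_size=2):
        super().__init__()
        self.subjects = subjects
        self.patch_size = patch_size
        self.batch_size = batch_size

    def setup(self, stage=None):
        # Building the Queue here means each DDP process creates its own
        # copy, so nothing needs to be pickled across process boundaries.
        dataset = tio.SubjectsDataset(self.subjects)
        sampler = tio.data.UniformSampler(self.patch_size)
        self.queue = tio.Queue(
            dataset,
            max_length=40,
            samples_per_volume=5,
            sampler=sampler,
            num_workers=4,
        )

    def train_dataloader(self):
        # num_workers=0: the Queue already runs its own loading workers
        return torch.utils.data.DataLoader(
            self.queue, batch_size=self.batch_size, num_workers=0
        )


if __name__ == "__main__":
    # Synthetic subjects so the sketch runs end to end
    subjects = [
        tio.Subject(image=tio.ScalarImage(tensor=torch.rand(1, 128, 128, 128)))
        for _ in range(4)
    ]
    trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp", max_epochs=1)
    trainer.fit(TinyModel(), datamodule=PatchesDataModule(subjects))
```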
System info