fepegar / torchio

Medical imaging toolkit for deep learning
https://torchio.org
Apache License 2.0

Queue length modification with the use of DDP #1127

Closed haughty-yeon closed 1 year ago

haughty-yeon commented 1 year ago

Modified num_subjects() and iterations_per_epoch() so that the queue length is correct when using DDP.

Fixes #1125.

fepegar commented 1 year ago

This makes total sense. I've added some minor readability changes and tested the new implementation as follows:

import os

import torch
import torch.distributed as dist
import torchio as tio
from loguru import logger

num_subjects = 6
samples_per_volume = 2
max_length = 1000

subjects = []
tensor = torch.ones(1, 16, 16, 16)
for i in range(num_subjects):
    subject = tio.Subject(
        image=tio.ScalarImage(tensor=i * tensor),
        id=i,
    )
    subjects.append(subject)
dataset = tio.SubjectsDataset(subjects)

is_distributed = bool(os.environ.get('WORLD_SIZE'))
if is_distributed:
    dist.init_process_group()
    subject_sampler = torch.utils.data.distributed.DistributedSampler(
        dataset,
        shuffle=False,
    )
    rank = dist.get_rank()
else:
    subject_sampler = None
    rank = 0

patch_sampler = tio.sampler.UniformSampler(patch_size=2)

queue = tio.Queue(
    dataset,
    max_length,
    sampler=patch_sampler,
    samples_per_volume=samples_per_volume,
    num_workers=0,
    shuffle_subjects=False,
    shuffle_patches=False,
    subject_sampler=subject_sampler,
)

loader = torch.utils.data.DataLoader(
    queue,
    batch_size=1,
    num_workers=0,
    shuffle=False,
    collate_fn=lambda x: x[0],
)

for i, patch in enumerate(loader):
    logger.info(f'Rank {rank} | Batch {i} | Subject {patch["id"]}')

Run with:

torchrun --nproc_per_node=3 /tmp/ddp.py

Output:

2023-11-23 16:19:14.933 | INFO     | __main__:<module>:57 - Rank 1 | Batch 0 | Subject 1
2023-11-23 16:19:14.933 | INFO     | __main__:<module>:57 - Rank 1 | Batch 1 | Subject 1
2023-11-23 16:19:14.933 | INFO     | __main__:<module>:57 - Rank 1 | Batch 2 | Subject 4
2023-11-23 16:19:14.933 | INFO     | __main__:<module>:57 - Rank 1 | Batch 3 | Subject 4
2023-11-23 16:19:14.935 | INFO     | __main__:<module>:57 - Rank 0 | Batch 0 | Subject 0
2023-11-23 16:19:14.935 | INFO     | __main__:<module>:57 - Rank 0 | Batch 1 | Subject 0
2023-11-23 16:19:14.935 | INFO     | __main__:<module>:57 - Rank 0 | Batch 2 | Subject 3
2023-11-23 16:19:14.935 | INFO     | __main__:<module>:57 - Rank 0 | Batch 3 | Subject 3
2023-11-23 16:19:14.947 | INFO     | __main__:<module>:57 - Rank 2 | Batch 0 | Subject 2
2023-11-23 16:19:14.947 | INFO     | __main__:<module>:57 - Rank 2 | Batch 1 | Subject 2
2023-11-23 16:19:14.947 | INFO     | __main__:<module>:57 - Rank 2 | Batch 2 | Subject 5
2023-11-23 16:19:14.947 | INFO     | __main__:<module>:57 - Rank 2 | Batch 3 | Subject 5
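
As a sanity check on the log above, the per-rank numbers can be reproduced with a small standard-library sketch. It assumes that DistributedSampler with shuffle=False and drop_last=False pads the index list to a multiple of the world size and gives each rank every world_size-th index, and that the queue length per rank is the number of subjects assigned to that rank times samples_per_volume. The helper names subjects_for_rank and iterations_per_rank are hypothetical and are not part of TorchIO or PyTorch.

```python
import math


def subjects_for_rank(num_subjects: int, world_size: int, rank: int) -> list[int]:
    """Sketch of the index split for shuffle=False, drop_last=False (assumed behavior)."""
    per_rank = math.ceil(num_subjects / world_size)
    total = per_rank * world_size
    indices = list(range(num_subjects))
    # Pad by repeating indices from the start so every rank gets the same count
    padding = total - len(indices)
    indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Each rank takes every world_size-th index, starting at its own rank
    return indices[rank:total:world_size]


def iterations_per_rank(num_subjects: int, world_size: int, samples_per_volume: int) -> int:
    """Assumed per-rank queue length: subjects seen by one rank times patches per subject."""
    return math.ceil(num_subjects / world_size) * samples_per_volume


if __name__ == '__main__':
    # Same settings as the test script: 6 subjects, 3 processes, 2 patches per volume
    for rank in range(3):
        print(rank, subjects_for_rank(6, 3, rank), iterations_per_rank(6, 3, 2))
```

With these settings each rank is assigned two subjects and four iterations, which matches the four batches per rank in the log (for example, subjects 1 and 4 on rank 1).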

fepegar commented 1 year ago

Thanks for your contribution, @haughty-yeon!

@allcontributors please add @haughty-yeon for bug

allcontributors[bot] commented 1 year ago

@fepegar

I couldn't determine any contributions to add, did you specify any contributions? Please make sure to use valid contribution names.

I've put up a pull request to add @haughty-yeon! :tada: