fepegar / torchio

Medical imaging toolkit for deep learning
https://torchio.org
Apache License 2.0

Halve queue length when using DDP #1125

Closed · haughty-yeon closed this 1 year ago

haughty-yeon commented 1 year ago

šŸš€ Feature

DDP splits the work across GPUs, so the data should also be split per GPU. The per-GPU split is already implemented, but the queue's total length should then be divided by the number of GPUs as well.

Motivation

This halving of the data length is not supported in any version, so each iteration is duplicated across GPUs. The duplication can make training appear to converge faster, but it is not intended. Note that when a DataLoader is used on its own with a distributed sampler, the total data length is halved correctly.
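As a reference point, here is a minimal sketch (with assumed example values: 10 items, 2 processes) showing that a plain DataLoader driven by a DistributedSampler already reports only the current rank's share:

# Minimal sketch: a plain DataLoader with a DistributedSampler reports
# only this rank's share of the data (assumed: 10 items, 2 processes).
from torch.utils.data import DataLoader, DistributedSampler

dataset = list(range(10))
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=False)
loader = DataLoader(dataset, sampler=sampler, batch_size=1)
print(len(loader))  # 5, i.e. 10 // 2, not 10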

Pitch

Given the subject_sampler, iterations_per_epoch will be computed from the split data, as will num_subjects, so that the queue iterates without any duplication.

Additional context

The queue's subject_sampler provides the list of indices into which the data has been split per process, e.g.:

subject_sampler for GPU 0 = [0, 3, 4, 7, 12, 13, 15, 16, 18, 25, ...]
subject_sampler for GPU 1 = [1, 2, 5, 6, 8, 9, 10, 11, 14, ...]

With the current version, the queue serves data according to the rank, but with the same total size as the original dataset, e.g.:

subject_sampler for GPU 0 = [0, 3, 4, 7, 12, 13, 15, 16, 18, 25, ..., 0, 3, 4, 7, 12, 13, 15, 16, 18, 25, ...]
subject_sampler for GPU 1 = [1, 2, 5, 6, 8, 9, 10, 11, 14, ..., 1, 2, 5, 6, 8, 9, 10, 11, 14, ...]
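A short sketch of where such disjoint per-GPU index lists come from (the concrete indices depend on the shuffling seed; the two-rank setup is an assumed example):

# Sketch: DistributedSampler hands each rank a disjoint set of subject
# indices; the exact values depend on the seed/epoch when shuffling.
from torch.utils.data import DistributedSampler

dataset = list(range(20))  # stand-in for a SubjectsDataset with 20 subjects
for rank in range(2):      # simulate two GPUs/processes
    sampler = DistributedSampler(
        dataset, num_replicas=2, rank=rank, shuffle=True, seed=42
    )
    print(f'GPU {rank}:', sorted(sampler))  # 20 // 2 = 10 indices each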

The subject_sampler has already split the data cleanly per GPU, so its index list can be used to select the subjects and thereby halve the length. The code modification would be as follows:

# The number of subjects is reduced to this rank's share when a
# subject_sampler is passed (this if statement is added)
@property
def num_subjects(self) -> int:
    if self.subject_sampler is not None:
        return len(self.subject_sampler)
    return len(self.subjects_dataset)

# The subjects to iterate over are selected with the subject_sampler
# indices (this selection is added)
@property
def iterations_per_epoch(self) -> int:
    subjects_dataset = self.subjects_dataset.dry_iter()
    if self.subject_sampler is not None:
        # Keep only the subjects assigned to this process by the sampler
        subjects_dataset = [subjects_dataset[i] for i in self.subject_sampler]

    total_num_patches = sum(
        self._get_subject_num_samples(subject)
        for subject in subjects_dataset  # iterate over the local selection
    )
    return total_num_patches
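For context, a hedged usage sketch (the dataset construction and parameter values are illustrative, not from the issue):

# Illustrative sketch: build a Queue with a DistributedSampler passed as
# subject_sampler, then compare its length before and after the change.
import torch
import torchio as tio
from torch.utils.data import DistributedSampler

subjects = [
    tio.Subject(img=tio.ScalarImage(tensor=torch.rand(1, 96, 96, 96)))
    for _ in range(10)
]
subjects_dataset = tio.SubjectsDataset(subjects)
subject_sampler = DistributedSampler(
    subjects_dataset, num_replicas=2, rank=0, shuffle=True
)
queue = tio.Queue(
    subjects_dataset,
    max_length=8,
    samples_per_volume=4,
    sampler=tio.UniformSampler(patch_size=32),
    subject_sampler=subject_sampler,
    shuffle_subjects=False,  # shuffling is delegated to the subject_sampler
)
# Without the fix: len(queue) == 10 * 4 == 40 on every rank
# With the fix:    len(queue) == len(subject_sampler) * 4 == 5 * 4 == 20
print(len(queue))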

I have tested this myself with TorchIO version 0.19.1, and it halved the total data length. The first screenshot shows the original version, the second the modified one:

[screenshot: queue dataset division not working]
[screenshot: queue dataset division working properly]


[EDIT] I think the explanation was not enough, so I'm adding more :D

When a subject_sampler is passed to the queue, an iterable of subjects is produced by def _get_subjects_iterable(self). This iterable is loaded and consumed in def _fill(self), which keeps going until it has covered the length of self.subjects_dataset.

With DDP, however, the data length on each GPU should be the total length divided by the number of GPUs in use. The data for each GPU is already divided efficiently by the DistributedSampler, which is what is passed as subject_sampler. So the total length used in def _fill(self) has to be equal to the length of the DistributedSampler.

In other words, the current version of the queue loads N subjects and divides the data across K GPUs, yet each queue iterates up to len(self.subjects_dataset), which is N.

data for GPU 0 = [0, 3, 4, 7, 12, 13, 15, 16, 18, 25, ...] (=N//K)
data for GPU 1 = [1, 2, 5, 6, 8, 9, 10, 11, 14, ...] (=N//K)
GPU 0 queue = [0, 3, 4, 7, 12, 13, 15, 16, 18, 25, ..., 0, 3, 4, 7, 12, 13, 15, 16, 18, 25, ...] (=N)
GPU 1 queue = [1, 2, 5, 6, 8, 9, 10, 11, 14, ..., 1, 2, 5, 6, 8, 9, 10, 11, 14, ...] (=N)

But the queue with the modified code iterates only up to len(DistributedSampler), which is N//K.

data for GPU 0 = [0, 3, 4, 7, 12, 13, 15, 16, 18, 25, ...] (=N//K)
data for GPU 1 = [1, 2, 5, 6, 8, 9, 10, 11, 14, ...] (=N//K)
GPU 0 queue = [0, 3, 4, 7, 12, 13, 15, 16, 18, 25, ...] (=N//K)
GPU 1 queue = [1, 2, 5, 6, 8, 9, 10, 11, 14, ...] (=N//K)
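The length accounting in one quick sketch (N and K are example values; note that DistributedSampler pads to ceil(N / K) indices when N is not divisible by K):

# Quick sketch of the length accounting above (example values: N=100, K=2).
N, K = 100, 2
per_gpu = N // K   # share handed to each rank by DistributedSampler
old_len = N        # current queue: iterates the full dataset on each rank
new_len = per_gpu  # modified queue: iterates only this rank's share
print(per_gpu, old_len, new_len)  # 50 100 50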

You can check my pull request: https://github.com/fepegar/torchio/pull/1127

fepegar commented 1 year ago

Great catch! Thanks for reporting, @haughty-yeon.

fepegar commented 1 year ago

Fixed by @haughty-yeon in #1127 šŸŽ‰