Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

`CombinedLoader` takes a long time when `num_workers > 0` #18584

Open johnathanchiu opened 1 year ago

johnathanchiu commented 1 year ago

Bug description

I am currently using CombinedLoader (https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.utilities.combined_loader.html#lightning.pytorch.utilities.combined_loader.CombinedLoader) to combine multiple datasets. It works fine, but I noticed that setting `num_workers > 0` on the underlying dataloaders causes it to run extremely slowly. Is there a logical explanation for this? It feels like a bug otherwise. I attached a chunk of my code to show what I am doing.

What version are you seeing the problem on?

v2.0

How to reproduce the bug

import lightning.pytorch as pl
from lightning.pytorch.utilities.combined_loader import CombinedLoader
from torch.utils.data import DataLoader

class CollectiveDataloader(pl.LightningDataModule):
    def __init__(self, datasets, num_workers=8, batch_size=10, shuffle=True):
        super().__init__()
        self.train_set = CollectiveDataset(
            datasets, num_workers, batch_size, shuffle
        ).datasets

    def train_dataloader(self):
        return CombinedLoader(self.train_set, "sequential")

class CollectiveDataset:
    def __init__(self, datasets, num_workers, batch_size, shuffle):
        # datasets is a dictionary of {dataset_name : Dataset object}
        loaded_datasets = {
            name: DataLoader(
                dataset,
                batch_size=batch_size,
                shuffle=shuffle,
                ### SETTING THIS > 0 RUNS REALLY SLOW ###
                num_workers=num_workers,
            )
            for name, dataset in datasets.items()
        }
        self.datasets = loaded_datasets
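
One plausible explanation (not confirmed here) is per-iterator worker startup cost: each `DataLoader` spawns its `num_workers` processes whenever a new iterator is created, and a `CombinedLoader` in `"sequential"` mode iterates several loaders every epoch, so that spawn/teardown overhead is paid once per loader per epoch. Below is a minimal sketch of a possible mitigation that keeps worker processes alive with `persistent_workers=True`; the `build_loaders` helper name is made up for illustration, and whether this actually resolves the slowdown is an assumption, not something verified in this issue.

```python
from torch.utils.data import DataLoader

def build_loaders(datasets, num_workers=8, batch_size=10, shuffle=True):
    # `datasets` is the same {dataset_name: Dataset} dict used by CollectiveDataset.
    return {
        name: DataLoader(
            dataset,
            batch_size=batch_size,
            shuffle=shuffle,
            num_workers=num_workers,
            # Sketch: keep worker processes alive across epochs so they are
            # not re-spawned every time CombinedLoader creates a new iterator.
            persistent_workers=num_workers > 0,
        )
        for name, dataset in datasets.items()
    }
```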

Environment

Current environment

```
#- PyTorch Lightning Version: 2.0.7
#- PyTorch Version: 2.0.1
#- Python version: 3.10.12
#- OS: Linux
#- CUDA/cuDNN version: 12.0
```

cc @borda

Martin1007Wang commented 2 months ago

Same issue here