PixArt-alpha / PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
https://pixart-alpha.github.io/PixArt-sigma-project/
GNU Affero General Public License v3.0

[bug report] multi-aspect ratio training #85

Open linnanwang opened 1 month ago

linnanwang commented 1 month ago

Hello there,

Thanks for the great work!

I believe there is a bug in multi-aspect-ratio training. When I train on multiple GPUs with batch size != 1, I get the following error:

    for step, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.10/dist-packages/accelerate/data_loader.py", line 458, in __iter__
    next_batch = next(dataloader_iter)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/collate.py", line 264, in default_collate
    return collate(batch, collate_fn_map=default_collate_fn_map)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/collate.py", line 142, in collate
    return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/collate.py", line 142, in <listcomp>
    return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/collate.py", line 119, in collate
    return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/collate.py", line 162, in collate_tensor_fn
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 336, 192] at entry 0 and [3, 256, 256] at entry 2

Could you please advise on the potential causes? Thank you!
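For reference, the failure is simply `default_collate` calling `torch.stack` on image tensors with different spatial sizes, which means samples from different aspect-ratio buckets ended up in the same batch. A minimal reproduction (illustrative only, shapes taken from the traceback above, not code from this repo):

```python
import torch
from torch.utils.data import default_collate

# Two "samples" with different aspect ratios, as produced by
# multi-aspect-ratio preprocessing (shapes copied from the traceback).
batch = [
    {"img": torch.randn(3, 336, 192)},
    {"img": torch.randn(3, 256, 256)},
]

# default_collate stacks the per-key tensors with torch.stack, which
# requires identical shapes, hence "stack expects each tensor to be
# equal size".
default_collate(batch)  # raises RuntimeError
```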

xiaoxiaodadada commented 1 month ago

same question

lawrence-cj commented 1 month ago

Multi-scale training is only used for 512px and 1024px training. The dataset class should be this one: https://github.com/PixArt-alpha/PixArt-sigma/blob/f999a89ce86d2dc2b65d01e77602a4d2f4ddec85/diffusion/data/datasets/InternalData_ms.py#L173
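For anyone hitting this: multi-aspect-ratio training only works if every batch contains samples from a single resolution bucket, which the multi-scale dataset linked above, paired with an aspect-ratio-aware batch sampler, is meant to guarantee. A rough sketch of that idea (class and accessor names here are illustrative assumptions, not the repo's actual API):

```python
import random
from collections import defaultdict
from torch.utils.data import Sampler

class AspectRatioBucketBatchSampler(Sampler):
    """Illustrative sketch: yield batches whose samples all share one
    resolution bucket, so default_collate can stack them."""

    def __init__(self, dataset, batch_size):
        self.batch_size = batch_size
        # Assumes the dataset can report each sample's target (h, w)
        # bucket; `bucket_hw(idx)` is a hypothetical accessor, adapt it
        # to the actual multi-scale dataset class.
        self.buckets = defaultdict(list)
        for idx in range(len(dataset)):
            hw = tuple(dataset.bucket_hw(idx))  # hypothetical accessor
            self.buckets[hw].append(idx)

    def __iter__(self):
        batches = []
        for indices in self.buckets.values():
            random.shuffle(indices)
            for i in range(0, len(indices), self.batch_size):
                batch = indices[i:i + self.batch_size]
                if len(batch) == self.batch_size:  # drop ragged tail for simplicity
                    batches.append(batch)
        random.shuffle(batches)
        yield from batches

    def __len__(self):
        return sum(len(v) // self.batch_size for v in self.buckets.values())
```

The point of the sketch is only to show why per-batch shape uniformity removes the collate error; the repo's own multi-scale dataset and sampler should be used in practice.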

GavChap commented 1 month ago
line 162, in collate_tensor_fn
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [4, 104, 152] at entry 0 and [4, 96, 168] at entry 1

Is this the same issue? It happens in train.py, and I have multi-scale set to True.