Open linnanwang opened 1 month ago
Same question here.
Multi-scale is only used for 512 and 1024 px training. The dataset class should be this one: https://github.com/PixArt-alpha/PixArt-sigma/blob/f999a89ce86d2dc2b65d01e77602a4d2f4ddec85/diffusion/data/datasets/InternalData_ms.py#L173
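For context, the reason the multi-scale dataset works is aspect-ratio bucketing: batches are drawn only from samples that share a resolution bucket, so every tensor in a batch has the same shape and the default collate can stack them. Here is a minimal, framework-agnostic sketch of that idea (the class name, arguments, and bucket labels are illustrative, not the repo's actual API):

```python
import random
from collections import defaultdict

class AspectRatioBatchSampler:
    """Illustrative batch sampler: every yielded batch contains only
    indices from a single aspect-ratio bucket, so all tensors in the
    batch share a shape. Sketch only, not InternalData_ms itself."""

    def __init__(self, bucket_ids, batch_size, seed=0):
        # bucket_ids[i] is the aspect-ratio bucket of dataset sample i,
        # e.g. the closest predefined ratio such as "1:1" or "4:3".
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        self.buckets = defaultdict(list)
        for idx, bucket in enumerate(bucket_ids):
            self.buckets[bucket].append(idx)

    def __iter__(self):
        batches = []
        for indices in self.buckets.values():
            self.rng.shuffle(indices)
            # Drop the ragged tail so every batch is full and homogeneous.
            usable = len(indices) - len(indices) % self.batch_size
            for i in range(0, usable, self.batch_size):
                batches.append(indices[i:i + self.batch_size])
        self.rng.shuffle(batches)
        return iter(batches)

# Example: six samples spread over two buckets, batch size 2.
sampler = AspectRatioBatchSampler(["1:1", "4:3", "1:1", "4:3", "1:1", "1:1"], 2)
for batch in sampler:
    print(batch)  # each batch mixes indices from one bucket only
```

In PyTorch this would be passed as `batch_sampler=` to the `DataLoader`, replacing the default sampler + `batch_size` combination.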
line 162, in collate_tensor_fn
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [4, 104, 152] at entry 0 and [4, 96, 168] at entry 1
Is this the same issue? It happens in train.py, and I have multi-scale set to True.
Hello there,
Thanks for the great work!
I believe there is a bug in multi-aspect-ratio training. When I train on multiple GPUs with batch size != 1, here is the error message:
for step, batch in enumerate(train_dataloader):
  File "/usr/local/lib/python3.10/dist-packages/accelerate/data_loader.py", line 458, in __iter__
    next_batch = next(dataloader_iter)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1326, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/collate.py", line 264, in default_collate
    return collate(batch, collate_fn_map=default_collate_fn_map)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/collate.py", line 142, in collate
    return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/collate.py", line 142, in <listcomp>
return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed] # Backwards compatibility.
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/collate.py", line 119, in collate
return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/collate.py", line 162, in collate_tensor_fn
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 336, 192] at entry 0 and [3, 256, 256] at entry 2
Could you please advise on the potential causes? Thank you!
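For what it's worth, the failure can be reproduced outside the trainer: `default_collate` does the equivalent of stacking every sample in a batch, which requires identical shapes. A minimal NumPy sketch (illustrative only) showing both the failure and the bucketing idea that fixes it, i.e. grouping samples by shape before stacking:

```python
import numpy as np

# Two "latents" with the shapes from the traceback above.
a = np.zeros((3, 336, 192))
b = np.zeros((3, 256, 256))

# The default collate effectively calls stack(), which requires
# identical shapes -- this is the RuntimeError in the traceback.
try:
    np.stack([a, b])
except ValueError as e:
    print("stack failed:", e)

# Grouping samples by shape first makes every batch stackable,
# which is what an aspect-ratio-aware sampler guarantees up front.
samples = [a, b, np.zeros((3, 336, 192))]
buckets = {}
for s in samples:
    buckets.setdefault(s.shape, []).append(s)
batches = [np.stack(group) for group in buckets.values()]
print([batch.shape for batch in batches])  # one stacked batch per shape bucket
```

So if mixed shapes reach the collate function, the batch sampler (or the dataset class) is not the multi-scale one.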