Open tristanwqy opened 2 months ago
torch.Size([4, 4096, 64]) torch.Size([4, 4096, 64]) torch.Size([4])
[rank2]: Traceback (most recent call last):
[rank2]: File "/home/ubuntu/flux_training/train_flux_lora_deepspeed.py", line 302, in <module>
[rank2]: main()
[rank2]: File "/home/ubuntu/flux_training/train_flux_lora_deepspeed.py", line 227, in main
[rank2]: x_t = (1 - t) * x_1 + t * x_0
[rank2]: RuntimeError: The size of tensor a (4) must match the size of tensor b (64) at non-singleton dimension 2
batch size = 4 and gradient accumulation = 4, with 4 GPUs
Have you identified the cause of the issue and found a solution?
[rank0]: main()
[rank0]: File "/mnt/bn/xuqin-lq/workspace/x-flux/train_flux_lora_deepspeed.py", line 241, in main
[rank0]: x_t = (1 - t) * x_1 + t * x_0
[rank0]: RuntimeError: The size of tensor a (2) must match the size of tensor b (64) at non-singleton dimension 2
batch size = 4 and gradient accumulation = 4, with 4 GPUs
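A possible explanation, going by the shapes printed in the first report: x_1 and x_0 are (4, 4096, 64) while t is (4,). PyTorch broadcasting aligns shapes from the last dimension, so the 4 entries of t are matched against the 64 channels at dimension 2, which is exactly what the error message says. With batch size 1, t has a single element and broadcasts without complaint, which may be why the problem only appears with larger batches. Below is a minimal sketch of one way around it, assuming t is meant to hold one interpolation weight per sample. The helper name `interpolate` and the reduced sequence length are hypothetical and not taken from the training script.

```python
import torch

def interpolate(x_1, x_0, t):
    """Blend x_1 and x_0 using one weight per sample.

    x_1, x_0: tensors of shape (batch, seq_len, channels)
    t:        tensor of shape (batch,)
    """
    # Reshape t to (batch, 1, 1) so it broadcasts over the sequence and
    # channel dimensions instead of being aligned with the last dimension.
    t = t.view(-1, *([1] * (x_1.dim() - 1)))
    return (1 - t) * x_1 + t * x_0

bs = 4
x_1 = torch.randn(bs, 16, 64)  # small stand-in for the (4, 4096, 64) tensors
x_0 = torch.randn(bs, 16, 64)
t = torch.rand(bs)             # shape (4,), as in the printed output

x_t = interpolate(x_1, x_0, t)
print(x_t.shape)  # torch.Size([4, 16, 64])
```

If the model call later in the script expects t with shape (batch,), it is probably safer to reshape only for this line and pass the original t along unchanged.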