rank0: Traceback (most recent call last):
rank0: File "/home/reaper/xdit_test/piflux/examples/run_flux.py", line 370, in <module>
rank0: File "/home/reaper/xdit_test/piflux/examples/run_flux.py", line 303, in main
rank0: File "/home/reaper/xdit_test/piflux/src/piflux/adapters/diffusers.py", line 84, in new_call
rank0: seed_t = torch.full([1], seed, dtype=torch.int64)
rank0: RuntimeError: value cannot be converted to type int64_t without overflow
rank0:[W1115 09:08:20.026822356 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
rank1: Traceback (most recent call last):
rank1: File "/home/reaper/xdit_test/piflux/examples/run_flux.py", line 370, in <module>
rank1: File "/home/reaper/xdit_test/piflux/examples/run_flux.py", line 303, in main
rank1: File "/home/reaper/xdit_test/piflux/src/piflux/adapters/diffusers.py", line 85, in new_call
rank1: seed_t = piflux_ops.get_complete_tensor(seed_t, dim=0)
rank1: File "/home/reaper/miniconda3/envs/fjr/lib/python3.10/site-packages/torch/_ops.py", line 1116, in __call__
rank1: return self._op(*args, **(kwargs or {}))
rank1: File "/home/reaper/miniconda3/envs/fjr/lib/python3.10/site-packages/torch/_library/autograd.py", line 113, in autograd_impl
rank1: result = forward_no_grad(args, Metadata(keyset, keyword_only_args))
rank1: File "/home/reaper/miniconda3/envs/fjr/lib/python3.10/site-packages/torch/_library/autograd.py", line 40, in forward_no_grad
rank1: result = op.redispatch(keyset & _C._after_autograd_keyset, args, kwargs)
rank1: File "/home/reaper/miniconda3/envs/fjr/lib/python3.10/site-packages/torch/_ops.py", line 721, in redispatch
rank1: return self._handle.redispatch_boxed(keyset, *args, kwargs)
rank1: File "/home/reaper/miniconda3/envs/fjr/lib/python3.10/site-packages/torch/_library/custom_ops.py", line 324, in backend_impl
rank1: result = self._backend_fns[device_type](*args, *kwargs)
rank1: File "/home/reaper/miniconda3/envs/fjr/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
rank1: return disable_fn(args, kwargs)
rank1: File "/home/reaper/miniconda3/envs/fjr/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
rank1: return fn(*args, kwargs)
rank1: File "/home/reaper/miniconda3/envs/fjr/lib/python3.10/site-packages/torch/_library/custom_ops.py", line 367, in wrapped_fn
rank1: return fn(*args, *kwargs)
rank1: File "/home/reaper/xdit_test/piflux/src/piflux/ops/context_ops.py", line 80, in get_complete_tensor
rank1: dist.all_gather(gathered_tensors, tensor.contiguous())
rank1: File "/home/reaper/miniconda3/envs/fjr/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 83, in wrapper
rank1: return func(args, kwargs)
rank1: File "/home/reaper/miniconda3/envs/fjr/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3346, in all_gather
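
Note on the rank0 error: "value cannot be converted to type int64_t without overflow" is raised by torch.full([1], seed, dtype=torch.int64), which suggests the seed on rank0 lies outside the signed 64-bit range. The rank1 traceback ends inside all_gather, which would be consistent with its peer having already exited. A minimal sketch of one possible workaround follows, assuming any deterministic mapping of the seed into the int64 range is acceptable. The helper name to_int64_seed is hypothetical and not part of piflux or PyTorch.

```python
# Hypothetical helper (not part of piflux or PyTorch): fold an arbitrary
# Python int into the non-negative signed 64-bit range before it is handed
# to something like torch.full([1], seed, dtype=torch.int64).
INT64_MAX = 2**63 - 1


def to_int64_seed(seed: int) -> int:
    """Return a value in [0, 2**63 - 1] derived from `seed`.

    Assumption: keeping only the low 63 bits is acceptable for seeding.
    Seeds that are already in range are returned unchanged.
    """
    return seed & INT64_MAX


if __name__ == "__main__":
    oversized = 2**64 - 1  # too large for int64
    print(to_int64_seed(oversized))
    print(to_int64_seed(42))
```

With a helper like this, every rank would still need to derive the seed from the same input so that the gathered values agree.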