[2024-09-02 00:39:59,451] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
/mnt/data/miniconda3/envs/x-flux/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, weight, bias=None):
/mnt/data/miniconda3/envs/x-flux/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
W0902 00:40:01.940000 133343758611968 torch/distributed/run.py:779]
W0902 00:40:01.940000 133343758611968 torch/distributed/run.py:779] *****************************************
W0902 00:40:01.940000 133343758611968 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0902 00:40:01.940000 133343758611968 torch/distributed/run.py:779] *****************************************
[2024-09-02 00:40:12,774] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-02 00:40:12,776] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
/mnt/data/miniconda3/envs/x-flux/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, weight, bias=None):
/mnt/data/miniconda3/envs/x-flux/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
[2024-09-02 00:40:13,500] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-09-02 00:40:13,501] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-09-02 00:40:13,508] [INFO] [comm.py:637:init_distributed] cdb=None
/mnt/data/miniconda3/envs/x-flux/lib/python3.10/site-packages/accelerate/accelerator.py:401: UserWarning: `log_with=wandb` was passed but no supported trackers are currently installed.
warnings.warn(f"`log_with={log_with}` was passed but no supported trackers are currently installed.")
09/02/2024 00:40:13 - INFO - __main__ - Distributed environment: DEEPSPEED Backend: nccl
Num processes: 2
Process index: 0
Local process index: 0
Device: cuda:0
Mixed precision type: bf16
ds_config: {'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'gradient_accumulation_steps': 2, 'zero_optimization': {'stage': 2, 'offload_optimizer': {'device': 'none', 'nvme_path': None}, 'offload_param': {'device': 'none', 'nvme_path': None}, 'stage3_gather_16bit_weights_on_model_save': False}, 'gradient_clipping': 1.0, 'steps_per_print': inf, 'bf16': {'enabled': True}, 'fp16': {'enabled': False}}
09/02/2024 00:40:13 - INFO - __main__ - Distributed environment: DEEPSPEED Backend: nccl
Num processes: 2
Process index: 1
Local process index: 1
Device: cuda:1
Mixed precision type: bf16
ds_config: {'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'gradient_accumulation_steps': 2, 'zero_optimization': {'stage': 2, 'offload_optimizer': {'device': 'none', 'nvme_path': None}, 'offload_param': {'device': 'none', 'nvme_path': None}, 'stage3_gather_16bit_weights_on_model_save': False}, 'gradient_clipping': 1.0, 'steps_per_print': inf, 'bf16': {'enabled': True}, 'fp16': {'enabled': False}}
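For reference, the `ds_config` dict printed above corresponds to roughly the following DeepSpeed JSON (a sketch: the `"auto"` values are resolved by Accelerate at launch time, and Python's `inf` for `steps_per_print` has no JSON literal, so it is omitted here):

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": 2,
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "none" },
    "offload_param": { "device": "none" },
    "stage3_gather_16bit_weights_on_model_save": false
  },
  "gradient_clipping": 1.0,
  "bf16": { "enabled": true },
  "fp16": { "enabled": false }
}
```

Note that ZeRO stage 2 with both offload devices set to `"none"` keeps all optimizer state and parameters in GPU/host memory, which matters for the failure below.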
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.72it/s]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.84it/s]
Init model
Loading checkpoint
Init AE
W0902 00:42:50.212000 133343758611968 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 3788 closing signal SIGTERM
E0902 00:42:54.953000 133343758611968 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: -9) local_rank: 0 (pid: 3787) of binary: /mnt/data/miniconda3/envs/x-flux/bin/python
Traceback (most recent call last):
File "/mnt/data/miniconda3/envs/x-flux/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/mnt/data/miniconda3/envs/x-flux/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/mnt/data/miniconda3/envs/x-flux/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1067, in launch_command
deepspeed_launcher(args)
File "/mnt/data/miniconda3/envs/x-flux/lib/python3.10/site-packages/accelerate/commands/launch.py", line 771, in deepspeed_launcher
distrib_run.run(args)
File "/mnt/data/miniconda3/envs/x-flux/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/mnt/data/miniconda3/envs/x-flux/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/mnt/data/miniconda3/envs/x-flux/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
=====================================================
train_flux_lora_deepspeed.py FAILED
-----------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
-----------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-09-02_00:42:50
host : localhost
rank : 0 (local_rank: 0)
exitcode : -9 (pid: XXXX)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID XXXX
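Exitcode -9 means the rank-0 process received SIGKILL (signal 9). On Linux this is most often the kernel OOM killer reclaiming memory: the crash happens right after "Init AE", while each of the two ranks is holding a full copy of the model in host RAM. A minimal diagnostic sketch (standard Linux commands; `dmesg` may require root on some systems):

```shell
# Signal 9 is SIGKILL -- confirm the mapping:
kill -l 9        # prints: KILL

# Look for OOM-killer messages around the crash time (may need root):
dmesg -T 2>/dev/null | grep -iE 'out of memory|oom[-_ ]kill' | tail -n 5 || true

# Check available host RAM before retrying; each rank loads its own
# copy of the checkpoints into CPU memory before moving them to its GPU:
free -h
```

If the OOM killer did fire, common mitigations for a run like this include lowering the micro-batch size, loading checkpoints with a lower-memory path (e.g. one rank at a time), or adding swap, though which one helps depends on whether the kill happened during model loading or training.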