Open: AfterHAL opened this issue 3 months ago
Hi, please try one of these options:
CUDA_VISIBLE_DEVICES="1" accelerate launch
or add this at the beginning of the training script:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
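A quick way to confirm which physical GPU actually ends up visible is a check like the sketch below (an illustrative snippet, assuming PyTorch is installed and the variable is set before torch initializes CUDA):

import os

# Must be set before CUDA is initialized, otherwise it is ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

# With CUDA_VISIBLE_DEVICES="1", the single visible GPU is remapped to cuda:0.
print("visible devices:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} ->", torch.cuda.get_device_name(i))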
Thanks a lot @Anghellia.
My bad! I'm using Ubuntu under Windows with WSL 2, where the CUDA devices are sorted by "power" by default (CUDA_DEVICE_ORDER=FASTEST_FIRST), while under Windows they are sorted by PCI bus number (CUDA_DEVICE_ORDER=PCI_BUS_ID). So, in my case, the CUDA devices 0,1,2,3 under Windows become 1,0,2,3 under Linux.
And something strange is happening: when CUDA_DEVICE_ORDER is not set, accelerate defaults to the first CUDA device on the PCI bus, while setting CUDA_VISIBLE_DEVICES selects devices in "fastest first" order...
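To keep the Linux indices aligned with what Windows and nvidia-smi report, the ordering can be pinned explicitly before CUDA is initialized. A minimal sketch, assuming the ordering behaviour described above:

import os

# Use PCI bus ordering (same as nvidia-smi / Windows) instead of "fastest first",
# so that index 1 refers to the same physical card on both systems.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

# The only visible GPU is remapped to cuda:0 inside the process.
print(torch.cuda.get_device_name(0))

The same two variables can also be set on the command line in front of accelerate launch.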
By the way, I am now facing an error that I don't understand after "Init AE":
(xflux_env) xtash@PS3:/mnt/w/xFluxLinux$ CUDA_VISIBLE_DEVICES=0 accelerate launch train_flux_lora_deepspeed.py --config "train_configs/test_lora.yaml"
[2024-08-22 19:38:56,617] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
/mnt/w/xFluxLinux/xflux_env/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, weight, bias=None):
/mnt/w/xFluxLinux/xflux_env/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
[2024-08-22 19:39:40,294] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
/mnt/w/xFluxLinux/xflux_env/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, weight, bias=None):
/mnt/w/xFluxLinux/xflux_env/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
[2024-08-22 19:39:44,808] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-22 19:39:44,808] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
/mnt/w/xFluxLinux/xflux_env/lib/python3.10/site-packages/accelerate/accelerator.py:401: UserWarning: `log_with=wandb` was passed but no supported trackers are currently installed.
warnings.warn(f"`log_with={log_with}` was passed but no supported trackers are currently installed.")
08/22/2024 19:39:44 - INFO - __main__ - Distributed environment: DEEPSPEED Backend: nccl
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda:0
Mixed precision type: bf16
ds_config: {'train_batch_size': 'auto', 'train_micro_batch_size_per_gpu': 'auto', 'gradient_accumulation_steps': 1, 'zero_optimization': {'stage': 2, 'offload_optimizer': {'device': 'none', 'nvme_path': None}, 'offload_param': {'device': 'none', 'nvme_path': None}, 'stage3_gather_16bit_weights_on_model_save': False}, 'gradient_clipping': 'auto', 'steps_per_print': inf, 'bf16': {'enabled': True}, 'fp16': {'enabled': False}}
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 4.23it/s]
Init model
Loading checkpoint
Init AE
E0822 19:41:38.953000 139846884818944 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: -9) local_rank: 0 (pid: 100376) of binary: /mnt/w/xFluxLinux/xflux_env/bin/python3
Traceback (most recent call last):
File "/mnt/w/xFluxLinux/xflux_env/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/mnt/w/xFluxLinux/xflux_env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/mnt/w/xFluxLinux/xflux_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1067, in launch_command
deepspeed_launcher(args)
File "/mnt/w/xFluxLinux/xflux_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 771, in deepspeed_launcher
distrib_run.run(args)
File "/mnt/w/xFluxLinux/xflux_env/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/mnt/w/xFluxLinux/xflux_env/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/mnt/w/xFluxLinux/xflux_env/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
=======================================================
train_flux_lora_deepspeed.py FAILED
-------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
-------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-08-22_19:41:38
host : PS3.
rank : 0 (local_rank: 0)
exitcode : -9 (pid: 100376)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID 100376
=======================================================
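For reference, exit code -9 means the process was terminated with SIGKILL; during model or autoencoder loading this is commonly the Linux OOM killer reacting to host RAM (not VRAM) running out, which is easy to hit under WSL 2's default memory limit. A small diagnostic sketch that could be dropped in right before the AE is loaded (hypothetical placement; assumes psutil is installed):

import psutil
import torch

def log_memory(tag: str) -> None:
    # Print host RAM and GPU memory headroom at a given point in the script.
    vm = psutil.virtual_memory()
    print(f"[{tag}] host RAM: {vm.available / 1e9:.1f} GB free of {vm.total / 1e9:.1f} GB")
    if torch.cuda.is_available():
        free, total = torch.cuda.mem_get_info()
        print(f"[{tag}] GPU: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")

log_memory("before AE load")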
How can I run the LoRA trainer on my second CUDA device? It seems to run fine on the first CUDA device, but that one doesn't have enough VRAM, so I wanted to start the LoRA training script on CUDA device 1 (the second one).
Unfortunately, the following command doesn't set the CUDA device to index 1:
CUDA_VISIBLE_DEVICES=1 accelerate launch --config_file "accelconfig.yaml" train_flux_lora_deepspeed.py --config "train_configs/test_lora.yaml"
Any advice?
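One way to double-check which device accelerate actually picks up inside the training script is to print it from the Accelerator object. A minimal sketch, assuming the standard accelerate API:

from accelerate import Accelerator

accelerator = Accelerator()
# With CUDA_VISIBLE_DEVICES=1 the visible GPU is remapped, so this prints cuda:0,
# but it refers to the second physical card.
print("accelerate device:", accelerator.device)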