Open azuredsky opened 8 months ago
I modified tools/train.py as follows:

    strategy = "ddp"
    devices = [0, 1, 2, 3]
    trainer = pl.Trainer(
        default_root_dir=cfg.save_dir,
        max_epochs=cfg.schedule.total_epochs,
        check_val_every_n_epoch=cfg.schedule.val_intervals,
        accelerator=accelerator,
        devices=devices,
        gpus=len(devices),
        log_every_n_steps=cfg.log.interval,
        num_sanity_val_steps=0,
        callbacks=[TQDMProgressBar(refresh_rate=0)],  # disable tqdm bar
        logger=logger,
        benchmark=cfg.get("cudnn_benchmark", True),
        gradient_clip_val=cfg.get("grad_clip", 0.0),
        strategy=strategy,
        precision=precision,
    )

With these changes, training still runs on only a single GPU. What did I configure incorrectly?

My environment:

    pytorch-lightning 1.9.5
    pytorch 1.13.1 py3.10_cuda11.6_cudnn8.3.2_0 pytorch