Open iWangTing opened 3 months ago
I got the following error when running the script. Could someone kindly help me figure out how to fix it?

```
(lxl) amax@amax:~$ bash sdb1/lxl2/Chinese-CLIP-master/run_scripts/B_finetune_vit-b-16_rbt-base.sh
/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
Loading vision model config from /home/amax/sdb1/lxl2/Chinese-CLIP-master/cn_clip/clip/model_configs/ViT-B-16.json
Loading text model config from /home/amax/sdb1/lxl2/Chinese-CLIP-master/cn_clip/clip/model_configs/RoBERTa-wwm-ext-base-chinese.json
Traceback (most recent call last):
  File "/home/amax/sdb1/lxl2/Chinese-CLIP-master/cn_clip/training/main.py", line 350, in <module>
    main()
  File "/home/amax/sdb1/lxl2/Chinese-CLIP-master/cn_clip/training/main.py", line 92, in main
    model_info['use_flash_attention'] = args.use_flash_attention
AttributeError: 'Namespace' object has no attribute 'use_flash_attention'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3443613) of binary: /home/amax/.conda/envs/lxl/bin/python
Traceback (most recent call last):
  File "/home/amax/.conda/envs/lxl/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
  File "/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
sdb1/lxl2/Chinese-CLIP-master/cn_clip/training/main.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-08_19:59:39
  host      : amax
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3443613)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
```
Hi, the error indicates that `args` was parsed without a `use_flash_attention` attribute. This is somewhat odd, though, because `use_flash_attention` defaults to `False`. Please double-check your local copy of the code and the training script.
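One way to see (and defensively work around) the failure: if the installed `params.py` predates the `--use-flash-attention` flag while `main.py` expects it, the parsed `Namespace` simply lacks the attribute and the direct access raises `AttributeError`. A minimal sketch of that situation, with `getattr` providing the documented default — this is an illustration of the attribute-lookup pattern, not the project's actual fix:

```python
import argparse

# Simulate an out-of-date params.py that never registered --use-flash-attention.
parser = argparse.ArgumentParser()
args = parser.parse_args([])

model_info = {}

# Direct access (as in main.py line 92) raises AttributeError here:
try:
    model_info['use_flash_attention'] = args.use_flash_attention
except AttributeError as e:
    print(e)  # 'Namespace' object has no attribute 'use_flash_attention'

# getattr with the documented default (False) is robust either way:
model_info['use_flash_attention'] = getattr(args, 'use_flash_attention', False)
print(model_info['use_flash_attention'])  # False
```

The real resolution is usually to re-pull the repository so that `main.py` and `params.py` come from the same revision, rather than patching the lookup.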