OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
MIT License

AttributeError: 'Namespace' object has no attribute 'use_flash_attention' #285

Open iWangTing opened 3 months ago

iWangTing commented 3 months ago
The following error occurs when I run the training script. Could someone kindly help me figure out how to fix it?
(lxl) amax@amax:~$   bash sdb1/lxl2/Chinese-CLIP-master/run_scripts/B_finetune_vit-b-16_rbt-base.sh
/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
Loading vision model config from /home/amax/sdb1/lxl2/Chinese-CLIP-master/cn_clip/clip/model_configs/ViT-B-16.json
Loading text model config from /home/amax/sdb1/lxl2/Chinese-CLIP-master/cn_clip/clip/model_configs/RoBERTa-wwm-ext-base-chinese.json
Traceback (most recent call last):
  File "/home/amax/sdb1/lxl2/Chinese-CLIP-master/cn_clip/training/main.py", line 350, in <module>
    main()
  File "/home/amax/sdb1/lxl2/Chinese-CLIP-master/cn_clip/training/main.py", line 92, in main
    model_info['use_flash_attention'] = args.use_flash_attention
AttributeError: 'Namespace' object has no attribute 'use_flash_attention'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 3443613) of binary: /home/amax/.conda/envs/lxl/bin/python
Traceback (most recent call last):
  File "/home/amax/.conda/envs/lxl/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
  File "/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/amax/.conda/envs/lxl/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
sdb1/lxl2/Chinese-CLIP-master/cn_clip/training/main.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-08_19:59:39
  host      : amax
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3443613)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
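The failing line in main.py copies args.use_flash_attention into model_info, so the attribute has to exist on the parsed Namespace. As a stop-gap while the root cause is tracked down, a minimal local workaround sketch (not the upstream code) is to fall back to False when the flag was never registered by the argument parser:

```python
# cn_clip/training/main.py, around the failing line in main()
# (local workaround sketch: default to False if --use-flash-attention
#  was never added to the argument parser of this checkout)
model_info['use_flash_attention'] = getattr(args, 'use_flash_attention', False)
```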
DtYXs commented 1 month ago

Hello, the error indicates that args was parsed without a use_flash_attention attribute. This is a bit strange, though, because use_flash_attention defaults to False. Please check your code and training script.
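One way to follow up on this suggestion is to confirm that the params.py actually used for training defines the flag at all; if it does not, main.py and params.py are out of sync. A quick sketch, using the path from the log above (adjust to your layout):

```python
# Does the params.py used by the training run mention the flash-attention
# flag at all? If this prints False, the local params.py predates the
# FlashAttention support that main.py expects, and the repo should be re-synced.
path = "/home/amax/sdb1/lxl2/Chinese-CLIP-master/cn_clip/training/params.py"
with open(path) as f:
    src = f.read()
print("flash_attention" in src or "flash-attention" in src)
```

If the flag is missing there, pulling the latest Chinese-CLIP so that cn_clip/training/params.py and main.py come from the same revision (and making sure no older pip-installed cn_clip package shadows the local checkout) should resolve the AttributeError.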