OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
MIT License

Error when running the finetune workflow #305

Open Cupies opened 2 months ago

Cupies commented 2 months ago

[2024-04-23 12:06:31,944] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launch.py:183: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use-env is set by default in torchrun. If your script expects --local-rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

  warnings.warn(
Traceback (most recent call last):
  File "E:\JupyterNotebookSpace\Chinese-CLIP\cn_clip\training\main.py", line 17, in <module>
    from cn_clip.clip.model import convert_weights, convert_state_dict, resize_pos_embed, CLIP
ImportError: cannot import name 'convert_state_dict' from 'cn_clip.clip.model' (D:\Anaconda3\envs\Mlearn\lib\site-packages\cn_clip\clip\model.py)
[2024-04-23 12:06:37,013] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 27588) of binary: D:\Anaconda3\envs\Mlearn\python.exe
Traceback (most recent call last):
  File "D:\Anaconda3\envs\Mlearn\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Anaconda3\envs\Mlearn\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launch.py", line 198, in <module>
    main()
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launch.py", line 194, in main
    launch(args)
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launch.py", line 179, in launch
    run(args)
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\run.py", line 803, in run
    elastic_launch(
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launcher\api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "D:\Anaconda3\envs\Mlearn\lib\site-packages\torch\distributed\launcher\api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
cn_clip/training/main.py FAILED
Failures:
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-23_12:06:37
  host      : Jarvis
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 27588)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
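The FutureWarning at the top of the log is not what stops the run, but it does ask scripts to read the local rank from os.environ['LOCAL_RANK'] instead of a --local-rank argument. Below is a minimal sketch of that pattern, assuming a script that may be started either by torchrun or by the deprecated torch.distributed.launch. `resolve_local_rank` is a hypothetical helper, not a function in Chinese-CLIP, and falling back to rank 0 when nothing is set is an assumption meant for single-process runs:

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_local_rank(argv: Optional[Sequence[str]] = None,
                       env: Optional[Mapping[str, str]] = None) -> int:
    """Return this worker's local rank (hypothetical helper).

    Prefers the LOCAL_RANK environment variable, which torchrun sets for
    every worker. Falls back to a --local-rank / --local_rank command-line
    option for the older launcher, and finally to 0 (assumed default for
    a plain single-process run).
    """
    env = os.environ if env is None else env
    if "LOCAL_RANK" in env:
        return int(env["LOCAL_RANK"])

    parser = argparse.ArgumentParser(add_help=False)
    # Accept both spellings; both are stored in args.local_rank.
    parser.add_argument("--local-rank", "--local_rank", dest="local_rank",
                        type=int, default=0)
    # parse_known_args ignores the training script's other options.
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank
```

Because torchrun always exports LOCAL_RANK, the argparse fallback only matters when the script is still launched through torch.distributed.launch.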
byraid218 commented 2 months ago

One of the errors is ImportError: cannot import name 'convert_state_dict' from 'cn_clip.clip.model'. This issue has an answer about the missing convert_state_dict function: https://github.com/OFA-Sys/Chinese-CLIP/issues/185
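The two paths in the ImportError hint at why the name is missing. main.py runs from the repository checkout (E:\JupyterNotebookSpace\Chinese-CLIP), but cn_clip.clip.model is resolved to the pip-installed copy under D:\Anaconda3\envs\Mlearn\lib\site-packages, and that copy does not define convert_state_dict. The following is a small diagnostic sketch to confirm which copy Python picks up. `locate_attr` is a hypothetical helper, and because the result depends on sys.path, you should run it in the same environment and working directory used for training:

```python
import importlib
import importlib.util
from typing import Optional, Tuple


def locate_attr(module_name: str, attr: str) -> Tuple[Optional[str], bool]:
    """Report which file module_name resolves to and whether it defines attr.

    Returns (origin, has_attr); origin is None if the module cannot be found.
    Note: this imports the module (and its parent packages) if it exists.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "cn_clip") is not installed.
        return None, False
    if spec is None:
        return None, False
    module = importlib.import_module(module_name)
    return spec.origin, hasattr(module, attr)


if __name__ == "__main__":
    origin, ok = locate_attr("cn_clip.clip.model", "convert_state_dict")
    print(f"cn_clip.clip.model -> {origin}; convert_state_dict present: {ok}")
```

If this prints the site-packages path together with False, the installed package is older than the checkout's training code expects; issue #185 linked above covers how that mismatch was handled there.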