是否已有关于该错误的issue或讨论？ | Is there an existing issue / discussion for this?

[X] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions

该问题是否在FAQ中有解答？ | Is there an existing answer for this in FAQ?

[X] 我已经搜索过FAQ | I have searched FAQ

当前行为 | Current Behavior

(Qwen) shawn@compute2:~/diska/samba/Train/BIgmode/Qwen-main$ sh scripts/train_finetune.sh
WARNING:torch.distributed.run:

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

[2024-02-21 10:08:26,506] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-21 10:08:26,513] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-21 10:08:31,106] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-02-21 10:08:31,201] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-02-21 10:08:31,202] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
You passed quantization_config to from_pretrained but the model you're loading already has a quantization_config attribute and has already quantized weights. However, loading attributes (e.g. disable_exllama, use_cuda_fp16) will be overwritten with the one you passed to from_pretrained. The rest will be ignored.
You passed quantization_config to from_pretrained but the model you're loading already has a quantization_config attribute and has already quantized weights. However, loading attributes (e.g. disable_exllama, use_cuda_fp16) will be overwritten with the one you passed to from_pretrained. The rest will be ignored.
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码，尤其如果你在9月25日前已经开始使用Qwen-7B，千万注意不要使用错误代码和模型。
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码，尤其如果你在9月25日前已经开始使用Qwen-7B，千万注意不要使用错误代码和模型。
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|████████████████████████████████████████████████| 3/3 [00:03<00:00, 1.09s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████| 3/3 [00:03<00:00, 1.33s/it]
trainable params: 143,130,624 || all params: 1,388,056,576 || trainable%: 10.311584302454254
Loading data...
Formatting inputs...Skip in lazy mode
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
trainable params: 143,130,624 || all params: 1,388,056,576 || trainable%: 10.311584302454254
Using /home/shawn/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/shawn/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/shawn/.cache/torch_extensions/py310_cu118/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.12494802474975586 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 0.20195293426513672 seconds
{'loss': 1.6189, 'learning_rate': 0.0, 'epoch': 0.04}
{'loss': 1.42, 'learning_rate': 0.0003, 'epoch': 0.08}
{'loss': 1.4318, 'learning_rate': 0.0003, 'epoch': 0.13}
{'loss': 1.3541, 'learning_rate': 0.0003, 'epoch': 0.17}
{'loss': 1.1495, 'learning_rate': 0.0003, 'epoch': 0.21}
{'loss': 1.1028, 'learning_rate': 0.0003, 'epoch': 0.25}
{'loss': 0.9797, 'learning_rate': 0.0003, 'epoch': 0.29}
{'train_runtime': 97.7663, 'train_samples_per_second': 2.323, 'train_steps_per_second': 0.072, 'train_loss': 1.2938276018415178, 'epoch': 0.29}
100%|███████████████████████████████████████████████████████████████████████████| 7/7 [01:37<00:00, 13.97s/it]
Traceback (most recent call last):
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7faa660a8ee0>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Qwen/Qwen-7B-Chat-Int4/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7faa660a8ee0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shawn/diska/samba/Train/BIgmode/Qwen-main/finetune.py", line 374, in <module>
    train()
  File "/home/shawn/diska/samba/Train/BIgmode/Qwen-main/finetune.py", line 370, in train
    safe_save_model_for_hf_trainer(trainer=trainer, output_dir=training_args.output_dir, bias=lora_args.lora_bias)
  File "/home/shawn/diska/samba/Train/BIgmode/Qwen-main/finetune.py", line 122, in safe_save_model_for_hf_trainer
    trainer._save(output_dir, state_dict=state_dict)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/transformers/trainer.py", line 2865, in _save
    self.model.save_pretrained(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/peft/peft_model.py", line 216, in save_pretrained
    output_state_dict = get_peft_model_state_dict(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 146, in get_peft_model_state_dict
    has_remote_config = file_exists(model_id, "config.json")
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 2386, in file_exists
    get_hf_file_metadata(url, token=token)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
    r = _request_wrapper(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
    response = _request_wrapper(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 408, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 67, in send
    return super().send(request, *args, **kwargs)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Qwen/Qwen-7B-Chat-Int4/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7faa660a8ee0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 3813df15-ee1e-4ce7-ac2b-75b387e41eb8)')
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 881) of binary: /home/shawn/diska/anaconda/envs/Qwen/bin/python
Traceback (most recent call last):
  File "/home/shawn/diska/anaconda/envs/Qwen/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

finetune.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-02-21_10:10:36
  host      : compute2
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 881)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

### 期望行为 | Expected Behavior

I set the number of training epochs very low so that the error reproduces quickly. I built the environment myself with Anaconda rather than Docker, on a machine with two 2080 Ti GPUs. At a minimum, I would expect the model to be saved successfully. Fine-tuning uses the default command:

    bash finetune/finetune_qlora_ds.sh \
        --model Qwen/Qwen-7B-Chat-Int4 \
        --data my_datasets/AutoAudit_qwen.json

The end of the finetune_qlora_ds.sh file:

    torchrun $DISTRIBUTED_ARGS finetune.py \
        --model_name_or_path $MODEL \
        --data_path $DATA \
        --fp16 True \
        --output_dir output_qwen \
        --num_train_epochs 0.3 \
        --per_device_train_batch_size 2 \
        --per_device_eval_batch_size 1 \
        --gradient_accumulation_steps 8 \
        --evaluation_strategy "no" \
        --save_strategy "steps" \
        --save_steps 1000 \
        --save_total_limit 10 \
        --learning_rate 3e-4 \
        --weight_decay 0.1 \
        --adam_beta2 0.95 \
        --warmup_ratio 0.01 \
        --lr_scheduler_type "cosine" \
        --logging_steps 1 \
        --report_to "none" \
        --model_max_length 512 \
        --lazy_preprocess True \
        --use_lora \
        --q_lora \
        --gradient_checkpointing \
        --deepspeed finetune/ds_config_zero2.json

### 复现方法 | Steps To Reproduce

_No response_

### 运行环境 | Environment

```Markdown
- OS: ubuntu18.04
- Python: python 3.10.0
- Transformers:4.32.0
- PyTorch:2.0.1
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):11.8
```

### 备注 | Anything else?

_No response_
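The fine-tuning command above passes the hub ID `Qwen/Qwen-7B-Chat-Int4` as `--model`. Below is a minimal sketch, assuming the checkpoint has already been copied to a local directory, of a pre-flight check that the directory contains a `config.json` before it is passed as `--model` instead of the hub ID. The helper name `has_local_config` is hypothetical, and a temporary directory stands in for the downloaded checkpoint. Whether a local path also avoids the hub lookup at save time depends on the installed `peft` version and has not been verified here.

```python
import tempfile
from pathlib import Path


def has_local_config(model_dir: str) -> bool:
    """Return True if model_dir contains a config.json file (hypothetical helper)."""
    return (Path(model_dir) / "config.json").is_file()


# A temporary directory stands in for a locally downloaded checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    before = has_local_config(tmp)                # no config.json yet
    (Path(tmp) / "config.json").write_text("{}")  # simulate the downloaded file
    after = has_local_config(tmp)                 # config.json is now present

print(before, after)
```

A check like this only confirms that the files are on disk; it does not by itself change how `finetune.py` or `peft` behave.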

I also ran a single-GPU test; the error output is below. It appears to fail in get_hf_file_metadata because the file cannot be downloaded. Does saving the model also require downloading this file? How can I download everything in advance so that training works normally offline, even with the network disconnected? Any help would be greatly appreciated!

(Qwen) shawn@compute2:~/diska/samba/Train/BIgmode/Qwen-main$ bash finetune/finetune_qlora_single_gpu.sh
[2024-02-21 16:01:03,344] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
You passed quantization_config to from_pretrained but the model you're loading already has a quantization_config attribute and has already quantized weights. However, loading attributes (e.g. disable_exllama, use_cuda_fp16) will be overwritten with the one you passed to from_pretrained. The rest will be ignored.
Warning: please make sure that you are using the latest codes and checkpoints, especially if you used Qwen-7B before 09.25.2023.请使用最新模型和代码，尤其如果你在9月25日前已经开始使用Qwen-7B，千万注意不要使用错误代码和模型。
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|███████████████████████████████████| 3/3 [00:02<00:00, 1.42it/s]
trainable params: 143,130,624 || all params: 1,388,056,576 || trainable%: 10.311584302454254
Loading data...
Formatting inputs...Skip in lazy mode
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Using /home/shawn/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/shawn/.cache/torch_extensions/py310_cu118/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.04670214653015137 seconds
{'loss': 1.4348, 'learning_rate': 0.0, 'epoch': 0.01}
{'loss': 1.6305, 'learning_rate': 0.0003, 'epoch': 0.02}
{'loss': 1.7119, 'learning_rate': 0.0003, 'epoch': 0.03}
{'loss': 1.3699, 'learning_rate': 0.0003, 'epoch': 0.04}
{'loss': 1.1099, 'learning_rate': 0.0003, 'epoch': 0.05}
{'loss': 1.2408, 'learning_rate': 0.0003, 'epoch': 0.06}
{'loss': 0.8442, 'learning_rate': 0.0003, 'epoch': 0.07}
{'loss': 0.902, 'learning_rate': 0.0003, 'epoch': 0.08}
{'loss': 0.6552, 'learning_rate': 0.0003, 'epoch': 0.1}
{'loss': 0.7207, 'learning_rate': 0.0003, 'epoch': 0.11}
{'loss': 0.7115, 'learning_rate': 0.0003, 'epoch': 0.12}
{'loss': 0.7682, 'learning_rate': 0.0003, 'epoch': 0.13}
{'loss': 0.7237, 'learning_rate': 0.0003, 'epoch': 0.14}
{'loss': 0.8801, 'learning_rate': 0.0003, 'epoch': 0.15}
{'loss': 0.8988, 'learning_rate': 0.0003, 'epoch': 0.16}
{'loss': 0.8621, 'learning_rate': 0.0003, 'epoch': 0.17}
{'loss': 0.7351, 'learning_rate': 0.0003, 'epoch': 0.18}
{'loss': 0.8254, 'learning_rate': 0.0003, 'epoch': 0.19}
{'loss': 0.777, 'learning_rate': 0.0003, 'epoch': 0.2}
{'train_runtime': 172.7308, 'train_samples_per_second': 0.877, 'train_steps_per_second': 0.11, 'train_loss': 0.9895743319862768, 'epoch': 0.2}
100%|████████████████████████████████████████████████████████████| 19/19 [02:52<00:00, 9.09s/it]
Traceback (most recent call last):
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connectionpool.py", line 491, in _make_request
    raise new_e
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
    conn.connect()
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connection.py", line 616, in connect
    self.sock = sock = self._new_conn()
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fc4e8111ba0>: Failed to establish a new connection: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Qwen/Qwen-7B-Chat-Int4/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fc4e8111ba0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shawn/diska/samba/Train/BIgmode/Qwen-main/finetune.py", line 374, in <module>
    train()
  File "/home/shawn/diska/samba/Train/BIgmode/Qwen-main/finetune.py", line 370, in train
    safe_save_model_for_hf_trainer(trainer=trainer, output_dir=training_args.output_dir, bias=lora_args.lora_bias)
  File "/home/shawn/diska/samba/Train/BIgmode/Qwen-main/finetune.py", line 122, in safe_save_model_for_hf_trainer
    trainer._save(output_dir, state_dict=state_dict)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/transformers/trainer.py", line 2865, in _save
    self.model.save_pretrained(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/peft/peft_model.py", line 216, in save_pretrained
    output_state_dict = get_peft_model_state_dict(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 146, in get_peft_model_state_dict
    has_remote_config = file_exists(model_id, "config.json")
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 2386, in file_exists
    get_hf_file_metadata(url, token=token)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
    r = _request_wrapper(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
    response = _request_wrapper(
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 408, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 67, in send
    return super().send(request, *args, **kwargs)
  File "/home/shawn/diska/anaconda/envs/Qwen/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /Qwen/Qwen-7B-Chat-Int4/resolve/main/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fc4e8111ba0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 7c74d1d5-aa02-434e-bf96-e6325061e521)')
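The traceback shows the save step failing inside `file_exists(model_id, "config.json")`, a network lookup against huggingface.co that runs after training has already finished. Below is a minimal sketch of the general pattern of treating such a lookup as optional on an offline machine. The names `check_or_none` and `fake_file_exists` are hypothetical and are not part of `peft` or `huggingface_hub`; how, or whether, this pattern can be applied to the installed `peft` version is left open. The sketch relies on `requests.exceptions.ConnectionError` deriving from `OSError`.

```python
def check_or_none(check, *args, **kwargs):
    """Run a remote check; return None instead of raising when the network is down."""
    try:
        return check(*args, **kwargs)
    except OSError as err:  # requests' ConnectionError is a subclass of OSError
        print(f"remote check skipped: {err}")
        return None


def fake_file_exists(repo_id, filename):
    # Hypothetical stand-in for the hub lookup in the traceback;
    # it always fails the way an offline machine would.
    raise OSError(101, "Network is unreachable")


result = check_or_none(fake_file_exists, "Qwen/Qwen-7B-Chat-Int4", "config.json")
print(result)
```

With a wrapper like this, the caller has to decide what a `None` result means (for example, skipping the remote comparison), which is a design choice rather than something the traceback settles.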

QwenLM / Qwen

[BUG] <title> Saving the model fails when training on two GPUs #1078
