SCIR-HI / Huatuo-Llama-Med-Chinese

Repo for BenTsao [original name: HuaTuo (华驼)], Instruction-tuning Large Language Models with Chinese Medical Knowledge.
Apache License 2.0

BrokenPipeError: [Errno 32] Broken pipe. The full error output is below; what is causing this? #78

Open pandazzh2020 opened 10 months ago

pandazzh2020 commented 10 months ago

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

/opt/conda/envs/huatuo/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /opt/conda/envs/huatuo did not contain libcudart.so as expected! Searching further paths...
  warn(msg)
/opt/conda/envs/huatuo/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/extras/CUPTI/lib64'), PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/cuda/compat/lib'), PosixPath('/usr/local/nvidia/lib64')}
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 114
CUDA SETUP: Loading binary /opt/conda/envs/huatuo/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114.so...
Training Alpaca-LoRA model with params:
base_model: decapoda-research/llama-7b-hf
data_path: ./data/Format_data_sheet_mini.json
output_dir: ./lora-llama-med-e1
batch_size: 1
micro_batch_size: 1
num_epochs: 5
learning_rate: 0.0003
cutoff_len: 256
val_set_size: 500
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: False
group_by_length: False
wandb_project: llama_med
wandb_run_name: e1
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: med_template

The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:22<00:00, 1.44it/s]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-51d7aed489ad1911/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 376.91it/s]
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
Loading cached split indices for dataset at /root/.cache/huggingface/datasets/json/default-51d7aed489ad1911/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-9f4de28c7bc88c4b.arrow and /root/.cache/huggingface/datasets/json/default-51d7aed489ad1911/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-b165ac8522c98057.arrow
wandb: W&B API key is configured. Use wandb login --relogin to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /zheng_zhong_hua/Huatuo-llama-med-chinese/wandb/run-20230722_105219-wnnb9l4i
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run e1
wandb: ⭐️ View project at https://wandb.ai/chat2023/llama_med
wandb: 🚀 View run at https://wandb.ai/chat2023/llama_med/runs/wnnb9l4i
0%| | 0/15535 [00:00<?, ?it/s]
/opt/conda/envs/huatuo/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:298: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
0%| | 2/15535 [00:03<7:23:53, 1.71s/it]
wandb: Network error (TransientError), entering retry loop.
{'loss': 2.2941, 'learning_rate': 9.65250965250965e-07, 'epoch': 0.0}
{'loss': 2.1901, 'learning_rate': 2.5096525096525096e-06, 'epoch': 0.01}
{'loss': 2.6166, 'learning_rate': 4.054054054054054e-06, 'epoch': 0.01}
0%|▏ | 29/15535 [00:29<4:12:00, 1.03it/s]
Exception in thread ChkStopThr:
Traceback (most recent call last):
  File "/opt/conda/envs/huatuo/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/huatuo/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 273, in check_stop_status
    self._loop_check_status(
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 211, in _loop_check_status
    local_handle = request()
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface.py", line 787, in deliver_stop_status
    return self._deliver_stop_status(status)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_shared.py", line 585, in _deliver_stop_status
    return self._deliver_record(record)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_shared.py", line 560, in _deliver_record
    handle = mailbox._deliver_record(record, interface=self)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
    interface._publish(record)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
Exception in thread NetStatThr:
Traceback (most recent call last):
  File "/opt/conda/envs/huatuo/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/huatuo/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 255, in check_network_status
    self._loop_check_status(
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 211, in _loop_check_status
    local_handle = request()
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface.py", line 795, in deliver_network_status
    return self._deliver_network_status(status)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_shared.py", line 601, in _deliver_network_status
    return self._deliver_record(record)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_shared.py", line 560, in _deliver_record
    handle = mailbox._deliver_record(record, interface=self)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/mailbox.py", line 455, in _deliver_record
    interface._publish(record)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
0%|▎ | 32/15535 [00:32<4:04:48, 1.06it/s]
Traceback (most recent call last):
  File "/zheng_zhong_hua/Huatuo-llama-med-chinese/finetune.py", line 289, in <module>
    fire.Fire(train)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/zheng_zhong_hua/Huatuo-llama-med-chinese/finetune.py", line 279, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer.py", line 2020, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer.py", line 2307, in _maybe_log_save_evaluate
    self.log(logs)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer.py", line 2672, in log
    self.control = self.callback_handler.on_log(self.args, self.state, self.control, logs)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer_callback.py", line 390, in on_log
    return self.call_event("on_log", args, state, control, logs=logs)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/trainer_callback.py", line 397, in call_event
    result = getattr(callback, event)(
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/transformers/integrations.py", line 814, in on_log
    self._wandb.log({**logs, "train/global_step": state.global_step})
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 389, in wrapper
    return func(self, *args, **kwargs)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 340, in wrapper_fn
    return func(self, *args, **kwargs)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 330, in wrapper
    return func(self, *args, **kwargs)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 1745, in log
    self._log(data=data, step=step, commit=commit)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 1526, in _log
    self._partial_history_callback(data, step, commit)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/wandb_run.py", line 1396, in _partial_history_callback
    self._backend.interface.publish_partial_history(
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface.py", line 584, in publish_partial_history
    self._publish_partial_history(partial_history)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_shared.py", line 89, in _publish_partial_history
    self._publish(rec)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/interface/interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "/opt/conda/envs/huatuo/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
BrokenPipeError: [Errno 32] Broken pipe
wandb: While tearing down the service manager. The following error has occurred: [Errno 32] Broken pipe

pandazzh2020 commented 10 months ago

Reducing the batch size did not fix it. Settings used: batch_size: 1, micro_batch_size: 1, num_epochs: 5
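
The traceback above ends inside wandb's sock_client.py rather than in the training step, and the log prints "wandb: Network error (TransientError), entering retry loop." shortly before the crash, so the failure may come from W&B syncing rather than from the batch size. A minimal sketch of one thing to try, assuming that the WANDB_MODE=offline environment variable (the counterpart of the log's own hint "Run wandb offline to turn off syncing") is enough to keep wandb off the network. The helper name wandb_offline_env is hypothetical and not part of this repo:

```python
import os


def wandb_offline_env(base_env=None):
    """Return a copy of the environment with W&B switched to offline mode.

    Assumption: with WANDB_MODE=offline, wandb writes run data locally
    instead of syncing it over the network.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["WANDB_MODE"] = "offline"
    return env


# Apply to the current process; this has to happen before the training
# script initialises wandb (for example at the very top of finetune.py).
os.environ.update(wandb_offline_env())
print(os.environ["WANDB_MODE"])
```

Exporting the same variable in the shell before launching the script should have the same effect. Whether this resolves the BrokenPipeError in this particular setup is untested.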