RUCAIBox / TextBox

TextBox 2.0 is a text generation library with pre-trained language models
https://github.com/RUCAIBox/TextBox
MIT License

Cannot connect to the Hugging Face dataset after downloading and running #363

Closed lz99316 closed 11 months ago

lz99316 commented 11 months ago

Describe the bug: a connection error (MaxRetryError) is raised at runtime.

To reproduce: I pulled the entire repository to my local machine and ran run_textbox.py in PyCharm (Python 3.10), which raised the error.

Log: the output of the run is as follows:

F:\文本生成模型TextBox\venv\Scripts\python.exe F:\文本生成模型TextBox\venv\TextBox\run_textbox.py
'wandb' is not recognized as an internal or external command, operable program or batch file.
05 Oct 20:08 INFO 65 parameters found.

General Hyper Parameters:

gpu_id: 0
use_gpu: True
device: cpu
seed: 2020
reproducibility: True
cmd: F:\文本生成模型TextBox\venv\TextBox\run_textbox.py
filename: BART-samsum-2023-Oct-05_20-08-06
saved_dir: saved/
state: INFO
wandb: online

Training Hyper Parameters:

do_train: True
do_valid: True
optimizer: adamw
adafactor_kwargs: {'lr': 0.001, 'scale_parameter': False, 'relative_step': False, 'warmup_init': False}
optimizer_kwargs: {}
valid_steps: 1
valid_strategy: epoch
stopping_steps: 2
epochs: 50
learning_rate: 3e-05
train_batch_size: 4
grad_clip: 0.1
accumulation_steps: 48
disable_tqdm: False
resume_training: True

Evaluation Hyper Parameters:

do_test: True
lower_evaluation: True
multiref_strategy: max
bleu_max_ngrams: 4
bleu_type: nltk
smoothing_function: 0
corpus_bleu: False
rouge_max_ngrams: 2
rouge_type: files2rouge
meteor_type: pycocoevalcap
chrf_type: m-popovic
distinct_max_ngrams: 4
inter_distinct: True
unique_max_ngrams: 4
self_bleu_max_ngrams: 4
tgt_lang: en
metrics: ['rouge']
eval_batch_size: 16
corpus_meteor: True

Model Hyper Parameters:

model: BART
model_name: bart
config_kwargs: {}
tokenizer_kwargs: {'use_fast': True}
generation_kwargs: {'num_beams': 5, 'no_repeat_ngram_size': 3, 'early_stopping': True}
efficient_kwargs: {}
efficient_methods: []
efficient_unfreeze_model: False
label_smoothing: 0.1

Dataset Hyper Parameters:

dataset: samsum
data_path: dataset/samsum
tgt_lang: en
src_len: 1024
tgt_len: 128
truncate: tail
metrics_for_best_model: ['rouge-1', 'rouge-2', 'rouge-l']
prefix_prompt: Summarize:

Unrecognized Hyper Parameters:

tokenizer_add_tokens: []
find_unused_parameters: False
load_type: from_scratch

================================================================================
'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /None/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.')))"), '(Request ID: 39115113-bab9-419b-b03c-e34796e2584e)')' thrown while requesting HEAD https://huggingface.co/None/resolve/main/tokenizer_config.json
05 Oct 20:08 WARNING '(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /None/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.')))"), '(Request ID: 39115113-bab9-419b-b03c-e34796e2584e)')' thrown while requesting HEAD https://huggingface.co/None/resolve/main/tokenizer_config.json
Traceback (most recent call last):
  File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connection.py", line 203, in _new_conn
    sock = connection.create_connection(
  File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    raise err
  File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\util\connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connectionpool.py", line 776, in urlopen
    self._prepare_proxy(conn)
  File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connectionpool.py", line 1041, in _prepare_proxy
    conn.connect()
  File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connection.py", line 611, in connect
    self.sock = sock = self._new_conn()
  File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connection.py", line 218, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.

The above exception was the direct cause of the following exception:

urllib3.exceptions.ProxyError: ('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "F:\文本生成模型TextBox\venv\lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
  File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\connectionpool.py", line 844, in urlopen
    retries = retries.increment(
  File "F:\文本生成模型TextBox\venv\lib\site-packages\urllib3\util\retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /None/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\文本生成模型TextBox\venv\TextBox\run_textbox.py", line 15, in <module>
    run_textbox(model=args.model, dataset=args.dataset, config_file_list=args.config_files, config_dict={})
  File "F:\文本生成模型TextBox\venv\TextBox\textbox\quick_start\quick_start.py", line 20, in run_textbox
    experiment = Experiment(model, dataset, config_file_list, config_dict)
  File "F:\文本生成模型TextBox\venv\TextBox\textbox\quick_start\experiment.py", line 56, in __init__
    self._init_data(self.get_config(), self.accelerator)
  File "F:\文本生成模型TextBox\venv\TextBox\textbox\quick_start\experiment.py", line 81, in _init_data
    tokenizer = get_tokenizer(config)
  File "F:\文本生成模型TextBox\venv\TextBox\textbox\utils\utils.py", line 212, in get_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, **tokenizer_kwargs)
  File "F:\文本生成模型TextBox\venv\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 686, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "F:\文本生成模型TextBox\venv\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 519, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "F:\文本生成模型TextBox\venv\lib\site-packages\transformers\utils\hub.py", line 429, in cached_file
    resolved_file = hf_hub_download(
  File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\file_download.py", line 1232, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\file_download.py", line 1599, in get_hf_file_metadata
    r = _request_wrapper(
  File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\file_download.py", line 417, in _request_wrapper
    response = _request_wrapper(
  File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\file_download.py", line 452, in _request_wrapper
    return http_backoff(
  File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 274, in http_backoff
    raise err
  File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 258, in http_backoff
    response = session.request(method=method, url=url, **kwargs)
  File "F:\文本生成模型TextBox\venv\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "F:\文本生成模型TextBox\venv\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "F:\文本生成模型TextBox\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 63, in send
    return super().send(request, *args, **kwargs)
  File "F:\文本生成模型TextBox\venv\lib\site-packages\requests\adapters.py", line 513, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /None/resolve/main/tokenizer_config.json (Caused by ProxyError('Unable to connect to proxy', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000021EB426B880>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.')))"), '(Request ID: 39115113-bab9-419b-b03c-e34796e2584e)')

Process finished with exit code 1

StevenTang1998 commented 11 months ago

This is a proxy-related issue. You can try using a proxy in your code to work around it, or download the model first and then use it locally.
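For example, a minimal sketch (not TextBox-specific API; 127.0.0.1:7890 is only a placeholder for your own proxy address and port): put these lines at the very top of run_textbox.py, before anything from transformers or huggingface_hub is imported, so the request to huggingface.co goes through your proxy.

import os

# Placeholder proxy address; replace it with the address/port your proxy actually listens on.
os.environ["HTTP_PROXY"] = "http://127.0.0.1:7890"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"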

lz99316 commented 11 months ago

This is a proxy-related issue. You can try using a proxy in your code to work around it, or download the model first and then use it locally.

How do I use a proxy in the code? T T

StevenTang1998 commented 11 months ago

I'd suggest looking that up on Baidu, or you can download the model first and then use it locally.
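If you go the download-first route, one possible sketch (the repo id facebook/bart-base and the local folder name are only examples) is to fetch the checkpoint with huggingface_hub on a connection that can reach huggingface.co:

from huggingface_hub import snapshot_download

# Download the checkpoint once into a local folder (example names only).
snapshot_download(repo_id="facebook/bart-base", local_dir="pretrained/bart-base")

Afterwards point model_path at that local folder (e.g. --model_path=pretrained/bart-base) so nothing has to be downloaded at run time.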

lz99316 commented 11 months ago

I'd suggest looking that up on Baidu, or you can download the model first and then use it locally.

Thanks.

lz99316 commented 11 months ago

I'd suggest looking that up on Baidu, or you can download the model first and then use it locally.

https://huggingface.co/None/resolve/main/tokenizer_config.json — even when accessed through a proxy, this URL does not exist.

StevenTang1998 commented 11 months ago

https://github.com/RUCAIBox/TextBox#quick-start

Your run command is wrong; please read the Quick Start carefully.
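The Quick Start runs the script from the command line, roughly like this (check the README for the exact current form; facebook/bart-base is the example checkpoint used there):

python run_textbox.py --model=BART --dataset=samsum --model_path=facebook/bart-base

The /None/ in your failing URL is what you get when model_path is never supplied, so the tokenizer path resolves to None.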

lz99316 commented 11 months ago

https://github.com/RUCAIBox/TextBox#quick-start

Your run command is wrong; please read the Quick Start carefully.

If I open run_textbox.py in the PyCharm IDE, what do I need to change to make it equivalent to running it from the command line? (Honest beginner question — I've been digging for a while and still can't figure out how to mimic the cmd invocation inside the .py file. The model and dataset are easy enough, I just change the default values, but I really can't find where to set model_path. Also, does the config_files argument in run_textbox.py need to be changed? Are the configuration files already downloaded?)

StevenTang1998 commented 11 months ago

You can add it via config_dict={'model_path': 'xxx'}, but I'd still recommend learning how to run it from the command line.
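Based on the call shown in your traceback, that change would look something like this in run_textbox.py (the checkpoint name is only an example):

# The existing call in run_textbox.py, with model_path supplied via config_dict.
run_textbox(
    model=args.model,
    dataset=args.dataset,
    config_file_list=args.config_files,
    config_dict={'model_path': 'facebook/bart-base'},  # example checkpoint
)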