InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

ptb_text_only cannot be downloaded from huggingface during 4-bit quantization [Bug] #1142

Closed xxg98 closed 9 months ago

xxg98 commented 9 months ago

Describe the bug

How can I load this dataset from a local copy?

Even after copying a cached version over, there is still another site that cannot be reached:

Loading calibrate dataset ...
Using the latest cached version of the module from /root/.cache/huggingface/modules/datasets_modules/datasets/ptb_text_only/8d1b97746fb9765d140e569ec5ddd35e20af4d37761f5e1bf357ea0b081f2c1f (last modified on Sat Feb 10 16:50:50 2024) since it couldn't be found locally at ptb_text_only, or remotely on the Hugging Face Hub.
Downloading data: 5.10MB [00:14, 349kB/s]
Downloading data: 400kB [00:00, 810kB/s]
Traceback (most recent call last):
  File "/root/miniconda3/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 18, in run
    args.run(args)
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/cli/lite.py", line 131, in auto_awq
    auto_awq(**kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/lite/apis/auto_awq.py", line 54, in auto_awq
    model, tokenizer, work_dir = calibrate(model, calib_dataset, calib_samples,
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/lite/apis/calibrate.py", line 176, in calibrate
    calib_loader, _ = get_calib_loaders(calib_dataset,
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/lite/utils/calib_dataloader.py", line 308, in get_calib_loaders
    return get_ptb(tokenizer, nsamples, seed, seqlen)
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/lite/utils/calib_dataloader.py", line 58, in get_ptb
    traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/load.py", line 2549, in load_dataset
    builder_instance.download_and_prepare(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/builder.py", line 1005, in download_and_prepare
    self._download_and_prepare(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/builder.py", line 1767, in _download_and_prepare
    super()._download_and_prepare(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/builder.py", line 1078, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
  File "/root/.cache/huggingface/modules/datasets_modules/datasets/ptb_text_only/8d1b97746fb9765d140e569ec5ddd35e20af4d37761f5e1bf357ea0b081f2c1f/ptb_text_only.py", line 131, in _split_generators
    data_dir = dl_manager.download_and_extract(my_urls)
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/download/download_manager.py", line 562, in download_and_extract
    return self.extract(self.download(url_or_urls))
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/download/download_manager.py", line 426, in download
    downloaded_path_or_paths = map_nested(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 466, in map_nested
    mapped = [
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 467, in <listcomp>
    _single_map_nested((function, obj, types, None, True, None))
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 370, in _single_map_nested
    return function(data_struct)
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/download/download_manager.py", line 451, in _download
    out = cached_path(url_or_filename, download_config=download_config)
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 188, in cached_path
    output_path = get_from_cache(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 573, in get_from_cache
    raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
ConnectionError: Couldn't reach https://raw.githubusercontent.com/wojzaremba/lstm/master/data/ptb.test.txt (ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Read timed out. (read timeout=100)")))

So the underlying problem is that ptb_text_only fails to load because of two separate network issues: one with huggingface.co, the other with raw.githubusercontent.com.
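
One workaround that avoids both endpoints (a sketch, not an lmdeploy interface; the local path is hypothetical) is to fetch the raw PTB text files once from a machine with network access, copy them over, and load them with the generic 'text' loader, which contacts no remote host:

from datasets import load_dataset

# Sketch: load a pre-downloaded PTB file entirely offline. The path is
# hypothetical; the file is the same ptb.train.txt that the
# ptb_text_only script fetches from the wojzaremba/lstm repository.
traindata = load_dataset(
    'text',
    data_files={'train': '/root/data/ptb.train.txt'},
    split='train')

# Caveat: the 'text' loader yields a 'text' column, while ptb_text_only
# yields 'sentence', so lmdeploy's built-in get_ptb would still need to
# be pointed at this dataset manually.
print(traindata[0])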

Reproduction

lmdeploy lite auto_awq /root/autodl-tmp/projects/LLM/fine_tuning/7b/hf_merge/ --w-bits 4 --w-group-size 128 --work-dir ./quant_output

Environment

python=3.10
cuda=12.1
GPU: 3090
lmdeploy: latest version
internlm2-chat-7b: latest version
hf_merge: path to internlm2-chat-7b after fine-tuning and merging

Error traceback

No response

Dingxiangxiang commented 9 months ago

On a local machine that can get past the firewall, download the dataset with:

from datasets import load_dataset
traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')

The downloaded copy can then be found locally at C:\Users\Admin\.cache\huggingface\datasets\ptb_text_only; upload it to the corresponding location on the server, /home/q1/.cache/huggingface/datasets/ptb_text_only, and that is all it takes.
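
To confirm the copied cache is actually picked up without any network access, one can force offline mode first (a sketch; HF_DATASETS_OFFLINE is a documented environment variable of the datasets library):

import os

# Must be set before importing `datasets`, which reads it at import time.
os.environ["HF_DATASETS_OFFLINE"] = "1"

from datasets import load_dataset

# With offline mode on, this fails fast if the cache copy is incomplete,
# instead of hanging on huggingface.co or raw.githubusercontent.com.
traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')
print(len(traindata))  # roughly 42k sentences for the train split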

xxg98 commented 9 months ago

On a local machine that can get past the firewall, download the dataset with:

from datasets import load_dataset
traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')

The downloaded copy can then be found locally at C:\Users\Admin\.cache\huggingface\datasets\ptb_text_only; upload it to the corresponding location on the server, /home/q1/.cache/huggingface/datasets/ptb_text_only, and that is all it takes.

Great, thanks a lot!

ZiQiangXie commented 8 months ago

@xxg98 Hello, has this problem been solved? Would you mind sharing the downloaded dataset? Thanks! I have no way past the firewall, so I cannot download it myself.

ztfmars commented 6 months ago

This problem is easy to solve. There is a mirror site in China that stands in for huggingface downloads; it works very well and offers a non-invasive download method. If a model or dataset needs downloading, the simplest approach is to change one environment variable, with no VPN required. If you also need to log in, or want other approaches, refer to the site's own tutorial, or to the write-up in this blog post.

The script is as follows; that's all:

export HF_ENDPOINT=https://hf-mirror.com 
lmdeploy lite auto_awq /root/autodl-tmp/projects/LLM/fine_tuning/7b/hf_merge/ --w-bits 4 --w-group-size 128 --work-dir ./quant_output

@ZiQiangXie @xxg98
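
For anyone calling load_dataset from Python rather than through the lmdeploy CLI, the same redirect can be applied in-process (a sketch; HF_ENDPOINT is the environment variable that huggingface_hub reads at import time):

import os

# Equivalent of `export HF_ENDPOINT=https://hf-mirror.com`; must be set
# before huggingface_hub / datasets are imported.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from datasets import load_dataset

traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')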

ysyx2008 commented 5 months ago

This problem is easy to solve. There is a mirror site in China that stands in for huggingface downloads; it works very well and offers a non-invasive download method. If a model or dataset needs downloading, the simplest approach is to change one environment variable, with no VPN required. If you also need to log in, or want other approaches, refer to the site's own tutorial, or to the write-up in this blog post.

The script is as follows; that's all:

export HF_ENDPOINT=https://hf-mirror.com 
lmdeploy lite auto_awq /root/autodl-tmp/projects/LLM/fine_tuning/7b/hf_merge/ --w-bits 4 --w-group-size 128 --work-dir ./quant_output

@ZiQiangXie @xxg98

This doesn't seem to work for me...
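
A likely reason the mirror does not help here: HF_ENDPOINT only redirects requests aimed at huggingface.co, while the ptb_text_only loading script hardcodes its data URLs on raw.githubusercontent.com, as the traceback above shows. A sketch for fetching those files once from any machine that can reach GitHub, to be copied to the server afterwards (the test-set URL is taken from the traceback; the train and valid file names are inferred from the same repository layout):

import requests

# The ptb_text_only script downloads its data from the wojzaremba/lstm
# repository, not from the Hugging Face Hub, so HF_ENDPOINT never applies.
BASE = "https://raw.githubusercontent.com/wojzaremba/lstm/master/data"
for name in ("ptb.train.txt", "ptb.valid.txt", "ptb.test.txt"):
    resp = requests.get(f"{BASE}/{name}", timeout=60)
    resp.raise_for_status()
    with open(name, "wb") as f:
        f.write(resp.content)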

liguoyu666 commented 1 month ago

This problem is easy to solve. There is a mirror site in China that stands in for huggingface downloads; it works very well and offers a non-invasive download method. If a model or dataset needs downloading, the simplest approach is to change one environment variable, with no VPN required. If you also need to log in, or want other approaches, refer to the site's own tutorial, or to the write-up in this blog post.

The script is as follows; that's all:

export HF_ENDPOINT=https://hf-mirror.com 
lmdeploy lite auto_awq /root/autodl-tmp/projects/LLM/fine_tuning/7b/hf_merge/ --w-bits 4 --w-group-size 128 --work-dir ./quant_output

@ZiQiangXie @xxg98

Using this method on autodl, the downloaded dataset turned out to be incomplete, and quantization later failed with an error.
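
Before re-running auto_awq after a flaky download, a quick sanity check on the cached copy can save a failed quantization run (a sketch; the expected counts below are the commonly cited PTB split sizes):

from datasets import load_dataset

# A truncated cache shows up as wrong split sizes (or an outright load
# failure), which is cheaper to catch here than mid-quantization.
for split in ("train", "validation", "test"):
    ds = load_dataset('ptb_text_only', 'penn_treebank', split=split)
    print(split, len(ds))

# Expected roughly: train 42068, validation 3370, test 3761 sentences.

If the counts are off, delete the cache under ~/.cache/huggingface/datasets/ptb_text_only and download again.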