Closed · xxg98 closed this 9 months ago
On a local machine that can get past the firewall, use this code to download the dataset:

from datasets import load_dataset
traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')

You can then find the download on the local machine under C:\Users\Admin\.cache\huggingface\datasets\ptb_text_only; upload that directory to the corresponding location on the server, /home/q1/.cache/huggingface/datasets/ptb_text_only, and that's it.
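(A small hedged addition: if you'd rather not hunt for the cache directory by hand, the dataset object can report where its files live; `cache_files` is a standard `datasets` attribute, shown here purely as an illustration.)

```python
# Sketch: download PTB on a machine with working connectivity, then print
# the cache location so the directory can be copied to the offline server.
from datasets import load_dataset

traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')
# Lists the on-disk Arrow files backing the split, e.g. under
# ~/.cache/huggingface/datasets/ptb_text_only/...
print(traindata.cache_files)
```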
Got it, thanks a lot!
@xxg98 Hello, did you manage to solve this? Would you mind sharing the downloaded dataset? Thanks! I can't get past the firewall, so I can't download it myself.
This is easy to solve: there is a domestic mirror site for Hugging Face downloads that works very well and is non-intrusive. If you need to download a model or dataset, the simplest method is to change one environment variable; no VPN is needed. If you also need to log in, or want other approaches, you can follow the tutorial the mirror site provides, or take a look at this blog post.
The script is as follows; that's all:
export HF_ENDPOINT=https://hf-mirror.com
lmdeploy lite auto_awq /root/autodl-tmp/projects/LLM/fine_tuning/7b/hf_merge/ --w-bits 4 --w-group-size 128 --work-dir ./quant_output
@ZiQiangXie @xxg98
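For reference, the same redirection can be done from inside Python, provided the variable is set before any Hugging Face library is imported (a sketch of the idea, not lmdeploy's own code):

```python
# Sketch: route Hugging Face downloads through the hf-mirror endpoint.
# HF_ENDPOINT must be set before huggingface_hub/datasets are imported.
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

from datasets import load_dataset
traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')
```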
Replying to the above: that doesn't seem to work....
Using this method on autodl, the downloaded dataset has a problem: it is incomplete, and quantization fails later with an error.
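If the cache ended up truncated, one hedged workaround (my suggestion, untested on autodl) is to discard the partial files and force a clean re-download through the mirror:

```python
# Sketch, untested on autodl: drop partially downloaded files and
# re-download through the mirror endpoint.
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

from datasets import load_dataset
traindata = load_dataset('ptb_text_only', 'penn_treebank',
                         split='train', download_mode='force_redownload')
# The PTB train split holds roughly 42k sentences; far fewer than that
# suggests the download is still incomplete.
print(len(traindata))
```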
Checklist
Describe the bug
How can I load this dataset from a local path?
Even after I copy a cached copy into place, another site still cannot be reached:

Loading calibrate dataset ...
Using the latest cached version of the module from /root/.cache/huggingface/modules/datasets_modules/datasets/ptb_text_only/8d1b97746fb9765d140e569ec5ddd35e20af4d37761f5e1bf357ea0b081f2c1f (last modified on Sat Feb 10 16:50:50 2024) since it couldn't be found locally at ptb_text_only, or remotely on the Hugging Face Hub.
Downloading data: 5.10MB [00:14, 349kB/s]
Downloading data: 400kB [00:00, 810kB/s]
Traceback (most recent call last):
  File "/root/miniconda3/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 18, in run
    args.run(args)
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/cli/lite.py", line 131, in auto_awq
    auto_awq(**kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/lite/apis/auto_awq.py", line 54, in auto_awq
    model, tokenizer, work_dir = calibrate(model, calib_dataset, calib_samples,
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/lite/apis/calibrate.py", line 176, in calibrate
    calibloader, _ = get_calib_loaders(calib_dataset,
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/lite/utils/calib_dataloader.py", line 308, in get_calib_loaders
    return get_ptb(tokenizer, nsamples, seed, seqlen)
  File "/root/miniconda3/lib/python3.10/site-packages/lmdeploy/lite/utils/calib_dataloader.py", line 58, in get_ptb
    traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/load.py", line 2549, in load_dataset
    builder_instance.download_and_prepare(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/builder.py", line 1005, in download_and_prepare
    self._download_and_prepare(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/builder.py", line 1767, in _download_and_prepare
    super()._download_and_prepare(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/builder.py", line 1078, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
  File "/root/.cache/huggingface/modules/datasets_modules/datasets/ptb_text_only/8d1b97746fb9765d140e569ec5ddd35e20af4d37761f5e1bf357ea0b081f2c1f/ptb_text_only.py", line 131, in _split_generators
    data_dir = dl_manager.download_and_extract(my_urls)
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/download/download_manager.py", line 562, in download_and_extract
    return self.extract(self.download(url_or_urls))
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/download/download_manager.py", line 426, in download
    downloaded_path_or_paths = map_nested(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 466, in map_nested
    mapped = [
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 467, in <listcomp>
    _single_map_nested((function, obj, types, None, True, None))
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 370, in _single_map_nested
    return function(data_struct)
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/download/download_manager.py", line 451, in _download
    out = cached_path(url_or_filename, download_config=download_config)
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 188, in cached_path
    output_path = get_from_cache(
  File "/root/miniconda3/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 573, in get_from_cache
    raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
ConnectionError: Couldn't reach https://raw.githubusercontent.com/wojzaremba/lstm/master/data/ptb.test.txt (ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Read timed out. (read timeout=100)")))
In the end, the problem is that loading ptb_text_only depends on two separate network endpoints, either of which can fail: one is huggingface, the other is raw.githubusercontent.com.
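A hedged sketch of a way around both endpoints (assuming the dataset cache has already been fully populated once, e.g. by copying it from a machine where the download completed): put the `datasets` library into offline mode so it never contacts either site.

```python
# Sketch: with ~/.cache/huggingface fully pre-populated, offline mode stops
# datasets from touching huggingface.co or raw.githubusercontent.com at all.
# Note this only works if the dataset was fully prepared beforehand.
import os
os.environ['HF_DATASETS_OFFLINE'] = '1'

from datasets import load_dataset
traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')
```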
Reproduction
lmdeploy lite auto_awq /root/autodl-tmp/projects/LLM/fine_tuning/7b/hf_merge/ --w-bits 4 --w-group-size 128 --work-dir ./quant_output
Environment
Error traceback
No response