InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

How to quantize deepseek-ai/deepseek-vl-7b-chat #1865

Closed SunnyLee20230523 closed 2 days ago

SunnyLee20230523 commented 2 days ago

Thanks to the authors for open-sourcing this! While quantizing deepseek-vl-7b-chat with lmdeploy lite auto_awq deepseek-ai/deepseek-vl-7b-chat --work-dir deepseek-vl-7b-chat-4bit, I ran into the following error:

lmdeploy lite auto_awq deepseek-ai/deepseek-vl-7b-chat --work-dir deepseek-vl-7b-chat-4bit
can't find model from local_path deepseek-ai/deepseek-vl-7b-chat, try to download from remote
Python version is above 3.10, patching the collections module.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  4.64it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Move model.embed_tokens to GPU.
Move model.layers.0 to CPU.
Move model.layers.1 to CPU.
Move model.layers.2 to CPU.
Move model.layers.3 to CPU.
Move model.layers.4 to CPU.
Move model.layers.5 to CPU.
Move model.layers.6 to CPU.
Move model.layers.7 to CPU.
Move model.layers.8 to CPU.
Move model.layers.9 to CPU.
Move model.layers.10 to CPU.
Move model.layers.11 to CPU.
Move model.layers.12 to CPU.
Move model.layers.13 to CPU.
Move model.layers.14 to CPU.
Move model.layers.15 to CPU.
Move model.layers.16 to CPU.
Move model.layers.17 to CPU.
Move model.layers.18 to CPU.
Move model.layers.19 to CPU.
Move model.layers.20 to CPU.
Move model.layers.21 to CPU.
Move model.layers.22 to CPU.
Move model.layers.23 to CPU.
Move model.layers.24 to CPU.
Move model.layers.25 to CPU.
Move model.layers.26 to CPU.
Move model.layers.27 to CPU.
Move model.layers.28 to CPU.
Move model.layers.29 to CPU.
Move model.norm to GPU.
Move lm_head to CPU.
Loading calibrate dataset ...
Using the latest cached version of the module from /root/autodl-tmp/common_models/modules/datasets_modules/datasets/ptb_text_only/8d1b97746fb9765d140e569ec5ddd35e20af4d37761f5e1bf357ea0b081f2c1f (last modified on Wed Jun 26 18:09:13 2024) since it couldn't be found locally at ptb_text_only, or remotely on the Hugging Face Hub.
Generating train split:   0%|▍                                                                                                                                           | 150/42068 [00:00<00:04, 9717.43 examples/s]
Generating test split: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3761/3761 [00:00<00:00, 41761.05 examples/s]
Generating validation split: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3370/3370 [00:00<00:00, 57097.86 examples/s]
Traceback (most recent call last):
  File "/root/autodl-tmp/conda/envs/lee/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/root/autodl-tmp/conda/envs/lee/lib/python3.10/site-packages/lmdeploy/cli/entrypoint.py", line 37, in run
    args.run(args)
  File "/root/autodl-tmp/conda/envs/lee/lib/python3.10/site-packages/lmdeploy/cli/lite.py", line 137, in auto_awq
    auto_awq(**kwargs)
  File "/root/autodl-tmp/conda/envs/lee/lib/python3.10/site-packages/lmdeploy/lite/apis/auto_awq.py", line 96, in auto_awq
    vl_model, model, tokenizer, work_dir = calibrate(model,
  File "/root/autodl-tmp/conda/envs/lee/lib/python3.10/site-packages/lmdeploy/lite/apis/calibrate.py", line 206, in calibrate
    calib_loader, _ = get_calib_loaders(calib_dataset,
  File "/root/autodl-tmp/conda/envs/lee/lib/python3.10/site-packages/lmdeploy/lite/utils/calib_dataloader.py", line 302, in get_calib_loaders
    return get_ptb(tokenizer, nsamples, seed, seqlen)
  File "/root/autodl-tmp/conda/envs/lee/lib/python3.10/site-packages/lmdeploy/lite/utils/calib_dataloader.py", line 58, in get_ptb
    traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')
  File "/root/autodl-tmp/conda/envs/lee/lib/python3.10/site-packages/datasets/load.py", line 2582, in load_dataset
    builder_instance.download_and_prepare(
  File "/root/autodl-tmp/conda/envs/lee/lib/python3.10/site-packages/datasets/builder.py", line 1005, in download_and_prepare
    self._download_and_prepare(
  File "/root/autodl-tmp/conda/envs/lee/lib/python3.10/site-packages/datasets/builder.py", line 1767, in _download_and_prepare
    super()._download_and_prepare(
  File "/root/autodl-tmp/conda/envs/lee/lib/python3.10/site-packages/datasets/builder.py", line 1118, in _download_and_prepare
    verify_splits(self.info.splits, split_dict)
  File "/root/autodl-tmp/conda/envs/lee/lib/python3.10/site-packages/datasets/utils/info_utils.py", line 101, in verify_splits
    raise NonMatchingSplitsSizesError(str(bad_splits))
datasets.utils.info_utils.NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='train', num_bytes=5143706, num_examples=42068, shard_lengths=None, dataset_name=None), 'recorded': SplitInfo(name='train', num_bytes=4189, num_examples=150, shard_lengths=None, dataset_name='ptb_text_only')}]

Is quantization of deepseek-vl-7B not supported, or is there a problem with what I'm doing? Looking forward to a reply!

AllentDan commented 2 days ago

Your machine can't download the dataset. Calibrate with the pileval dataset instead: download pileval yourself first, then replace the dataset path with your local one. https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/lite/utils/calib_dataloader.py#L250
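The advice above can be sketched as follows. The --calib-dataset flag is an existing option of lmdeploy lite auto_awq, and HF_ENDPOINT is the environment variable huggingface_hub reads for an alternate hub endpoint; the mirror URL is an assumption for illustration and any endpoint reachable from your machine (or a pre-downloaded local copy of the dataset) works:

```shell
# Assumption: hf-mirror.com is reachable from this machine; substitute any
# mirror you can access. huggingface_hub honors the HF_ENDPOINT variable.
export HF_ENDPOINT=https://hf-mirror.com

# Calibrate with pileval instead of the default ptb dataset, which failed to
# download in the log above. The work-dir name is arbitrary.
lmdeploy lite auto_awq deepseek-ai/deepseek-vl-7b-chat \
    --calib-dataset pileval \
    --work-dir deepseek-vl-7b-chat-4bit
```

If the machine has no hub access at all, downloading pileval elsewhere and pointing the loader in calib_dataloader.py at the local copy, as suggested above, achieves the same thing.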

SunnyLee20230523 commented 2 days ago

> Your machine can't download the dataset. Calibrate with the pileval dataset instead: download pileval yourself first, then replace the dataset path with your local one. https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/lite/utils/calib_dataloader.py#L250

Thanks! I switched to a local model path, and quantization succeeded with both the 'wikitext2' and 'pileval' calibration datasets!
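A working invocation along the lines described might look like this; the local checkpoint path is a placeholder, and wikitext2 is one of the calibration datasets lmdeploy's --calib-dataset option accepts:

```shell
# Quantize from a locally downloaded checkpoint so no model download is needed
# at run time. /models/deepseek-vl-7b-chat is a placeholder path.
lmdeploy lite auto_awq /models/deepseek-vl-7b-chat \
    --calib-dataset wikitext2 \
    --work-dir deepseek-vl-7b-chat-4bit
```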

AllentDan commented 2 days ago

Closing since the problem is resolved.

SunnyLee20230523 commented 2 days ago

> Closing since the problem is resolved.

okok, thanks!