QwenLM / qwen.cpp

C++ implementation of Qwen-LM

How much memory does quantizing the 72B model take? Even 192 GB gets killed #47

Open sweetcard opened 7 months ago

sweetcard commented 7 months ago

Does anyone know?

syslot commented 7 months ago

also want to know

sweetcard commented 7 months ago

It probably takes more than 200 GB of RAM. llama.cpp already supports this: it can quantize through temporary files instead of holding everything in memory.
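A rough back-of-the-envelope sketch (assuming FP16 source weights at 2 bytes per parameter and llama.cpp's q4_0 format at roughly 4.5 bits per weight) shows why 192 GB is not enough when the whole model is held in RAM:

```python
# Rough memory estimate for quantizing a 72B-parameter model entirely in RAM
# (assumption: FP16 source weights, 2 bytes per parameter).
params = 72e9

fp16_gb = params * 2 / 1024**3            # source weights alone
print(f"FP16 weights: {fp16_gb:.0f} GiB")  # ~134 GiB

# q4_0 stores roughly 4.5 bits per weight (4-bit values + per-block scales)
q4_gb = params * 4.5 / 8 / 1024**3
print(f"q4_0 output:  {q4_gb:.0f} GiB")    # ~38 GiB

# Holding input and output simultaneously, plus Python/framework overhead,
# already pushes past a 192 GiB machine's headroom.
print(f"Combined:     {fp16_gb + q4_gb:.0f} GiB")
```

Streaming the conversion through temporary files avoids keeping the full FP16 tensor set resident, which is why the temp-file path in llama.cpp sidesteps the OOM kill.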

bigbigtooth commented 7 months ago

When will we be able to use that?

sweetcard commented 7 months ago

> When will we be able to use that?

Just download the model and quantize it yourself.

bigbigtooth commented 7 months ago

> Just download the model and quantize it yourself.

Oh? Is that compatible? I'll try it right away.

sweetcard commented 7 months ago

> Oh? Is that compatible? I'll try it right away.

Just quantize it directly with llama.cpp and it works.

bigbigtooth commented 7 months ago

> Just quantize it directly with llama.cpp and it works.

A question: quantizing with llama.cpp fails with this error:

```
Traceback (most recent call last):
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 1228, in <module>
    main()
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 1161, in main
    model_plus = load_some_model(args.model)
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 1078, in load_some_model
    model_plus = merge_multifile_models(models_plus)
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 593, in merge_multifile_models
    model = merge_sharded([mp.model for mp in models_plus])
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 572, in merge_sharded
    return {name: convert(name) for name in names}
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 572, in <dictcomp>
    return {name: convert(name) for name in names}
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 547, in convert
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
  File "/Users/xxx/AIproject/llama.cpp/convert.py", line 547, in <listcomp>
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
KeyError: 'transformer.h.0.attn.c_attn.bias'
```

pip list:

```
Package                        Version
Brotli                         1.1.0
certifi                        2023.11.17
charset-normalizer             3.3.2
contourpy                      1.2.0
cycler                         0.12.1
einops                         0.7.0
filelock                       3.13.1
fonttools                      4.46.0
fsspec                         2023.10.0
gguf                           0.5.1
gmpy2                          2.1.2
huggingface-hub                0.19.4
idna                           3.6
Jinja2                         3.1.2
kiwisolver                     1.4.5
MarkupSafe                     2.1.3
matplotlib                     3.8.2
mpmath                         1.3.0
networkx                       3.2.1
numpy                          1.24.4
packaging                      23.2
Pillow                         10.1.0
pip                            23.3.1
pyparsing                      3.1.1
PySocks                        1.7.1
python-dateutil                2.8.2
PyYAML                         6.0.1
qwen-cpp                       0.1.2
regex                          2023.10.3
requests                       2.31.0
safetensors                    0.4.1
sentencepiece                  0.1.98
setuptools                     68.2.2
six                            1.16.0
sympy                          1.12
tabulate                       0.9.0
tiktoken                       0.5.1
tokenizers                     0.15.0
torch                          2.2.0.dev20231130
torchaudio                     2.2.0.dev20231130
torchvision                    0.17.0.dev20231130
tqdm                           4.66.1
transformers                   4.35.2
transformers-stream-generator  0.0.4
typing_extensions              4.8.0
urllib3                        2.1.0
wheel                          0.42.0
```

Did I install a wrong library? I can't figure it out.

sweetcard commented 7 months ago

> KeyError: 'transformer.h.0.attn.c_attn.bias'

You need to use this script instead: convert-hf-to-gguf.py
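For anyone else hitting this: `convert.py` only understands LLaMA-style tensor names, while Qwen checkpoints use GPT-style names such as `transformer.h.0.attn.c_attn.bias`, hence the `KeyError`. A sketch of the intended workflow (paths and output filenames below are placeholders, not from this thread; check each script's `--help` for the flags your llama.cpp version accepts):

```shell
# Convert the Hugging Face checkpoint to GGUF with the architecture-aware
# script; convert.py will fail on Qwen's tensor names.
python convert-hf-to-gguf.py /path/to/Qwen-72B --outfile qwen-72b-f16.gguf --outtype f16

# Then quantize; q4_0 shrinks the ~134 GB F16 file to roughly 40 GB.
./quantize qwen-72b-f16.gguf qwen-72b-q4_0.gguf q4_0
```

The conversion step still needs disk space for both the source checkpoint and the F16 GGUF, but it no longer has to fit everything in RAM at once.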

bigbigtooth commented 7 months ago

Haha, it works indeed, thanks.

Inference with the 72B model is really slow, though.
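That slowness is expected: single-stream decoding is memory-bandwidth-bound, since roughly the whole quantized model must be read from RAM for every generated token. A back-of-the-envelope upper bound (assuming a ~40 GB q4_0 file and typical desktop memory bandwidth of 50-100 GB/s; both are assumptions, not measurements from this thread):

```python
# Upper bound on tokens/s when decoding is limited by memory bandwidth:
# each token requires streaming (approximately) the whole model from RAM.
model_gb = 72e9 * 4.5 / 8 / 1e9       # q4_0 at ~4.5 bits/weight -> ~40 GB

for bandwidth_gbs in (50, 100):       # plausible desktop RAM bandwidths
    print(f"{bandwidth_gbs} GB/s -> at most {bandwidth_gbs / model_gb:.1f} tokens/s")
```

So one or two tokens per second is about the best a 72B q4_0 model can do on ordinary desktop RAM, regardless of CPU speed.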