Why is the CPU being used directly?

xiaohengDa commented 1 year ago

[Screenshot] Why is the CPU being used here? The GPU is not being used at all.

xiaohengDa commented 1 year ago

[Screenshot] CUDA itself is working fine.
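A working CUDA install does not by itself show that the Python process can use the GPU, since a CPU-only PyTorch build would still fall back to the CPU. The following is a minimal sketch for checking this, assuming PyTorch is the framework in use; the helper name `describe_torch_device` is hypothetical and not part of this project.

```python
def describe_torch_device():
    """Report whether the installed PyTorch build can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        # Typical causes: a CPU-only PyTorch wheel or a driver/runtime mismatch.
        return f"PyTorch {torch.__version__} cannot see a CUDA device"
    name = torch.cuda.get_device_name(0)
    total_gib = torch.cuda.get_device_properties(0).total_memory / 2**30
    return f"PyTorch {torch.__version__} sees {name} with {total_gib:.1f} GiB"

print(describe_torch_device())
```

If this reports that no CUDA device is visible, the model will run on the CPU regardless of what the driver tools show.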

xiaohengDa commented 1 year ago

Any pointers would be appreciated. Because of this problem, CPU usage maxes out immediately when I run LLaMA-13B.

xiaohengDa commented 1 year ago

Some weights of the model checkpoint at C:\Users\Administrator\LangChain-ChatGLM-Webui\model_cache\LLaMA-7B-2M were not used when initializing LlamaModel: ['lm_head.weight']

This IS expected if you are initializing LlamaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing LlamaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Also, what causes this loading error? The model being loaded is LLaMA-7B-2M.
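One possible reading of this warning, offered as a guess rather than a diagnosis: `lm_head.weight` is the output projection used by `LlamaForCausalLM`, and the bare `LlamaModel` class has no such layer, so that weight is skipped when the checkpoint is loaded into `LlamaModel`. If text generation is the goal, loading through `AutoModelForCausalLM` keeps the head. The sketch below assumes `transformers` and `torch` are installed and a CUDA GPU is present; `load_causal_lm` and the path are hypothetical, and the thread does not confirm how this project loads the model.

```python
def load_causal_lm(model_path):
    """Load a checkpoint together with its language-modeling head, in FP16 on the GPU."""
    # Imports are deferred so that defining the function does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
    return tokenizer, model.cuda().eval()

# Hypothetical usage; replace the path with your own checkpoint directory.
# tokenizer, model = load_causal_lm("path/to/LLaMA-7B-2M")
# print(type(model).__name__)
```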

xiaohengDa commented 1 year ago

One more question about model loading: text2vec-base loads successfully, but neither ChatGLM-6B nor LLaMA-7B-2M will load, even though the paths are written the same way. I really can't find the cause.

embedding_model_dict = {
    "ernie-tiny": "nghuyong/ernie-3.0-nano-zh",
    "ernie-base": "nghuyong/ernie-3.0-base-zh",
    "ernie-medium": "nghuyong/ernie-3.0-medium-zh",
    "ernie-xbase": "nghuyong/ernie-3.0-xbase-zh",
    "text2vec-base": "C://Users/Administrator/LangChain-ChatGLM-Webui/model_cache/GanymedeNil/text2vec-base-chinese",
    "simbert-base-chinese": "WangZeJun/simbert-base-chinese",
    "paraphrase-multilingual-MiniLM-L12-v2": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
}

llm_model_dict = {
    "chatglm": {
        "ChatGLM-6B": "C://Users/Administrator/LangChain-ChatGLM-Webui/model_cache/chatGLM6b",
        "ChatGLM-6B-int4": "THUDM/chatglm-6b-int4",
        "ChatGLM-6B-int8": "THUDM/chatglm-6b-int8",
        "ChatGLM-6b-int4-qe": "THUDM/chatglm-6b-int4-qe"
    },
    "belle": {
        "BELLE-LLaMA-Local": "C://Users/Administrator/LangChain-ChatGLM-Webui/model_cache/LLaMA-7B-2M",
    },
    "vicuna": {
        "Vicuna-Local": "/pretrainmodel/vicuna",
    }
}
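Before suspecting the path syntax, it may help to confirm that each local entry points at a directory that actually holds a model. The following is a rough sketch using only the standard library; `check_model_dir` and `REQUIRED_FILES` are hypothetical names, and checking only `config.json` is an assumption, since weight and tokenizer file names differ between models.

```python
import os

# Assumption: config.json is present in any local model directory that
# from_pretrained can load; weight and tokenizer files are not checked here.
REQUIRED_FILES = ("config.json",)

def check_model_dir(path):
    """Return a list of human-readable problems found with a local model directory."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"not a directory: {path}")
        return problems
    for name in REQUIRED_FILES:
        if not os.path.isfile(os.path.join(path, name)):
            problems.append(f"missing {name} in {path}")
    return problems

# Usage with the two local paths from the config above.
for candidate in (
    "C://Users/Administrator/LangChain-ChatGLM-Webui/model_cache/chatGLM6b",
    "C://Users/Administrator/LangChain-ChatGLM-Webui/model_cache/LLaMA-7B-2M",
):
    print(candidate, check_model_dir(candidate) or "looks like a model directory")
```

An empty result only means the directory looks plausible; it does not prove the checkpoint itself is complete.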

thomas-yanxin commented 1 year ago

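For how to write paths on Windows, see: https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui/issues/36#issuecomment-1566824683

Also, if you run into problems such as a model not being found, first try running each model on its own, using that model's own inference method, and see whether it produces output.

For example, for ChatGLM: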

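from transformers import AutoTokenizer, AutoModel

# Replace with the local directory or Hub ID of your ChatGLM-6B model.
model_path = "path/to/your/ChatGLM-6B"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# .half() casts the weights to FP16 and .cuda() moves them to the GPU.
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "你好", history=[])  # "Hello"
print(response)
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)  # "What should I do if I can't sleep at night?"
print(response)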

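If the inference script provided with the model runs but this project does not, the problem is most likely in this project, and you are welcome to keep asking questions here.

If the model's own inference script does not run either, the cause is probably your base environment or something else that is, for now, unrelated to this project. At that stage, please search for a solution yourself first; the project author will not answer such questions.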

xiaohengDa commented 1 year ago

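I'd still like to ask about the CPU issue. My Alibaba Cloud instance has 16 GB of GPU memory, and running ChatGLM-6B inference and P-Tuning on their own worked fine before. So why, in my first comment above, does it go straight to the CPU? Or am I misunderstanding something? What I do see is that as soon as the 13B LLaMA starts running, my CPU maxes out.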

thomas-yanxin commented 1 year ago

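Your ChatGLM-6B is the INT4 or INT8 version, right? Also, I don't think LLaMA-13B can run inference in your 16 GB of GPU memory. Of course, I don't know whether you are using llama.cpp for inference.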
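The memory point can be checked with back-of-the-envelope arithmetic: the weights alone take roughly the parameter count times the bytes per parameter. The sketch below is an estimate under stated assumptions, not a measurement; the parameter counts are nominal, `weight_memory_gib` is a hypothetical helper, and activations, KV cache, and framework overhead are ignored.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory for the weights alone, in GiB (activations and KV cache are extra)."""
    return num_params * bits_per_param / 8 / 2**30

# Nominal parameter counts; real checkpoints differ slightly.
models = {"ChatGLM-6B": 6e9, "LLaMA-7B": 7e9, "LLaMA-13B": 13e9}
gpu_gib = 16  # the GPU memory mentioned in the thread

for name, params in models.items():
    for label, bits in (("FP16", 16), ("INT8", 8), ("INT4", 4)):
        need = weight_memory_gib(params, bits)
        verdict = "fits" if need < gpu_gib else "does not fit"
        print(f"{name} {label}: {need:.1f} GiB of weights, {verdict} in {gpu_gib} GiB")
```

By this estimate, LLaMA-13B in FP16 needs about 24 GiB for the weights alone, which is more than a 16 GB card holds, while a 6B model in FP16 stays under that limit.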

X-D-Lab / LangChain-ChatGLM-Webui

Why is the CPU being used directly? #69