laoma1234567 commented 5 months ago

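Hello, can xunzi_GLM be used directly for inference? I tried 8-bit and 4-bit quantization and called model.chat. GPU memory usage was around 6-8 GB with quantization and around 11.8 GB in half precision, but inference ran for a long time without producing anything: with each of the three setups it ran for more than two minutes and never returned a result. Do I need to download the Qwen chat model instead? This is my inference code; could you check whether something is wrong with it? (code screenshot attached) My guess is that accuracy dropped after int4 or int8 quantization, so the model cannot produce an answer. But I suspect it also has to do with how I am calling the model. It could also be my GPU, which is a 3060. I don't know where the problem is and hope you can help. Thank you!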
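As a rough check on the memory figures reported above, the size of the weights alone can be estimated from the parameter count and the bits per weight. The sketch below is only a back-of-the-envelope calculation: the parameter count of 6e9, the 12 GiB card size, and the 1.5 GiB headroom are assumptions for illustration, not values stated in this thread. If the half-precision weights nearly fill the card, `device_map="auto"` may place some layers on the CPU, which can make generation very slow.

```python
# Back-of-the-envelope estimate of the GPU memory needed for model weights.
# ASSUMPTIONS (not from the thread): ~6e9 parameters, a 12 GiB card, and
# 1.5 GiB of headroom for activations and the KV cache.

GIB = 2 ** 30  # bytes per GiB


def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def fits_on_card(n_params: float, bits_per_weight: int, card_gib: float,
                 headroom_gib: float = 1.5) -> bool:
    """Return True if the weights plus the assumed headroom fit on the card."""
    needed = weight_memory_gib(n_params, bits_per_weight) + headroom_gib
    return needed <= card_gib


if __name__ == "__main__":
    n_params = 6e9    # assumed parameter count
    card_gib = 12.0   # assumed card size
    for bits in (16, 8, 4):
        size = weight_memory_gib(n_params, bits)
        ok = fits_on_card(n_params, bits, card_gib)
        print(f"{bits:>2}-bit weights: {size:.2f} GiB, fits with headroom: {ok}")
```

The estimate ignores quantization overhead (scales, unquantized layers), so real usage for the 4-bit and 8-bit cases is normally higher than the number printed here.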

njauzzx commented 5 months ago

Is this all of the code? It doesn't seem to include any inference code.

Shenxin0925 commented 5 months ago

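The only dialogue model released so far is xunzi-qwen-chat. The others are base models, and we do not recommend using them directly for inference.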

laoma1234567 commented 5 months ago

import torch
from datasets import Dataset
from transformers import (AutoModel, AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Trainer, TrainingArguments)

# Raw string, so that "\t" in the Windows path is not interpreted as a tab.
MODEL_PATH = r"E:\transformercode\model\Xunzillm4cc\Xunzi-GLM"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True, padding_side="left")
tokenizer(tokenizer.eos_token), tokenizer.eos_token_id

# Alternative: half precision.
# model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, low_cpu_mem_usage=True, torch_dtype=torch.half, device_map="auto")
# model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda()

# Active configuration: 4-bit NF4 quantization with double quantization.
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, trust_remote_code=True, low_cpu_mem_usage=True,
                                             torch_dtype=torch.half, device_map="auto",
                                             load_in_4bit=True, bnb_4bit_compute_dtype=torch.half,
                                             bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True)

# Alternative: 8-bit quantization.
# model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, trust_remote_code=True, low_cpu_mem_usage=True,
#                                              torch_dtype=torch.half, device_map="auto", load_in_8bit=True)
# model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, low_cpu_mem_usage=True,
#                                   torch_dtype=torch.half, device_map="auto", load_in_8bit=True)
# model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda()

# Source passage (classical Chinese).
text = '''
湖北街V曰。[明要晴。b雨要淋。南皮小Ｄ撕Bn晴雨之。IS之gV曰。[明浸早N。Y雨撒t秧。此r民上律天r者也。湖北[明庀罂僧e焉。植涔楚字牧帧⑸搅侄Ｎ┏胁摹Ｆ砉乓泳夂蜓灾Ｒ嗽[明尤宜。附近省城洪山一АＤ归T未渌伞u如S]山P亘省城。σ酝o棠尽８舭洱山亦屯骸ＡW未v。林政未修。林I久h之利。o人^。近qW校官荒o念林。育F亦O公有林。苟qq造林。久t梁可成。水旱可弭矣。吣棺O追h。q必修墓。r春暖。宜郊外之[。呼吸新r空狻u和[颉Ｋ偶S。亦粗具wC原理。合御L之要g焉。四月一日W校春季始I。是以特放春假。湖北各校。往往⒋思倨凇闳牒佟和脎印６嘣诖杭臼I之期。三W期之制。殆有名o。各h署XZ上忙始_征。四月八日中A民本耆闪⒅o念。京hāh族m盛於鹊亍＝穸嘁浦|北。湖北各h北T。多有拱辰拱O之Q。碜o中央。共Dy一焉。
'''

# Shorter excerpt of the same passage (not used in the loop below).
text1 = '''
湖北街V曰。[明要晴。b雨要淋。南皮小Ｄ撕Bn晴雨之。IS之gV曰。[明浸早N。Y雨撒t秧。此r民上律天r者也。湖北[明庀罂僧e焉。植涔楚字牧帧⑸搅侄Ｎ┏胁摹Ｆ砉乓泳夂蜓灾Ｒ嗽[明尤宜。附近省城洪山一АＤ归T未渌伞u如S]山P亘省城。σ酝o棠尽８舭洱山亦屯骸ＡW未v。林政未修。林I久h之利。o人^。近qW校官荒o念林。育F亦O公有林。苟qq造林。久t梁可成。
'''

# Categories to extract: government institutions, person names, place names, events.
extraction = ['政府机构', '人名', '地名', '事件']
len(extraction)

for e, item in enumerate(extraction):
    # Prompt: "Please extract the {item} from the text. Here is the original text: {text}.
    # Note: if any are found, output '{item}: content extracted from the text';
    # otherwise output 'Sorry, I did not find any matching {item}'."
    template = f'''
请提取文本中的{item}:
以下是原文：{text}
请注意：如果有则输出如下: {item}：原文中提取的内容；如果没有，输出如下：对不起我没有找到对应的{item}
'''
    response, history = model.chat(tokenizer, template, history=[])
    print(response)

In the end, its output looks like this: "Sorry, I could not find any government institution mentioned in the original text. The content of the original text may include some local proverbs, but it does not explicitly mention a government institution. If you can provide more details, I will do my best to help you find the relevant information." "Sorry, you did not provide the original text, so I cannot extract the person names from it. Please provide the original text and I will do my best to help you extract the person names."
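Since the second reply claims that no original text was given, it may help to inspect each prompt before it is sent to the model. The sketch below separates prompt construction from the `model.chat` call so that every prompt can be printed and checked for the embedded passage. `build_prompt` and `build_all_prompts` are hypothetical helper names, the sample passage is a placeholder, and the `model.chat` call is left commented out because it depends on the loaded model.

```python
from typing import Dict, Sequence

# Categories taken from the script in this thread.
EXTRACTION_ITEMS = ("政府机构", "人名", "地名", "事件")


def build_prompt(item: str, text: str) -> str:
    """Build one extraction prompt, following the template used in the thread."""
    passage = text.strip()
    if not passage:
        # Fail early instead of sending a prompt with no passage in it.
        raise ValueError("source text is empty")
    return (
        f"请提取文本中的{item}:\n"
        f"以下是原文：{passage}\n"
        f"请注意：如果有则输出如下: {item}：原文中提取的内容；"
        f"如果没有，输出如下：对不起我没有找到对应的{item}\n"
    )


def build_all_prompts(text: str,
                      items: Sequence[str] = EXTRACTION_ITEMS) -> Dict[str, str]:
    """Return a mapping from each category to its prompt."""
    return {item: build_prompt(item, text) for item in items}


if __name__ == "__main__":
    sample = "placeholder passage"  # hypothetical stand-in for the real text
    for item, prompt in build_all_prompts(sample).items():
        print(f"--- {item} ({len(prompt)} characters) ---")
        print(prompt)
        # With a loaded model, the call from the thread would go here:
        # response, history = model.chat(tokenizer, prompt, history=[])
```

Printing the prompt length alongside the text also shows how much of the context window the passage takes, which is worth knowing before comparing quantized and half-precision runs.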


xuanyuc commented 3 months ago


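My results are the same: generation is very unstable, the output is often garbled or nonsensical, and performance seems poor. Is there any way to improve this?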

Xunzi-LLM-of-Chinese-classics / XunziALLM

Hello, I would like to ask a question about the GLM version #4
