FreedomIntelligence / AceGPT


How to use AceGPT in Colab #6

Closed khalil-Hennara closed 9 months ago

khalil-Hennara commented 9 months ago

I want to ask how to use AceGPT within Colab. I've tried many approaches but nothing works. I've run Llama 2 on Colab, both 7B and 13B, using 4-bit quantization, but the same setup didn't work with AceGPT, and I don't know what the problem is, since AceGPT is a fine-tuned version of Llama 2. Could you please provide a notebook or code to run the model on Colab? One last question: why has the model been saved in float32?

Thanks in advance

hhwer commented 9 months ago

Hello, thank you for reaching out about AceGPT. We haven't specifically tested it on Google Colab, but we're happy to help. Could you please share the error messages you're seeing with AceGPT on Colab?

Regarding the model precision, we're transitioning from the current fp32 version to an fp16 version to facilitate easier usage and download.
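
In the meantime, here is a minimal sketch of casting the current fp32 checkpoint to half precision at load time, assuming a standard `transformers` setup with enough GPU memory for the 13B model in fp16 (untested on Colab, offered only as a pointer):

```python
import torch
import transformers

model_id = 'FreedomIntelligence/AceGPT-13B-chat'

# torch_dtype casts the fp32 weights to fp16 while loading
# (assumption: the runtime has roughly 26 GB of GPU memory free for the 13B model)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map='auto',
)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, use_fast=False)
```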

Best regards,

khalil-Hennara commented 9 months ago

Thanks for your response. I didn't get any error because the model didn't even load. I can share the code I used to load the model in 4-bit; it's the same code I used to load Llama 2 7B and 13B:

```python
from torch import cuda, bfloat16
import transformers

model_id = 'FreedomIntelligence/AceGPT-13B-chat'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load the large model with less GPU memory
# this requires the bitsandbytes library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16,
)

# begin initializing HF items, need auth token for these
hf_auth = 'token from hugging face'
model_config = transformers.AutoConfig.from_pretrained(model_id, use_auth_token=hf_auth)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth,
)
model.eval()
print(f"Model loaded on {device}")
```
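
(Note that this cell assumes `bitsandbytes` and `accelerate` are already installed in the Colab runtime; if not, the usual installs would be something like the following, with package names being the standard ones rather than anything taken from this thread:)

```python
# bitsandbytes is needed for load_in_4bit; accelerate is needed for device_map='auto'
!pip install -q bitsandbytes accelerate
```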

The same code works for Llama, and the auth token I use is the same one that works for Llama. Thanks in advance.

hhwer commented 9 months ago

Please try our int4 model quantized with AutoGPTQ. For some reason, possibly related to package versions, this model can't be loaded directly from the Hugging Face Hub, but it still works after a git clone. In Colab, you can try the following:


```python
# install pinned versions of transformers and auto-gptq, then fetch the
# quantized weights via git clone (loading by repo id from the Hub fails here)
!pip install transformers==4.32.0
!pip install sentencepiece
!pip3 install auto-gptq==0.4.2
!git clone https://huggingface.co/FreedomIntelligence/AceGPT-7b-chat-GPTQ

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# load the int4 GPTQ model and tokenizer from the local clone
model_id = 'AceGPT-7b-chat-GPTQ'
model = AutoGPTQForCausalLM.from_quantized(model_id, use_safetensors=False)
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="right", use_fast=False)

# Llama-2-style chat template; the Arabic system prompt instructs the model to be
# a helpful, respectful, honest and safe assistant
prompt_dict = {
    'AceGPT': """[INST] <<SYS>>\nأنت مساعد مفيد ومحترم وصادق. أجب دائما بأكبر قدر ممكن من المساعدة بينما تكون آمنا.  يجب ألا تتضمن إجاباتك أي محتوى ضار أو غير أخلاقي أو عنصري أو جنسي أو سام أو خطير أو غير قانوني. يرجى التأكد من أن ردودك غير متحيزة اجتماعيا وإيجابية بطبيعتها.\n\nإذا كان السؤال لا معنى له أو لم يكن متماسكا من الناحية الواقعية، اشرح السبب بدلا من الإجابة على شيء غير صحيح. إذا كنت لا تعرف إجابة سؤال ما، فيرجى عدم مشاركة معلومات خاطئة.\n<</SYS>>\n\n""",
}

role_dict = {
    'AceGPT': ['[INST]', '[/INST]'],
}

def format_message(query, max_src_len):
    return f"""{prompt_dict["AceGPT"]}{query} {role_dict["AceGPT"][1]}"""

# generation settings
temperature = 0.5
max_new_tokens = 768
content_len = 2048
message = 'أين هي عاصمة المملكة العربية السعودية'  # "Where is the capital of Saudi Arabia?"
history = []
max_src_len = content_len - max_new_tokens - 8
prompt = format_message(message, max_src_len)

# tokenize the prompt, run generation, and decode the output
model_inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**model_inputs)
output.shape
tokenizer.decode(output[0])
```
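
One small note on the snippet above: `temperature` and `max_new_tokens` are defined but never passed to `generate()`, so the model's default generation settings are used. A minimal sketch of forwarding them, using standard `transformers` generation kwargs (an assumption on my side, not something stated in this thread):

```python
# forward the sampling settings explicitly; without them, generate() falls back
# to the model's default generation config (standard transformers kwargs, assumed here)
output = model.generate(
    **model_inputs,
    max_new_tokens=max_new_tokens,
    temperature=temperature,
    do_sample=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
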
khalil-Hennara commented 9 months ago

Thanks a lot its work very well, thanks for your time and support