Open kyakuno opened 5 months ago
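An official ONNX model may be provided. https://onnxruntime.ai/blogs/accelerating-phi-3
The generate API has to be written in Python.
With the beta version of onnxruntime (the onnxruntime_genai package), the following works.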
import onnxruntime_genai as og
import time

# Raw string so the Windows backslashes are not read as escape sequences.
model = og.Model(r".\Phi-3-mini-128k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()


def input_llm(text):
    """Encode the prompt and return a generator ready to produce tokens."""
    print("Question:", text)
    input_tokens = tokenizer.encode(text)
    params = og.GeneratorParams(model)
    params.try_use_cuda_graph_with_max_batch_size(1)
    params.input_ids = input_tokens
    generator = og.Generator(model, params)
    return generator


def output_llm(generator):
    """Run the token loop, print the decoded pieces and the elapsed time."""
    print("Answer:")
    stt = time.time()
    list_error = []     # token ids that failed to decode
    list_sentence = []  # decoded text pieces (or raw ids on failure)
    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()
        new_token = generator.get_next_tokens()[0]
        # Tokens that already failed once are skipped.
        if new_token not in list_error:
            try:
                list_sentence.append(tokenizer_stream.decode(new_token))
            except Exception:
                list_error.append(new_token)
                list_sentence.append(new_token)
    print(list_sentence)
    fin = time.time()
    print(fin - stt)
    return list_error
The onnxruntime_genai source code: https://github.com/microsoft/onnxruntime-genai
Its generate is written in C++, so it is probably better to bring over the implementation written for PyTorch.
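To give a sense of how small a Python-side generate loop can be, here is a minimal greedy-decoding sketch. It is not the onnxruntime_genai or transformers implementation: `greedy_generate`, `next_logits`, and `toy_next_logits` are hypothetical names, and the toy model only stands in for a real ONNX or PyTorch forward pass that returns logits for the next token.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_logits: Callable[[Sequence[int]], Sequence[float]],
    input_ids: Sequence[int],
    eos_token_id: int,
    max_new_tokens: int = 32,
) -> List[int]:
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    tokens = list(input_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        # argmax over the vocabulary
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_token_id:
            break
    # Return only the newly generated tokens, not the prompt.
    return tokens[len(input_ids):]


def toy_next_logits(tokens: Sequence[int]) -> List[float]:
    # Hypothetical stand-in model over a 5-token vocabulary: it always
    # favours (last token + 1), so the output counts up until EOS (id 4).
    vocab_size = 5
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


generated = greedy_generate(toy_next_logits, [0], eos_token_id=4)
print(generated)
```

In a real port, `next_logits` would wrap the model inference call (ideally with a KV cache rather than re-feeding the whole sequence), and sampling options such as temperature would replace the plain argmax.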
For now, using transformers for the tokenizer seems like a good option.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

messages = [
    {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
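The pipeline above accepts a list of messages, while the onnxruntime_genai snippet passes plain text to `tokenizer.encode`, so the chat formatting has to happen somewhere. The sketch below flattens a message list into a single prompt string. The `<|role|>` ... `<|end|>` tag layout is an assumption based on the format described on the Phi-3 model card, and `build_phi3_prompt` is a hypothetical helper; `tokenizer.apply_chat_template` from transformers remains the authoritative source for the exact template.

```python
def build_phi3_prompt(messages):
    """Flatten chat messages into one prompt string.

    Assumed layout (to be verified against the model's chat template):
    <|role|>\ncontent<|end|>\n for each message, then <|assistant|>\n
    to ask the model for its reply.
    """
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    parts.append("<|assistant|>\n")
    return "".join(parts)


example_messages = [
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]
prompt = build_phi3_prompt(example_messages)
print(prompt)
```

The resulting string could then be passed as the `text` argument of a function like `input_llm` in the onnxruntime_genai snippet.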
The tokenizer looks like an ordinary SentencePiece tokenizer.
This is Microsoft's mini-sized LLM. https://huggingface.co/microsoft/Phi-3-mini-4k-instruct