axinc-ai / ailia-models

The collection of pre-trained, state-of-the-art AI models for ailia SDK

Add phi3-mini #1461

Open · kyakuno opened 5 months ago

kyakuno commented 5 months ago

Microsoft's mini-sized LLM. https://huggingface.co/microsoft/Phi-3-mini-4k-instruct

kyakuno commented 5 months ago

An official ONNX version may be provided. https://onnxruntime.ai/blogs/accelerating-phi-3

kyakuno commented 5 months ago

The official ONNX version has been released. https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx

kyakuno commented 5 months ago

The generate API needs to be written in Python.

kyakuno commented 5 months ago

Example inference code. https://github.com/microsoft/onnxruntime/issues/20448

kyakuno commented 5 months ago

With the beta version of onnxruntime, the following works.

import time

import onnxruntime_genai as og

# int4 CPU model downloaded from Hugging Face
model = og.Model("./Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

def input_llm(text):
    print("Question:", text)
    input_tokens = tokenizer.encode(text)
    params = og.GeneratorParams(model)
    params.try_use_cuda_graph_with_max_batch_size(1)
    params.input_ids = input_tokens
    generator = og.Generator(model, params)
    return generator

def output_llm(generator):
    print("Answer:")
    stt = time.time()
    list_error = []     # token ids that failed to decode
    list_sentence = []  # decoded text fragments (or raw ids on failure)
    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()
        new_token = generator.get_next_tokens()[0]
        if new_token not in list_error:
            try:
                list_sentence.append(tokenizer_stream.decode(new_token))
            except Exception:
                # Remember tokens the streaming decoder cannot handle
                # and keep the raw id instead of the decoded text.
                list_error.append(new_token)
                list_sentence.append(new_token)
    print(list_sentence)
    fin = time.time()
    print(fin - stt)
    return list_error
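
For reference, a minimal usage sketch of the two helpers above. The prompt markup follows the Phi-3 chat template and is an assumption here, not taken from this thread.

# Hypothetical usage of input_llm/output_llm defined above.
# The <|user|>/<|end|>/<|assistant|> markup is assumed from the
# Phi-3 chat template.
generator = input_llm("<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n")
errors = output_llm(generator)
print("Undecodable token ids:", errors)
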
kyakuno commented 5 months ago

Source code for onnxruntime_genai: https://github.com/microsoft/onnxruntime-genai

kyakuno commented 5 months ago

Since generate is written in C++, it seems better to bring in the PyTorch-oriented implementation instead.

kyakuno commented 5 months ago

For now, it seems good to use transformers for the tokenizer.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct", 
    device_map="cuda", 
    torch_dtype="auto", 
    trust_remote_code=True, 
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

messages = [
    {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
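
Since only the tokenizer is needed from transformers, here is a minimal tokenizer-only sketch; the ONNX inference loop is omitted.

from transformers import AutoTokenizer

# Tokenizer-only sketch: encode the chat prompt with transformers and
# feed the resulting ids to the ONNX model (inference loop not shown).
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
messages = [{"role": "user", "content": "Hello"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(input_ids)                    # token ids for the ONNX model
print(tokenizer.decode(input_ids))  # round-trip check
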

kyakuno commented 5 months ago

For text generation, greedy search should do for now. https://github.com/axinc-ai/ailia-models/blob/master/natural_language_processing/rinna_gpt2/utils_rinna_gpt2.py
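
A minimal greedy-search sketch in the spirit of the rinna_gpt2 utility above, ignoring KV-cache handling. The single "input_ids" input and "logits" output signature is an assumption; the official Phi-3 ONNX model additionally takes past-key-value inputs, so this only illustrates the decoding loop.

import numpy as np
import onnxruntime

# Greedy decoding loop: repeatedly pick the argmax of the last-position
# logits and append it until EOS or the token budget is reached.
def greedy_search(session, tokenizer, prompt, eos_token_id, max_new_tokens=128):
    input_ids = list(tokenizer.encode(prompt))
    for _ in range(max_new_tokens):
        outputs = session.run(
            ["logits"], {"input_ids": np.array([input_ids], dtype=np.int64)}
        )
        next_token = int(np.argmax(outputs[0][0, -1]))  # most likely next token
        if next_token == eos_token_id:
            break
        input_ids.append(next_token)
    return tokenizer.decode(input_ids)

# session = onnxruntime.InferenceSession("phi3-mini.onnx")  # hypothetical path
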

kyakuno commented 5 months ago

It uses LlamaTokenizer. https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/tokenizer_config.json

kyakuno commented 5 months ago

LlamaTokenizer https://github.com/huggingface/transformers/blob/37fa1f654f17b68bbe30440c64e611f1a4d55bc7/src/transformers/models/llama/tokenization_llama.py#L55

kyakuno commented 5 months ago

It looks like a standard SentencePiece tokenizer.
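
A quick check sketch, assuming the repo's tokenizer.model file is a standard SentencePiece model and has been downloaded locally:

import sentencepiece as spm

# Load the tokenizer.model from the Hugging Face repo directly with
# SentencePiece (assumes the file is in the current directory).
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
ids = sp.encode("Hello, world!")
print(ids)
print(sp.decode(ids))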