Closed: Pelochus closed this issue 2 months ago
Meta-Llama-3-8B export failed.
```python
from rkllm.api import RKLLM

modelpath = '/data/llm/Meta-Llama-3-8B'
llm = RKLLM()

# Load model
ret = llm.load_huggingface(model=modelpath)
if ret != 0:
    print('Load model failed!')
    exit(ret)

# Build model
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w8a8', target_platform='rk3588')
if ret != 0:
    print('Build model failed!')
    exit(ret)

# Export rkllm model
ret = llm.export_rkllm("./Meta-Llama-3-8B.rkllm")
if ret != 0:
    print('Export model failed!')
    exit(ret)
```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████| 4/4 [00:14<00:00, 3.71s/it]
Optimizing model: 100%|██████████████████████████████████████████████████████████████████████████████████| 32/32 [24:12<00:00, 45.38s/it]
Catch exception when converting model!
Export model failed!
That's a shame. Let's see if anyone tries Phi-3; maybe that could work. If it doesn't, we'll need to wait for support from Rockchip.
v1.0.1 has been released, it supports Phi-3!
Cool! Trying it out soon
Thanks!
@Pelochus Could you please share some benchmarks? How fast is Phi-3 on the RK3588 (or whichever chip you are using)? What are the typical time to first token and tokens per second?
Hi @kumekay. Phi-3 is currently partially broken for the people who have tested it (I only have 4 GB of RAM, so not enough to test it myself :S). Aside from that, I don't think we have any benchmarks for RKLLM models, so no idea what the average tokens/s is for Phi-3.
You can take a look at these videos to make a guesstimate of the average tokens/s:
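If anyone wants to collect those numbers themselves, the two metrics break down simply: time to first token (TTFT) is the delay between sending the prompt and the first emitted token, and decode throughput is the remaining tokens divided by the remaining wall time. Below is a minimal sketch in plain Python that works over any streaming token iterator; it is independent of the RKLLM runtime, and `fake_stream` is just a stand-in generator for whatever streaming API you actually use:

```python
import time

def benchmark_stream(token_iter):
    """Measure time-to-first-token (TTFT) and decode throughput
    over any iterator that yields tokens as they are generated."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_iter:
        now = time.perf_counter()
        if first is None:
            first = now  # timestamp of the first emitted token
        count += 1
    end = time.perf_counter()
    if first is None:  # no tokens produced at all
        return float('nan'), float('nan')
    ttft = first - start
    # Throughput counts decode-phase tokens only (everything after the first)
    decode_tokens = count - 1
    decode_time = end - first
    tok_per_s = decode_tokens / decode_time if decode_time > 0 else float('nan')
    return ttft, tok_per_s

# Stand-in generator that sleeps to simulate per-token latency
def fake_stream(n_tokens, delay_s):
    for _ in range(n_tokens):
        time.sleep(delay_s)
        yield "tok"

ttft, tps = benchmark_stream(fake_stream(20, 0.01))
print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.1f} tok/s")
```

With a real model you would replace `fake_stream` with the runtime's streaming callback or generator, feeding it a fixed prompt so runs are comparable.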
Has anyone tried converting Phi-3 mini and Llama 3 8B to see if they already work with this library?
If not, will they be supported?