Closed: Pelochus closed this issue 2 months ago
Meta-Llama-3-8B export failed.
```python
from rkllm.api import RKLLM

modelpath = '/data/llm/Meta-Llama-3-8B'
llm = RKLLM()

# Load model
ret = llm.load_huggingface(model=modelpath)
if ret != 0:
    print('Load model failed!')
    exit(ret)

# Build model
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w8a8', target_platform='rk3588')
if ret != 0:
    print('Build model failed!')
    exit(ret)

# Export rkllm model
ret = llm.export_rkllm("./Meta-Llama-3-8B.rkllm")
if ret != 0:
    print('Export model failed!')
    exit(ret)
```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████| 4/4 [00:14<00:00, 3.71s/it]
Optimizing model: 100%|██████████████████████████████████████████████████████████████████████████████████| 32/32 [24:12<00:00, 45.38s/it]
Catch exception when converting model!
Export model failed!
That's a shame. Let's see if anyone tries Phi-3; maybe that could work. If it doesn't, we'll need to wait for support from Rockchip.
v1.0.1 has been released, it supports Phi-3!
Cool! Trying it out soon
Thanks!
@Pelochus Could you please share some benchmarks? How fast is Phi-3 on the RK3588 (or whichever chip you are using)? What are the typical time to first token and tokens per second?
Hi @kumekay. Phi-3 is currently partially broken for the people who have tested it (I only have 4 GB of RAM, so not enough to test it myself :S). Aside from that, I don't think we have any benchmarks for RKLLM models, so no idea what the average tokens/s is for Phi-3.
You can take a look at these videos to make a guesstimate of the average tokens/s:
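If anyone wants to collect those numbers themselves, the two metrics break down simply: time to first token (TTFT) is the delay between sending the prompt and the first emitted token, and decode throughput is the remaining tokens divided by the remaining wall time. Below is a minimal sketch in plain Python that works over any streaming token iterator; it is independent of the RKLLM runtime, and `fake_stream` is just a stand-in generator for whatever streaming API you actually use:

```python
import time

def benchmark_stream(token_iter):
    """Measure time-to-first-token (TTFT) and decode throughput
    over any iterator that yields tokens as they are generated."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_iter:
        now = time.perf_counter()
        if first is None:
            first = now  # timestamp of the first emitted token
        count += 1
    end = time.perf_counter()
    if first is None:  # no tokens produced at all
        return float('nan'), float('nan')
    ttft = first - start
    # Throughput counts decode-phase tokens only (everything after the first)
    decode_tokens = count - 1
    decode_time = end - first
    tok_per_s = decode_tokens / decode_time if decode_time > 0 else float('nan')
    return ttft, tok_per_s

# Stand-in generator that sleeps to simulate per-token latency
def fake_stream(n_tokens, delay_s):
    for _ in range(n_tokens):
        time.sleep(delay_s)
        yield "tok"

ttft, tps = benchmark_stream(fake_stream(20, 0.01))
print(f"TTFT: {ttft * 1000:.1f} ms, throughput: {tps:.1f} tok/s")
```

With a real model you would replace `fake_stream` with the runtime's streaming callback or generator, feeding it a fixed prompt so runs are comparable.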
Has anyone tried converting Phi-3 mini and Llama 3 8B to see if they already work with this library?
If not, will they be supported?