axinc-ai / ailia-models

The collection of pre-trained, state-of-the-art AI models for ailia SDK

added Japanese LLama elyza #1310

YToleubay opened this issue 1 year ago

YToleubay commented 1 year ago

#1294

kyakuno commented 12 months ago

I uploaded the model: https://storage.googleapis.com/ailia-models/elyza-japanese-llama-2-7b/decoder_model.onnx

kyakuno commented 12 months ago

On macOS, processing does not finish in a reasonable amount of time.

kyakuno commented 12 months ago

@YToleubay How much time does inference take, with ONNX Runtime and with ailia?

YToleubay commented 12 months ago

> How much time does inference take, with ONNX Runtime and with ailia?

I ran the following benchmark on an NVIDIA GeForce RTX 3090 with 32 GB of RAM. With ONNX Runtime I get the following output:

processing time 36854 ms
processing time 32836 ms
processing time 31787 ms
processing time 31776 ms
processing time 31774 ms
**Average ONNX Runtime time = 33005.4 ms**

With ailia I get the following numbers:

ailia processing time 1060661 ms
ailia processing time 1061135 ms
**Average ailia time = 1060898 ms**

Inference is roughly 32× slower with ailia than with ONNX Runtime (1060898 ms vs. 33005.4 ms).
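For reference, the per-run times and averages above can be collected with a small timing harness like the sketch below. The `benchmark` helper and the `time.sleep` stand-in workload are illustrative assumptions, not the reporter's actual script; in a real comparison the lambda would wrap a `session.run(...)` call on `decoder_model.onnx` for each runtime under test.

```python
import time

def benchmark(run_inference, runs=5):
    """Time run_inference() over several runs; return per-run times and the average, in ms."""
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    average = sum(times_ms) / len(times_ms)
    return times_ms, average

if __name__ == "__main__":
    # Stand-in workload; replace with an ONNX Runtime or ailia inference call.
    times, avg = benchmark(lambda: time.sleep(0.01))
    for t in times:
        print(f"processing time {t:.0f} ms")
    print(f"Average time = {avg:.1f} ms")
```

Averaging several runs, as done above, smooths out per-run variance, though the first run (which includes warm-up costs) may still skew the mean.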

kyakuno commented 12 months ago

Thanks. I will investigate it.

YToleubay commented 12 months ago

> Thanks. I will investigate it.

Can I help you somehow?

kyakuno commented 12 months ago

Thank you. We will look into it with the ailia SDK team, since this concerns the core implementation of the ailia SDK.