ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

Unable to execute GPT-2 onnx model #783

[Open] somasundaram1702 opened this issue 1 month ago

somasundaram1702 commented 1 month ago

Hello Team,

I am trying to execute the GPT-2 model (link below) on a Mali G710 GPU. During execution I get the following error:

```
./ExecuteNetwork -c GpuAcc -f onnx-binary -d /mnt/dropbox/MobileNetV2/llm.txt -m /mnt/dropbox/LLM/gpt2-10.onnx -i input1 -s 1,4,16
Warning: DEPRECATED: The program option 'input-name' is deprecated and will be removed soon. The input-names are now automatically set.
Warning: DEPRECATED: The program option 'model-format' is deprecated and will be removed soon. The model-format is now automatically set.
Info: ArmNN v33.1.0
Info: Initialization time: 298.10 ms.
Fatal: Datatype INT64 is not valid for tensor 'input1' of node 'Reshape_11', not in {onnx::TensorProto::FLOAT}. at function ParseReshape [/devenv/armnn/src/armnnOnnxParser/OnnxParser.cpp:2319]
Info: Shutdown time: 129.43 ms.
```

model link: https://github.com/onnx/models/blob/main/validated/text/machine_comprehension/gpt-2/model/gpt2-10.onnx

@FrancisMurtagh-arm: I tried passing both int and float values as input, but it did not help. Can you please suggest a fix?
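For anyone hitting the same error, a minimal sketch (assuming the `onnx` Python package is installed, and using the model file from the command above) to list which graph inputs and initializers use INT64 before handing the model to the Arm NN ONNX parser:

```python
import onnx
from onnx import TensorProto

# Load the ONNX model and report the element type of every graph input
# and initializer. GPT-2's token inputs are typically INT64, which is
# exactly what the Arm NN ONNX parser rejects.
model = onnx.load("gpt2-10.onnx")

def dtype_name(elem_type):
    return TensorProto.DataType.Name(elem_type)

for inp in model.graph.input:
    print("input      ", inp.name, dtype_name(inp.type.tensor_type.elem_type))

for init in model.graph.initializer:
    print("initializer", init.name, dtype_name(init.data_type))
```

Note that simply casting such tensors is not always possible: the ONNX spec itself requires INT64 for some inputs (e.g. Reshape's shape tensor), so a model can be structurally dependent on a type the parser does not accept.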

Colm-in-Arm commented 1 month ago

Hi,

The fatal error message indicates that the model contains INT64 tensors, which our ONNX parser does not support. The ONNX parser is also very outdated and has been marked for future deprecation, so unless you're willing to contribute the work yourself, I'm afraid this model won't work.

Colm.

somasundaram1702 commented 1 month ago

@Colm-in-Arm: Does the TfLite parser support INT64 types, or do you recommend any other parser? My objective is to run inference with any LLM on a Mali G710 GPU using ExecuteNetwork. Is there any successful use case available? If so, could you kindly direct me to a working model?

Colm-in-Arm commented 1 month ago

Hi,

The TfLite runtime does support INT64 in some limited cases. I don't know of other ONNX runtimes you could use.

In Arm NN we have not done any work on LLMs. The work I have seen tends to target the CPU rather than the GPU. LLMs tend to be memory bound rather than compute bound, so there is not as much potential for a performance increase from using GPUs.

Colm.
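As a quick way to check whether a converted .tflite model actually exposes INT64 tensors at its interface, here is a minimal sketch (assuming the TensorFlow Python package; `model.tflite` is a placeholder file name):

```python
import tensorflow as tf

# Load the model with the stock TfLite interpreter and inspect the
# dtypes and shapes of its inputs and outputs.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_input_details():
    print("input ", detail["name"], detail["dtype"], detail["shape"])
for detail in interpreter.get_output_details():
    print("output", detail["name"], detail["dtype"], detail["shape"])
```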

somasundaram1702 commented 1 month ago

@Colm-in-Arm: I'd like to inform you that I was able to successfully execute a GPT-2 TfLite model on the Mali G710 GPU. The "gpt2-64-fp16.tflite" model worked.

Now Arm can add LLMs to its portfolio :)
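The thread does not say how the .tflite file was produced. One plausible route, sketched under the assumption that TensorFlow and Hugging Face `transformers` are installed, with the sequence length of 64 and fp16 weights inferred from the file name:

```python
import tensorflow as tf
from transformers import TFGPT2LMHeadModel

model = TFGPT2LMHeadModel.from_pretrained("gpt2")

# Fix the input signature to batch 1, sequence length 64 (the "64" in
# the file name), so the converter produces a static-shape graph.
@tf.function(input_signature=[tf.TensorSpec([1, 64], tf.int32, name="input_ids")])
def serving(input_ids):
    return model(input_ids).logits

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [serving.get_concrete_function()], model)

# Post-training float16 quantization: weights are stored as fp16,
# matching the "fp16" in the file name.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

with open("gpt2-64-fp16.tflite", "wb") as f:
    f.write(converter.convert())
```

A fixed int32 input signature also sidesteps the INT64 interface issue discussed earlier in the thread.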

Colm-in-Arm commented 1 month ago

Wow! Well done.

Can you outline the steps needed to make the model small enough to push through Arm NN? Did you use ExecuteNetwork or your own application? I presume some layers were handled by the TfLite runtime? What kind of inference times were you getting? And how about CpuAcc, did you try it?

Colm.

somasundaram1702 commented 1 month ago

@Colm-in-Arm: I haven't reduced the size of the model; the "gpt2-64-fp16.tflite" file is 248 MB. I'd like to know why we would need to reduce the model size. Yes, I used ExecuteNetwork with both the CpuAcc and GpuAcc backends: CpuAcc took about 25 minutes and GpuAcc about 2 hours 30 minutes.

Note: I am executing the model on a hybrid emulated platform (using Zebu), where the CPU is on the virtual side and the GPU runs on the RTL side, so the execution times are not straightforward to compare.

Some layers were handled by the TfLite runtime? How do we verify this? Do you mean that a few of the unsupported operations are handled by the TfLite runtime?
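One way to see the split is to load the model through the Arm NN TfLite delegate with verbose logging, a sketch assuming the `tflite_runtime` package and the Arm NN delegate shared library are available on the device (option names follow the Arm NN delegate quick-start guide):

```python
import tflite_runtime.interpreter as tflite

# Load the Arm NN delegate with info-level logging so it reports which
# operators it claims for the chosen backend; any operator it rejects
# falls back to the stock TfLite runtime.
armnn_delegate = tflite.load_delegate(
    library="libarmnnDelegate.so",
    options={"backends": "GpuAcc,CpuAcc,CpuRef", "logging-severity": "info"},
)

interpreter = tflite.Interpreter(
    model_path="gpt2-64-fp16.tflite",
    experimental_delegates=[armnn_delegate],
)
interpreter.allocate_tensors()
```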

Also, I would like to check the GPU core utilization, memory and power consumption. Are there any commands I can execute from the Linux terminal to check these?

@Colm-in-Arm: Awaiting your response.