Azure-Samples / Phi-3MiniSamples

Discover how phi3-mini, a new series of models from Microsoft, enables deployment of Large Language Models (LLMs) on edge devices and IoT devices. Learn how to use Semantic Kernel, Ollama/LlamaEdge, and ONNX Runtime to access and infer phi3-mini models, and explore the possibilities of generative AI in various application scenarios.

Performance for Phi-3Mini iOS Sample #9

Closed: YUNQIUGUO closed this issue 5 months ago

YUNQIUGUO commented 6 months ago

Hi!

Thanks for contributing all this work on the Phi-3 model samples!

I tried the iOS sample and followed the instructions here:

https://github.com/Azure-Samples/Phi-3MiniSamples/tree/main/ios

However, in my local testing on an iPhone 12, it seems to take a very long time to generate a simple sentence (the default prompt question in your app can take more than 10 minutes).

Not sure if this is expected behavior, or whether you see similar results on your end?

Thanks!

scovin1109 commented 6 months ago

I have the same situation with my iPhone 14 Pro. It takes at least 20 minutes when my prompt is just "hi".

stleon commented 6 months ago

It took ~5 minutes on an iPad Pro (M1).

leestott commented 5 months ago

Hi, for this to work you need at least an iPhone 14 or a device with an A16 chipset.

leestott commented 5 months ago

Updated the prerequisites to state a minimum of an iPhone 14 with the A16 chipset: https://github.com/Azure-Samples/Phi-3MiniSamples/pull/10

This works best on an A17 device such as the iPhone 15 Pro or Pro Max.

leestott commented 5 months ago

Issue closed. @stleon @scovin1109 @YUNQIUGUO, the issue is due to the age of the device: only iPhones with A16 or A17 chipsets (iPhone 14 or iPhone 15 Pro models) have the required AI processor support.

YUNQIUGUO commented 5 months ago

FYI, the issue may not be caused solely by running on older iOS devices:

Upon further investigation on the ORT side, it looks like we were previously not correctly detecting "HasDotProductInstructions" support in our MLAS platform CPU info (https://github.com/microsoft/onnxruntime/blob/e81c8676e3001c0c148b2d5495f90d048b2c9480/onnxruntime/core/mlas/lib/platform.cpp#L517), which in turn meant the MlasSQNBitGemm optimization was not being used at all.
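
For context, here is a minimal sketch of how ARM feature flags can be queried on Apple platforms, which is the general approach a detection routine like `CPUIDInfo::ArmAppleInit()` would take. This is an illustration built on Apple's public `sysctlbyname` API and the documented `hw.optional.arm.FEAT_DotProd` key, not the actual ONNX Runtime code:

```cpp
// Sketch: detecting ARM CPU features on Apple platforms via sysctlbyname().
// Apple exposes per-feature flags under "hw.optional.arm.*";
// FEAT_DotProd reports support for the dot-product instructions
// (the capability the MLAS check was missing).
#include <sys/sysctl.h>
#include <cstdio>

static bool HasSysctlFlag(const char* name) {
  int value = 0;
  size_t size = sizeof(value);
  // sysctlbyname returns 0 on success; the flag is 1 when the feature exists.
  if (sysctlbyname(name, &value, &size, nullptr, 0) != 0) {
    return false;  // Unknown key (e.g. older OS version): treat as unsupported.
  }
  return value != 0;
}

int main() {
  const bool has_dot_product = HasSysctlFlag("hw.optional.arm.FEAT_DotProd");
  std::printf("FEAT_DotProd: %s\n", has_dot_product ? "yes" : "no");
  return 0;
}
```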

We are working on an official fix that will be out soon.

With an updated local build, we are able to achieve results similar to Android on an iPhone 12, at about 9~11 tokens/second. We would expect even better performance on newer devices.
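
For anyone wanting to reproduce these numbers, a minimal sketch of how tokens/second can be measured around a decode loop. `generate_next_token()` here is a hypothetical stand-in for the sample's per-token inference call, not a real ONNX Runtime API:

```cpp
// Sketch: measuring decode throughput (tokens/second) around a generation loop.
#include <chrono>
#include <cstdio>

// Hypothetical stand-in so the sketch compiles; swap in the real decode step.
static int generate_next_token() { return 1; }

int main() {
  const int max_tokens = 64;   // cap on generated tokens
  const int eos_token_id = 0;  // hypothetical end-of-sequence id
  int produced = 0;

  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < max_tokens; ++i) {
    const int token = generate_next_token();
    ++produced;
    if (token == eos_token_id) break;
  }
  const auto elapsed = std::chrono::steady_clock::now() - start;

  const double seconds = std::chrono::duration<double>(elapsed).count();
  if (seconds > 0.0) {
    std::printf("%d tokens in %.2fs => %.1f tokens/s\n",
                produced, seconds, produced / seconds);
  }
  return 0;
}
```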

YUNQIUGUO commented 5 months ago

This is our fix: "Add CPUIDInfo::ArmAppleInit() to detect CPU features on Apple platforms" (microsoft/onnxruntime@e651436). It will hopefully go out soon in a patch ORT release.