dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Running AI inference of Phi-3 and other LLMs from C# using NPU + GPU on upcoming processors? #7162

Open agonzalezm opened 4 months ago

agonzalezm commented 4 months ago

Intel, AMD, Qualcomm, etc. are shipping powerful NPUs (40+ TOPS) for inference.

Is there any plan to include functionality in ML.NET to run inference with these models easily from C#, offloading to the NPU, the GPU, or both? Upcoming Intel processors will have a 40 TOPS NPU and 60 TOPS across CPU/GPU.

How can we easily make the most of all these TOPS from C#, running inference across NPU + GPU?

All the samples I see for this require Python, etc.; it would be great to have all of this available directly in .NET/C#.

Maybe a C# wrapper around https://github.com/intel/intel-npu-acceleration-library could be included, but what about AMD and Qualcomm?

asmirnov82 commented 4 months ago

Hi @agonzalezm, have you taken a look at the https://github.com/SciSharp/LLamaSharp project? It allows running inference for plenty of LLM models on consumer-level GPUs.
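
For reference, a minimal sketch of GPU-offloaded inference with LLamaSharp (API names such as `ModelParams` and `InteractiveExecutor` follow recent LLamaSharp releases and may differ by version; the model file name is a placeholder):

```csharp
using LLama;
using LLama.Common;

// Placeholder GGUF model path; GpuLayerCount controls how many
// transformer layers are offloaded to the GPU via the llama.cpp backend.
var parameters = new ModelParams("phi-3-mini-4k-instruct.Q4_K_M.gguf")
{
    ContextSize = 2048,
    GpuLayerCount = 32 // offload as many layers as GPU memory allows
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

var inferenceParams = new InferenceParams { MaxTokens = 256 };

// Stream the generated tokens to the console.
await foreach (var token in executor.InferAsync("What is ML.NET?", inferenceParams))
{
    Console.Write(token);
}
```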

agonzalezm commented 3 months ago

I am not asking about inference on GPUs, but on the new NPUs. DirectML says it will support Intel NPUs, but not AMD for now. Either way, I am asking for an easy way to do this in C#.

luisquintanilla commented 1 month ago

Generally, for ONNX / TorchSharp models, ML.NET depends on the hardware support provided by the respective frameworks.

In the ONNX case, what you'd be looking to use is DirectML or the respective hardware vendor's execution provider.
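
Going straight to the ONNX Runtime C# API, selecting the DirectML execution provider is a one-line change on `SessionOptions`. A minimal sketch, assuming the `Microsoft.ML.OnnxRuntime.DirectML` package is referenced (the model path, input name, and tensor shape are placeholders):

```csharp
using System.Collections.Generic;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

var options = new SessionOptions();
// Device 0; DirectML runs on any DX12-capable adapter
// (GPUs today, NPUs where the vendor exposes them).
options.AppendExecutionProvider_DML(0);

using var session = new InferenceSession("model.onnx", options);

// Dummy input shaped for a typical image model; adjust to your model.
var input = new DenseTensor<float>(new[] { 1, 3, 224, 224 });
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input", input)
};

using var results = session.Run(inputs);
```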

Here's an example of running an image classification model in ML.NET using the DirectML execution provider.
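
As a rough sketch of what that looks like in ML.NET (the column names and model file below are placeholders; which execution provider `gpuDeviceId` resolves to depends on the ONNX Runtime native package your project references, e.g. the DirectML flavor):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// ApplyOnnxModel scores an ONNX model inside an ML.NET pipeline;
// gpuDeviceId routes execution to the referenced ONNX Runtime
// hardware backend, falling back to CPU if unavailable.
var pipeline = mlContext.Transforms.ApplyOnnxModel(
    outputColumnNames: new[] { "classLabel" }, // placeholder output name
    inputColumnNames: new[] { "image" },       // placeholder input name
    modelFile: "resnet50.onnx",                // placeholder model path
    gpuDeviceId: 0,
    fallbackToCpu: true);

// Fitting on an IDataView of images would then score on the selected device:
// var model = pipeline.Fit(imageData);
```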