dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Running AI inference of Phi-3 and other LLMs from C# using NPU + GPU on upcoming processors? #7162

Open agonzalezm opened 4 months ago

agonzalezm commented 4 months ago

Intel, AMD, Qualcomm, etc. are shipping powerful NPUs (40+ TOPS) for inference.

Is there any plan to include functionality in ML.NET to run inference with these models easily from C#, offloading to the NPU, the GPU, or both? Upcoming Intel processors will have a 40 TOPS NPU and 60 TOPS across CPU/GPU.

How can we easily make the most of all these TOPS from C#, running inference across NPU + GPU?

All the samples I see for this require Python, etc.; it would be great to have all of this available directly in .NET/C#.

Maybe a C# wrapper around https://github.com/intel/intel-npu-acceleration-library could be included, but what about AMD and Qualcomm?

asmirnov82 commented 4 months ago

Hi @agonzalezm, have you taken a look at the https://github.com/SciSharp/LLamaSharp project? It allows running inference for plenty of LLM models on consumer-level GPUs.
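
For reference, a minimal sketch of GPU-offloaded inference with LLamaSharp (API names such as `ModelParams` and `InteractiveExecutor` follow recent LLamaSharp releases and may differ by version; the model file name is a placeholder):

```csharp
using LLama;
using LLama.Common;

// Placeholder GGUF model path; GpuLayerCount controls how many
// transformer layers are offloaded to the GPU via the llama.cpp backend.
var parameters = new ModelParams("phi-3-mini-4k-instruct.Q4_K_M.gguf")
{
    ContextSize = 2048,
    GpuLayerCount = 32 // offload as many layers as GPU memory allows
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

var inferenceParams = new InferenceParams { MaxTokens = 256 };

// Stream the generated tokens to the console.
await foreach (var token in executor.InferAsync("What is ML.NET?", inferenceParams))
{
    Console.Write(token);
}
```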

agonzalezm commented 3 months ago

I am not asking about inference on GPUs, but on the new NPUs. DirectML says it will support Intel NPUs, but not AMD for now. Either way, I am asking for an easy way to do this in C#.

luisquintanilla commented 1 month ago

Generally, for ONNX / TorchSharp models, ML.NET depends on the hardware support provided by the respective frameworks.

In the ONNX case, what you'd be looking to use is DirectML or the respective hardware vendor's execution provider.
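
Going straight to the ONNX Runtime C# API, selecting the DirectML execution provider is a one-line change on `SessionOptions`. A minimal sketch, assuming the `Microsoft.ML.OnnxRuntime.DirectML` package is referenced (the model path, input name, and tensor shape are placeholders):

```csharp
using System.Collections.Generic;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

var options = new SessionOptions();
// Device 0; DirectML runs on any DX12-capable adapter
// (GPUs today, NPUs where the vendor exposes them).
options.AppendExecutionProvider_DML(0);

using var session = new InferenceSession("model.onnx", options);

// Dummy input shaped for a typical image model; adjust to your model.
var input = new DenseTensor<float>(new[] { 1, 3, 224, 224 });
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input", input)
};

using var results = session.Run(inputs);
```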

Here's an example of running an image classification model in ML.NET using the DirectML execution provider.
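
As a rough sketch of what that looks like in ML.NET (the column names and model file below are placeholders; which execution provider `gpuDeviceId` resolves to depends on the ONNX Runtime native package your project references, e.g. the DirectML flavor):

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// ApplyOnnxModel scores an ONNX model inside an ML.NET pipeline;
// gpuDeviceId routes execution to the referenced ONNX Runtime
// hardware backend, falling back to CPU if unavailable.
var pipeline = mlContext.Transforms.ApplyOnnxModel(
    outputColumnNames: new[] { "classLabel" }, // placeholder output name
    inputColumnNames: new[] { "image" },       // placeholder input name
    modelFile: "resnet50.onnx",                // placeholder model path
    gpuDeviceId: 0,
    fallbackToCpu: true);

// Fitting on an IDataView of images would then score on the selected device:
// var model = pipeline.Fit(imageData);
```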