cassiebreviu / StableDiffusion

Inference Stable Diffusion with C# and ONNX Runtime
MIT License

CPU version runs but DirectML version throws exception #23

Closed by AshD 11 months ago

AshD commented 11 months ago

Using the DirectML branch, latest Visual Studio, latest Windows 11.

```
Microsoft.ML.OnnxRuntime.OnnxRuntimeException
HResult=0x80131500
Message=[ErrorCode:RuntimeException] Non-zero status code returned while running Trilu node. Name:'Trilu_233' Status Message: D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2448)\onnxruntime.DLL!00007FFD1ED55645: (caller: 00007FFD1ED54CDA) Exception(3) tid(4b4) 80070057 The parameter is incorrect.
Source=Microsoft.ML.OnnxRuntime
StackTrace:
   at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
   at Microsoft.ML.OnnxRuntime.InferenceSession.RunImpl(RunOptions options, IntPtr[] inputNames, IntPtr[] inputValues, IntPtr[] outputNames)
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs, IReadOnlyCollection`1 outputNames, RunOptions options)
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs, IReadOnlyCollection`1 outputNames)
   at Microsoft.ML.OnnxRuntime.InferenceSession.Run(IReadOnlyCollection`1 inputs)
   at StableDiffusion.ML.OnnxRuntime.TextProcessing.TextEncoder(Int32[] tokenizedInput, StableDiffusionConfig config) in D:\ai\StableDiffusion\StableDiffusion.ML.OnnxRuntime\TextProcessing.cs:line 85
   at StableDiffusion.ML.OnnxRuntime.TextProcessing.PreprocessText(String prompt, StableDiffusionConfig config) in D:\ai\StableDiffusion\StableDiffusion.ML.OnnxRuntime\TextProcessing.cs:line 12
   at StableDiffusion.ML.OnnxRuntime.UNet.Inference(String prompt, StableDiffusionConfig config) in D:\ai\StableDiffusion\StableDiffusion.ML.OnnxRuntime\UNet.cs:line 74
   at StableDiffusion.Program.Main(String[] args) in D:\ai\StableDiffusion\StableDiffusion\Program.cs:line 37
```
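For context, a session that uses the DirectML execution provider is typically configured along the lines of the sketch below. This is an illustration based on ONNX Runtime's C# API (it requires the Microsoft.ML.OnnxRuntime.DirectML package), not the exact code in the repository; the model path is a placeholder.

```csharp
using Microsoft.ML.OnnxRuntime;

var options = new SessionOptions();

// The DirectML EP requires memory-pattern optimization to be disabled
// and sequential (non-parallel) execution.
options.EnableMemoryPattern = false;
options.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL;

// Device 0 is the default GPU adapter. Operators the DirectML EP cannot
// handle fall back to the CPU provider; a failure inside a node such as
// Trilu surfaces as the OnnxRuntimeException shown above.
options.AppendExecutionProvider_DML(0);

// Placeholder model path for illustration.
using var session = new InferenceSession(@"models\text_encoder\model.onnx", options);
```

If the DirectML package or GPU driver mishandles a particular operator, removing the `AppendExecutionProvider_DML` call makes the session run entirely on the CPU provider, which is why the CPU version of the sample works while the DirectML one throws.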

AshD commented 11 months ago

I am using an RTX 3070. CUDA is also installed and used by Python programs.

AshD commented 11 months ago

I downloaded a zipped copy of the repository's DirectML branch, and that version worked.

However, it's very slow compared to Python Diffusers: it took 30 seconds to generate an image on an RTX 3090, whereas Python Diffusers took 3 seconds.