TensorStack-AI / OnnxStack

C# Stable Diffusion using ONNX Runtime
Apache License 2.0

StableDiffusion / Non-zero status code returned while running Add node. Name:'Add_221' #142

Open imranypatel opened 5 months ago

imranypatel commented 5 months ago

While trying the Basic Stable Diffusion Example from https://www.nuget.org/packages/OnnxStack.StableDiffusion, at the following line in the code:

    // Run pipeline
    var result = await pipeline.GenerateImageAsync(promptOptions);

the following exception is raised:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException
  HResult=0x80131500
  Message=[ErrorCode:RuntimeException] Non-zero status code returned while running Add node. Name:'Add_221' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2482)\onnxruntime.DLL!00007FFC280E7AA5: (caller: 00007FFC280E712D) Exception(3) tid(1b6c) 80004005 Unspecified error

  Source=Microsoft.ML.OnnxRuntime
  StackTrace:
   at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
   at Microsoft.ML.OnnxRuntime.InferenceSession.<>c__DisplayClass75_0.<RunAsync>b__0(IReadOnlyCollection`1 outputs, IntPtr status)
--- End of stack trace from previous location ---
   at Microsoft.ML.OnnxRuntime.InferenceSession.<RunAsync>d__75.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<EncodePromptTokensAsync>d__39.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<GeneratePromptEmbedsAsync>d__40.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<CreatePromptEmbedsAsync>d__37.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<RunInternalAsync>d__31.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<GenerateImageAsync>d__26.MoveNext()
   at TestOnnxStack.TestStableDiffusion.<Test01>d__1.MoveNext() in ...\TestOnnxStack\TestStableDiffusion.cs:line 42
   at Program.<<Main>$>d__0.MoveNext() in ...i\TestOnnxStack\Program.cs:line 6

To reproduce:

  1. Create a .NET 8 console project.
  2. Add the NuGet packages Microsoft.ML.OnnxRuntime.DirectML (1.17.3) and OnnxStack.StableDiffusion (0.31.0).
  3. Copy the code from the Basic Stable Diffusion Example in the https://www.nuget.org/packages/OnnxStack.StableDiffusion documentation into Program.cs.
  4. Create a D:\model folder and git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 -b onnx into it.
  5. Change the model path in the code, e.g. var pipeline = StableDiffusionPipeline.CreatePipeline(@"D:\model\stable-diffusion-v1-5"); (a minimal version of the full program is sketched below).
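
For reference, here is a minimal version of the full program I'm running, roughly following the README example (the using lines and the SaveAsync call are my best reading of the package's API; the model path and prompt are mine):

    // Program.cs (.NET 8 console app, top-level statements)
    using OnnxStack.StableDiffusion.Config;
    using OnnxStack.StableDiffusion.Pipelines;

    // Create the pipeline from the local ONNX model folder
    var pipeline = StableDiffusionPipeline.CreatePipeline(@"D:\model\stable-diffusion-v1-5");

    // Prompt options
    var promptOptions = new PromptOptions
    {
        Prompt = "Photo of a cat"
    };

    // Run pipeline (this is the line that throws)
    var result = await pipeline.GenerateImageAsync(promptOptions);

    // Save the generated image
    await result.SaveAsync("result.png");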

Platform

saddam213 commented 5 months ago

That's an odd-looking error coming from deep within ONNX Runtime.

I'll download that model and see if it's a regression; it's been a while since I last used that version of the model.

I have this version on disk, and it seems to work OK following your steps: https://huggingface.co/TensorStack/stable-diffusion-v1-5-onnx

I will check the other model now and update you with what I find

EDIT:

Downloaded a fresh copy of the model you used and it seemed to work fine:

[screenshot: generated image]

Must be another cause, maybe a corrupt download? What kind of GPU/device are you using?

imranypatel commented 5 months ago

Downloaded the model from https://huggingface.co/TensorStack/stable-diffusion-v1-5-onnx

Now getting a slightly different error than the one above:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException
  HResult=0x80131500
  Message=[ErrorCode:RuntimeException] Non-zero status code returned while running Mul node. Name:'' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2482)\onnxruntime.DLL!00007FFC9DD77AA5: (caller: 00007FFC9DD7712D) Exception(3) tid(6c74) 80004005 Unspecified error

  Source=Microsoft.ML.OnnxRuntime
  StackTrace:
   at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
   at Microsoft.ML.OnnxRuntime.InferenceSession.<>c__DisplayClass75_0.<RunAsync>b__0(IReadOnlyCollection`1 outputs, IntPtr status)
--- End of stack trace from previous location ---
   at Microsoft.ML.OnnxRuntime.InferenceSession.<RunAsync>d__75.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<EncodePromptTokensAsync>d__39.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<GeneratePromptEmbedsAsync>d__40.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<CreatePromptEmbedsAsync>d__37.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<RunInternalAsync>d__31.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<GenerateImageAsync>d__26.MoveNext()
   at TestOnnxStack.TestStableDiffusion.<Test01>d__1.MoveNext() in ...\TestOnnxStack\TestStableDiffusion.cs:line 42
   at Program.<<Main>$>d__0.MoveNext() in ...i\TestOnnxStack\Program.cs:line 6

Device/CPU: [screenshot]

GPU: [screenshots]

Kind of stuck at the moment, as I'm just a beginner in the ML.NET/ONNX/DirectML space.

Thank you for your support.

saddam213 commented 5 months ago

Unfortunately your GPU may not have enough VRAM for Stable Diffusion; at a minimum you would need 3-4 GB for an FP16 model (SD 1.5's UNet alone is ~860M parameters, roughly 1.7 GB at FP16, before counting the text encoder, VAE, and activations).

You can switch to CPU mode by using the CPU execution provider and see if that works:

    var pipeline = StableDiffusionPipeline.CreatePipeline(@"D:\models\test\stable-diffusion-v1-5", executionProvider: ExecutionProvider.Cpu);
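
If it is VRAM pressure, you could also try shrinking the workload before falling back to CPU; assuming the SchedulerOptions overload from the README, something like:

    // Smaller image + fewer steps = less VRAM and faster runs
    // (SD 1.5 sizes should stay multiples of 64)
    var schedulerOptions = new SchedulerOptions
    {
        Width = 448,
        Height = 448,
        InferenceSteps = 20
    };
    var result = await pipeline.GenerateImageAsync(promptOptions, schedulerOptions);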

It's ok, I'm just learning ML too :)

imranypatel commented 5 months ago

I was expecting better error reporting and troubleshooting help from the .NET framework and the ML APIs built on top of it. The situation seems not much different from the Python platforms.

Tried the CPU execution provider, only to see a similar problem; not copying the details here since, based on your response, I think I'd better first get my hardware/software platform in order for ML on Windows/.NET.

Based on your learning so far, could you suggest or point to platform requirements (laptop, CPU, GPU, RAM, Windows OS, etc.) for a developer looking to explore ML in general and the ML.NET space in particular?

Good luck on your learning ride!

Thank you.

imranypatel commented 5 months ago

Tried on another machine with success.

Typical GPU state during image generation: [screenshot]

Dxdiag system: [screenshot]

Dxdiag AMD Radeon GPU: [screenshot]

It takes about 8 minutes per image on average.

Now looking into how to reduce that time, which is of course not acceptable at the moment.

Would welcome any suggestion in that direction!
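
In case it helps others hitting this, I'm timing runs with a plain Stopwatch (nothing OnnxStack-specific) so I can compare execution providers and settings:

    using System.Diagnostics;

    // Time one generation to compare providers/step counts
    var sw = Stopwatch.StartNew();
    var result = await pipeline.GenerateImageAsync(promptOptions);
    sw.Stop();
    Console.WriteLine($"Generated in {sw.Elapsed.TotalSeconds:F1} s");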