0xC0000054 / pdn-ddsfiletype-plus

A Paint.NET filetype plugin that adds support for some of the DDS formats introduced in DirectX 10 and later.
https://forums.getpaint.net/topic/111731-dds-filetype-plus/
MIT License

BC7 encoding with WARP can freeze the Paint.NET UI #11

Closed 0xC0000054 closed 3 years ago

0xC0000054 commented 3 years ago

When encoding larger images with the WARP DirectCompute device, the Paint.NET UI stops responding when the progress reaches a certain point. WARP batches its rendering calls, and after it starts rendering, the SaveConfigDialog will freeze if you try to interact with it.

Edit: The UI freeze occurs regardless of whether D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS is set, so I am not sure what could be causing it.

I observed the SaveConfigDialog being marked as not responding in Task Manager when using a 4096x4096 pixel blank canvas. On a 1024x1024 pixel blank canvas, the SaveConfigDialog does not fully paint until the image has been compressed, but it is not blocked long enough on my system to show up as not responding in Task Manager.

Screenshots with a blank 4096x4096 pixel canvas ![UI not responding](https://user-images.githubusercontent.com/26996983/95667966-94eca200-0b2a-11eb-8c08-f6d1c67194cf.png) ![PDN task manager](https://user-images.githubusercontent.com/26996983/95667957-7ab2c400-0b2a-11eb-9f29-aa6d61364c11.png)

Paint.NET diagnostics:

Application paint.net 4.2.13 (Final 4.213.7521.38873)
Build Date  Tuesday, August 4, 2020
Install type    Classic

Hardware accelerated rendering (GPU)    False
Animations  True
DPI 96 (1.00x scale)
Language    en-US

OS  Windows 10 Pro x64 (10.0.19041.0) (0x30)
.NET Runtime    4.0.30319.42000
Physical Memory 32,709 MB

CPU Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
    Speed   ~3998 MHz
    Cores / Threads 4 / 8
    Features    SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, AVX, AVX2

Video Card  NVIDIA GeForce RTX 2070 SUPER
    Dedicated Video RAM 8,011 MB
    Dedicated System RAM    0 MB
    Shared System RAM   16,354 MB
    Vendor ID   0x10DE
    Device ID   0x1E84
    Subsystem ID    0x31733842
    Revision    161
    LUID    0x00009E4C
    Flags   AcgCompatible, SupportMonitoredFences, KeyedMutexConformance
    Graphics Preemption PixelBoundary
    Compute Preemption  DispatchBoundary
    Outputs 2
    Feature Level   Direct3D_12_1
    DXGI Formats    A8_UNorm, B8G8R8A8_UNorm, R16G16B16A16_UNorm, R16G16B16A16_Float, R32G32B32A32_Float
    Buffer Precision    UNorm8bpc, UNorm8bpcSrgb, UNorm16bpc, Float16bpc, Float32bpc

Video Card  Microsoft Basic Render Driver
    Dedicated Video RAM 0 MB
    Dedicated System RAM    0 MB
    Shared System RAM   16,354 MB
    Vendor ID   0x1414
    Device ID   0x008C
    Subsystem ID    0x00000000
    Revision    0
    LUID    0x0000AAB7
    Flags   Software, AcgCompatible, SupportMonitoredFences, KeyedMutexConformance
    Graphics Preemption InstructionBoundary
    Compute Preemption  InstructionBoundary
    Outputs 0
    Feature Level   Direct3D_12_1
    DXGI Formats    A8_UNorm, B8G8R8A8_UNorm, R16G16B16A16_UNorm, R16G16B16A16_Float, R32G32B32A32_Float
    Buffer Precision    UNorm8bpc, UNorm8bpcSrgb, UNorm16bpc, Float16bpc, Float32bpc

Unfortunately, the DirectXTex CPU encoder for BC7 is extremely slow, so it is not a viable alternative when compressing large images.

cc @rickbrew

rickbrew commented 3 years ago

From what I've seen, WARP uses its own thread pool -- or at least, I've never seen those threads being used for any other work. That thread pool appears to use a simple queue for work items, rather than a queue-of-work-queues that would permit round-robin scheduling. So if some task queues up a ton of work, and then another task queues up some small work, the latter just has to wait around for the first to finish.
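To illustrate the scheduling difference described above, here is a minimal Python simulation. It is only a sketch: it does not model WARP's actual internals (which are an observation, not documented behavior), and the task names `bc7_encode` and `ui_paint` and the item counts are hypothetical. It compares the order in which work items complete with one shared FIFO queue versus a queue of per-task queues served round-robin.

```python
from collections import deque


def fifo_order(submissions):
    """Single shared queue: items run strictly in submission order."""
    queue = deque()
    for task, count in submissions:
        for i in range(count):
            queue.append((task, i))
    return [queue.popleft() for _ in range(len(queue))]


def round_robin_order(submissions):
    """Queue of per-task queues: take one item from each task in turn."""
    per_task = deque(
        (task, deque(range(count))) for task, count in submissions if count > 0
    )
    order = []
    while per_task:
        task, items = per_task.popleft()
        order.append((task, items.popleft()))
        if items:
            # Task still has work left: send it to the back of the line.
            per_task.append((task, items))
    return order


def finish_position(order, task):
    """1-based position at which the last item of `task` completes."""
    return max(i for i, (t, _) in enumerate(order, start=1) if t == task)


# Hypothetical workload: a large encode is queued first, then a small UI job.
submissions = [("bc7_encode", 1000), ("ui_paint", 2)]
fifo = fifo_order(submissions)
rr = round_robin_order(submissions)

print("FIFO: ui_paint finishes at item", finish_position(fifo, "ui_paint"))
print("Round-robin: ui_paint finishes at item", finish_position(rr, "ui_paint"))
```

With the shared queue, the small job completes only after every encode item ahead of it; with round-robin, it completes within the first few items while the encode still finishes at the same point.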

You can recreate a similar kind of UI-thread stalling in PDN by doing something like this:

  1. Disable hardware acceleration in Settings
  2. Open a really large image. It should be a little under half of your total memory, so ~14 or 15 GB in your case
  3. Switch to the Move Selected Pixels tool
  4. Select "Bicubic" resampling in the toolbar. The rendering for this is implemented using Direct2D, and since this code is always run in software mode it also always uses WARP.
  5. Select All, and then rotate the pixels around and play with it.

It takes a while for all of the tiles to render. You will probably see UI elements -- the grab handles on the canvas, as well as general UI fps and status bar updates -- not working very well. The UI thread is using WARP, as are all the background threads, which are being completely flooded with work from the bicubic rendering code.

So, I'm not sure this can be avoided unless you're heck-bent on hoisting this code into a separate process, which would then finally have a separate work queue that won't compete with the main process's UI thread. We seem to be approaching a critical mass of need for this type of infrastructure.

rickbrew commented 3 years ago

I think we just have to deal with the fact that if you're running w/o hardware acceleration, performance just isn't going to be optimal.

rickbrew commented 3 years ago

I'm grasping at straws here, but maybe IDXGIDevice::SetGPUThreadPriority() would help? https://docs.microsoft.com/en-us/windows/win32/api/dxgi/nf-dxgi-idxgidevice-setgputhreadpriority . You'd set the priority ... lower? It's not clear whether lower values give higher priority, or the reverse. I also don't know if WARP pays attention to this.

ID3D11Device can be QI'd for a pointer to its IDXGIDevice implementation.

0xC0000054 commented 3 years ago

> I think we just have to deal with the fact that if you're running w/o hardware acceleration, performance just isn't going to be optimal.

I agree. Thanks for the detailed explanation.

> So, I'm not sure this can be avoided unless you're heck-bent on hoisting this code into a separate process, which would then finally have a separate work queue that won't compete with the main process's UI thread. We seem to be approaching a critical mass of need for this type of infrastructure.

Running FileType plugins (and the shell open and save dialogs) in a separate process makes sense. It could also simplify cancellation support for FileType plugins, because the OS would free any native resources when the process exits.
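As a rough sketch of that cancellation idea (not Paint.NET's design, and in Python rather than the plugin's own code), the example below runs a stand-in worker in a child process and cancels it by terminating the process. The `WORKER_CODE` string and the `start_worker` and `cancel` helpers are hypothetical; a real host would launch a dedicated worker executable and exchange image data with it over some form of IPC.

```python
import subprocess
import sys

# Hypothetical stand-in for a long-running encoder in a separate process.
WORKER_CODE = "import time; time.sleep(60)"


def start_worker():
    """Launch the worker as a child process with its own threads and memory."""
    return subprocess.Popen([sys.executable, "-c", WORKER_CODE])


def cancel(worker, timeout=5.0):
    """Cancel by ending the process; the OS reclaims its memory and handles."""
    worker.terminate()
    worker.wait(timeout=timeout)
    return worker.returncode


worker = start_worker()
still_running = worker.poll() is None  # worker is busy; the host stays free
exit_code = cancel(worker)             # e.g. the user pressed Cancel
print("worker was running:", still_running, "exit code after cancel:", exit_code)
```

The host needs no cooperative cancellation checks inside the encoder; the trade-off is the cost of starting the process and moving pixel data across the process boundary.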

Closing this issue.