AcademySoftwareFoundation / OpenShadingLanguage

Advanced shading language for production GI renderers
BSD 3-Clause "New" or "Revised" License
2.09k stars 357 forks source link

Draft: Remove more unnecessary conditional run layer calls #1713

Open chellmuth opened 1 year ago

chellmuth commented 1 year ago

This PR is a work-in-progress that I've hit a (hopefully) minor stumbling block on.

Overview

@tgrant-nv was looking at some of our shader ptx awhile ago and pointed out that a surprisingly high amount of the total instructions were just conditional layer call checks (due to all the stores and loads surrounding calls). The concern was that this was slowing down OptiX module compilation.

OSL has some basic checks in place to remove unnecessary conditional layer calls, but this patch takes things further. The high level strategy is:

It doesn't help the compilation times for every shader, but we've seen speedups as high as 5x on some of our most expensive shaders.

Work in progress

The problem I've run into is that the optimization pass introduces quite a bit of overhead to the ptx generation. Most assets take 10-20% longer to generate ptx with this patch. For us, it's probably still a good trade to turn on for GPU (module compilation is more expensive than ptx generation), but I was hoping some more eyes could help improve things. Especially since I've never written a LLVM pass before, maybe I'm doing things particularly inefficiently.

Any feedback or ideas would be appreciated!

Tests

Checklist:

linux-foundation-easycla[bot] commented 1 year ago

CLA Missing ID CLA Not Signed

tgrant-nv commented 1 year ago

This sounds very promising. A 10-20% increase in JIT time isn't so bad, if it increases the shader performance or even reduces the PTX compile time. We'll take a closer look and see what recommendations we can make.

chellmuth commented 1 year ago

This sounds very promising. A 10-20% increase in JIT time isn't so bad, if it increases the shader performance or even reduces the PTX compile time. We'll take a closer look and see what recommendations we can make.

Thanks! Looking forward to your findings.

lgritz commented 1 year ago

Is this ready to be a non-draft? Looks like it might need a rebase as well.

chellmuth commented 1 year ago

Is this ready to be a non-draft?

It's close. It turned out the speed hit we've been seeing is because removing so many run-layer checks made the remaining ones more palatable for inlining. So LLVM's inlining pass times are growing quite a bit.

Now that it's been merged, I want to test this patch internally on #1680, since that also has inlining ramifications and I want to make sure everything interacts okay.

AlexMWells commented 1 year ago

@chellmuth , as an experiment could you add a no-inline attribute to the layer functions to prevent them from being inlined? Not sure you really want to do that, but could be an interesting data point.