AcademySoftwareFoundation / OpenShadingLanguage

Advanced shading language for production GI renderers
BSD 3-Clause "New" or "Revised" License
2.07k stars 350 forks source link

Don't insert redundant run layer calls inside a basic block #1665

Closed chellmuth closed 1 year ago

chellmuth commented 1 year ago

Description

This patch is a very simple part one of work to optimize the number of conditional run layer calls that OSL makes, which take the form:

if (!groupdata->run[parentlayer])
    parent_layer([...]);

After examining our slowest JIT’ing shaders on gpu, @tgrant-nv showed us that these run layer calls were significantly bloating our ptx and slowing down their compilation. Many of them are unnecessary because static analysis can show we are guaranteed to have already run the layer in question.

For example, reading different fields from a connected struct like below will often generate multiple conditional run layer calls:

struct Pair {
    float x;
    float y;
};

shader example_shader(Pair p = {})
{
    if ([…]) {
        float xx = p.x;
        float yy = p.y;
        […]
    }
}

Since this is not really a performance issue on cpu (runtime overhead is just checking the conditional bit), OSL doesn’t make a huge effort to optimize away these redundant cases. Each useparam op will be careful not to insert more than one per layer, and code that is called unconditionally in the main section of the shader will also not have redundant checks, but there are plenty of examples like above that are not currently removed.


This patch takes a basic first step, adding metrics to track run layer call counts and removing duplicate run layer calls inside a basic block, covering cases like the example above.

On my set of production test scenes, this alone removes anywhere from 20-50% of total run layer calls, with our biggest shaders seeing the most improvement. Total ptx line reduction varies between 3-20%, and optix module compilation (jit time) is reduced between 1-15%.

Tests

I don't know how to write a test that asserts an intermediate result of an optimization (fewer run layer calls) instead of asserting that an image looks correct.

Internally, I replaced omitted run layer calls with asserts that groupdata->run[parentlayer] == true, and ran against several test scenes totaling around a hundred or so shaders.

Checklist:

tgrant-nv commented 1 year ago

Looks great. Is there any change to the output of the shader? Does this change the number of actual layer-to-layer calls when a shader is executed?

chellmuth commented 1 year ago

Is there any change to the output of the shader?

There shouldn't be any semantic difference. While testing, I changed all omitted layer checks to assertions that the layer had already been run, and didn't trigger any exceptions.

Does this change the number of actual layer-to-layer calls when a shader is executed?

It shouldn't, and would be a bug if it did, but I didn't log that and verify.

AlexMWells commented 1 year ago

Could you add the same changes in the batched version: batched_llvm_gen.cpp and batched_llvm_instance.cpp?

lgritz commented 1 year ago

Alex is right. I was thinking that this was all oso manipulation, and it slipped my mind that because this happens as we're generating LLVM IR from he oso IR, it will need a corresponding change for batched shading, so I merged without remembering to make that request. It should be straightforward a straightforward change for you, Chris, since this doesn't in any way need any deep knowledge of how the SIMD code works.