Closed: lgritz closed this pull request 1 year ago.
An important case that this particular pull request doesn't cover is when running on the GPU requires constant propagation to happen, but the desire to keep something available for ReParameter edits would prevent it. In the simplest case, you can just avoid using ReParameter on, e.g., string args or something like that. However, it becomes problematic in slightly more complicated cases.
For example, let's say there's an "int mode" parameter to the shader that can be set to 1 or 2. In mode 1, it does various things, including appending "foo" to a texture name. In mode 2, it does other things, including appending "bar" to a texture name. When constant propagation is allowed to do its thing, it's very likely that the mode selection will be propagated through and the string manipulation will be done at compile time, leaving a single constant string with no runtime manipulation.
However, if you keep "mode" available for editing, we suddenly need to do string manipulation in the shader at runtime. For the GPU this is a no-go, so you'll have to detect that it is in use and fail the GPU compile, at which point your renderer can either avoid the GPU path or just fail to render with that shader completely.
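To make the failure mode concrete, here is a toy model (not OSL code; `IntParam`, `fold_texture_name`, and `gpu_compile_ok` are hypothetical names). A non-interactive `mode` lets the texture-name concatenation fold to one constant string. An interactive `mode` leaves the concatenation for runtime, which this toy GPU check then rejects.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy model (not OSL code): a shader parameter is either folded to a known
// constant or left "interactive", i.e. unknown until shade time.
struct IntParam {
    int value;
    bool interactive;  // true => editable via ReParameter, so never folded
};

// "Optimize" the texture-name expression from the example above. Returns the
// single constant string if it can be computed now, or nullopt if the
// concatenation would have to happen at runtime.
std::optional<std::string> fold_texture_name(const IntParam& mode,
                                             const std::string& base)
{
    if (mode.interactive)
        return std::nullopt;  // mode unknown: string ops stay in the shader
    // Mode 1 appends "foo"; otherwise (mode 2) append "bar".
    return base + (mode.value == 1 ? "foo" : "bar");
}

// Toy GPU check: runtime string manipulation is unsupported on the GPU, so
// the group can only be built for it if the name folded to a constant.
bool gpu_compile_ok(const IntParam& mode, const std::string& base)
{
    return fold_texture_name(mode, base).has_value();
}
```

With `{1, false}` the name folds to `"tex_foo"`. Marking the same parameter interactive turns it into a runtime operation and the toy GPU build fails.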
We have a pass implemented that can prevent that from happening, and in that case we just have to recompile when those parameters are edited. We're in the middle of catching our local dev up to the head of OSL development, so aren't quite in a position to upstream that yet, but wanted to point out the pitfall here.
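The pass mentioned above has not been published, so the following is only a hypothetical illustration of the general idea, not that implementation. Any interactive parameter whose value flows into an operation the GPU cannot perform at runtime is demoted back to non-interactive so it can be folded. The pass also reports which parameters now require a recompile when edited. `ParamInfo` and `demote_for_gpu` are invented names. The data-flow analysis that would set `feeds_gpu_unsupported_op` is assumed, not shown.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-parameter summary. A real pass would compute
// feeds_gpu_unsupported_op from data flow through the shader group.
struct ParamInfo {
    std::string name;
    bool interactive;
    bool feeds_gpu_unsupported_op;  // e.g. reaches a runtime string concat
};

// Demote offending interactive params so constant folding can remove the
// unsupported operation. Returns the names of demoted params; editing any of
// them afterwards means recompiling the shader group.
std::set<std::string> demote_for_gpu(std::vector<ParamInfo>& params)
{
    std::set<std::string> needs_recompile;
    for (auto& p : params) {
        if (p.interactive && p.feeds_gpu_unsupported_op) {
            p.interactive = false;
            needs_recompile.insert(p.name);
        }
    }
    return needs_recompile;
}
```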
@sfriedmapixar and I discussed these separately as well.
Yes, this is an interesting case that I admit I had not considered when I was preparing this patch. I was thinking of the direct manipulation of parameters and didn't recognize that marking a parameter "interactive" could prevent a shader from building correctly for the GPU. This happens when the shader uses a construct that is fine in shader source code but will only work on the GPU if it can be resolved to a constant by the time we are done with runtime optimization.
I suggest that if this patch is acceptable in other ways, we go ahead and merge it so that we can start using it and gain more experience with any limitations. We should keep in mind that, for shaders constructed in certain ways, marking certain parameters interactive could make the shader group fail to build for the GPU. We'll just watch out for those and come back to the problem later with a more robust and automatic fix, and if we are lucky, Stephen will have the opportunity to upstream the approach he already has.
BREAKING CHANGE: to RendererServices ABI (including for CPU) and to the renderer-side setup when using OptiX.
This overhauls the implementation of interactively-editable parameters, including where they live in memory, and gets it all working on GPU/OptiX so that renderers can support interactive adjustment of those params without recompiling shaders.
The basic gist is as follows:
We continue work to finish making a clean separation between "interpolated" parameters and "interactive" (editable) parameters.
Interactive params are collected and put into a separate memory area: a separate per-group allocation on both the CPU and GPU (where applicable). The group needs to remember the offset into this arena where each of the interactive parameters resides. These allocations and their eventual release are taken care of by the OSL shading system; they live in the ShaderGroup. When the group is set up, this block of memory is initialized with the correct initial values of the params, which are then ready to go.
The implementation of ReParameter also writes to this special memory area; that's how it works now (on both CPU and GPU).
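A minimal sketch of such an arena layout, assuming nothing about OSL's actual data structures (`InteractiveParam` and `layout_arena` are hypothetical). Each interactive parameter gets an aligned offset in one per-group byte block, and the block is then filled with the initial values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical description of one interactive parameter.
struct InteractiveParam {
    std::string name;
    std::size_t size;       // bytes
    std::size_t alignment;  // bytes, must be a power of two
    const void* initial_value;
    std::size_t offset = 0;  // assigned by layout_arena()
};

// Round x up to the next multiple of a (a is a power of two).
static std::size_t align_up(std::size_t x, std::size_t a)
{
    return (x + a - 1) & ~(a - 1);
}

// Assign each param an aligned offset, then return a host-side arena
// initialized with every param's initial value at its offset.
std::vector<unsigned char> layout_arena(std::vector<InteractiveParam>& params)
{
    std::size_t total = 0;
    for (auto& p : params) {
        assert(p.alignment != 0 && (p.alignment & (p.alignment - 1)) == 0);
        p.offset = align_up(total, p.alignment);
        total    = p.offset + p.size;
    }
    std::vector<unsigned char> arena(total);
    for (const auto& p : params)
        std::memcpy(arena.data() + p.offset, p.initial_value, p.size);
    return arena;
}
```

On a GPU build, the same bytes would then be copied into the device-side allocation. The offsets are what the compiled shader later uses to find each parameter.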
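Here is a hedged sketch of what a ReParameter-style edit amounts to under this scheme. `GroupArena` and `reparameter` are hypothetical names, not OSL's API. The edit simply overwrites the parameter's bytes at its recorded offset. No recompile is needed because the compiled shader reads the value from the arena each time it runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a group's host-side interactive arena.
struct GroupArena {
    std::vector<unsigned char> host;
    // name -> (offset, size) of each interactive param within `host`
    std::map<std::string, std::pair<std::size_t, std::size_t>> layout;
};

// Overwrite a param's value in place. Fails if the name is not an
// interactive param of this group or the value size doesn't match.
// (In a GPU build the updated bytes would also be copied to the device.)
bool reparameter(GroupArena& g, const std::string& name, const void* value,
                 std::size_t size)
{
    auto it = g.layout.find(name);
    if (it == g.layout.end() || it->second.second != size)
        return false;
    std::memcpy(g.host.data() + it->second.first, value, size);
    return true;
}
```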
How does the OSL library know how to allocate, free, and copy to the device memory? It doesn't! Instead, we add new RendererServices methods device_alloc(), device_free(), and copy_to_device(). It's up to the renderer to provide those, so that the OSL library doesn't itself need to know about the Cuda runtime. These are trivial (there's really only one implementation that makes sense), and you can copy them from the ones in testshade and testrender.
Interactive parameters are NOT constant folded during runtime optimization.
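To show the shape of the three renderer-provided hooks (device_alloc(), device_free(), copy_to_device()), here is a sketch that runs without CUDA. The abstract base class and exact signatures are assumptions for illustration, not copied from OSL's RendererServices header. The stand-in uses host memory where a GPU renderer would use `cudaMalloc`, `cudaFree`, and `cudaMemcpy(..., cudaMemcpyHostToDevice)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Assumed interface shape for the three hooks (illustrative only).
class DeviceMemoryHooks {
public:
    virtual ~DeviceMemoryHooks() = default;
    virtual void* device_alloc(std::size_t size) = 0;
    virtual void device_free(void* ptr) = 0;
    virtual bool copy_to_device(void* dst, const void* src, std::size_t size) = 0;
};

// Host-memory stand-in. A CUDA renderer would instead call
// cudaMalloc(&ptr, size), cudaFree(ptr), and
// cudaMemcpy(dst, src, size, cudaMemcpyHostToDevice).
class HostMemoryHooks final : public DeviceMemoryHooks {
public:
    void* device_alloc(std::size_t size) override { return std::malloc(size); }
    void device_free(void* ptr) override { std::free(ptr); }
    bool copy_to_device(void* dst, const void* src, std::size_t size) override
    {
        if (!dst || !src)
            return false;
        std::memcpy(dst, src, size);
        return true;
    }
};
```

Keeping these behind virtual methods is what lets the OSL library allocate and update the per-group device arena without linking against the Cuda runtime itself.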
The shader entry points themselves now take an extra parameter in the main call -- this will be the pointer to the beginning of the shader group's interactive parameter arena.
When JITing, references to interactive parameters know to retrieve them from their designated offset into the interactive parameter area.
This means that the renderer-side OptiX/Cuda code is responsible for adding this extra pointer parameter to the call to the shader entry points. You can see how this is done in the testshade and testrender Cuda code.
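As an illustration only, the following hand-written function stands in for a JITed entry point. The parameter list, `ShaderGlobalsLite`, and `kModeOffset` are hypothetical. The important parts are the extra trailing pointer to the group's interactive arena, the load from a fixed offset instead of a folded constant, and the renderer-side caller supplying that pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, much-reduced shader globals (not OSL's ShaderGlobals).
struct ShaderGlobalsLite {
    float u, v;
};

// Offset that a (hypothetical) compiler assigned to the interactive "mode".
constexpr std::size_t kModeOffset = 4;

// Stand-in for a JITed entry point with the extra interactive-params pointer.
float shader_entry(ShaderGlobalsLite* sg, void* groupdata, void* interactive_params)
{
    (void)groupdata;  // unused in this sketch
    // "mode" is interactive, so it is loaded from the arena on every call.
    int mode = 0;
    std::memcpy(&mode, static_cast<const char*>(interactive_params) + kModeOffset,
                sizeof mode);
    return mode == 1 ? sg->u : sg->v;
}

// Renderer side: pass the arena pointer belonging to this material's group.
float shade_point(float u, float v, void* group_arena)
{
    ShaderGlobalsLite sg{u, v};
    return shader_entry(&sg, nullptr, group_arena);
}
```

Changing the bytes in the arena between calls changes the result without touching the compiled code, which is the point of the whole scheme.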
It's up to the renderer to figure out how to make the OptiX hit program aware of the interactive parameter pointer for that particular material, in order to pass it to the OSL shader entry point. The way I did it in testshade and testrender is to use a field in the struct that's given to each entry of the shader binding table, which can be retrieved on the OptiX side via optixGetSbtDataPointer(). In testshade/testrender, a data pointer already existed but wasn't used. In a real renderer, you may need to add a field or come up with some other way to get this pointer, which can be retrieved from the group via
shadingsys->getattribute(shadergroupptr, "device_interactive_params", TypeDesc::PTR, &myptr);
You can see how I do that in optixraytracer.cpp (testrender) and in optixgridrender.cpp (testshade).
A number of other things you will see are worth calling out:
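The following host-side sketch uses hypothetical types and no OptiX headers. It shows the bookkeeping: one hit-group data payload per material, each holding that material's device-side interactive-parameter pointer. In a real renderer, each pointer would come from the `getattribute(..., "device_interactive_params", ...)` query above. A real OptiX record also starts with an `OPTIX_SBT_RECORD_HEADER_SIZE`-byte header packed by `optixSbtRecordPackHeader()`, which is omitted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical SBT data payload; device code would read it through
// optixGetSbtDataPointer() and pass interactive_params to the OSL entry point.
struct HitGroupData {
    void* interactive_params;  // device pointer to this material's arena
};

// Hypothetical renderer-side material record.
struct Material {
    std::string name;
    void* device_interactive_params;  // as returned by the attribute query
};

// Build one payload per material, in SBT order.
std::vector<HitGroupData> build_hitgroup_data(const std::vector<Material>& mats)
{
    std::vector<HitGroupData> records;
    records.reserve(mats.size());
    for (const auto& m : mats)
        records.push_back(HitGroupData{m.device_interactive_params});
    return records;
}
```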
I added a device_ptr utility template that is just a wrapper around a device side pointer that makes it hard to accidentally dereference it on the host side.
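Here is a minimal sketch of the device_ptr idea. The accessor name `d_get()` and other details are illustrative and may differ from the actual template. The wrapper deliberately omits `operator*`, `operator->`, `operator[]`, and any implicit conversion to `T*`, so host code cannot dereference a device address by accident. Getting the raw address requires an explicitly named call.

```cpp
#include <cassert>

// Wrapper around a device-side address that cannot be dereferenced on the host.
template<typename T>
class device_ptr {
public:
    device_ptr() = default;
    explicit device_ptr(T* p) : m_ptr(p) {}

    // Explicit, greppable way to get the raw address, e.g. to hand it to a
    // device copy routine or store it in an SBT record.
    T* d_get() const { return m_ptr; }

    explicit operator bool() const { return m_ptr != nullptr; }

private:
    T* m_ptr = nullptr;
};
```

Writing `*p` or `p->x` on a `device_ptr<T>` simply fails to compile, which is the intended protection.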
Since I was changing RendererServices anyway, I also removed register_global, fetch_global, and global_map, which were unused. They were leftovers from the way we handled strings in OptiX 6.x.
Encapsulated CUDA global symbol name mangling into BackendLLVM::global_unique_symname(). I did this early on; it turned out not to be necessary, but I still like that encapsulation, so I'm keeping it.
I bumped the 3rd set of digits in the version to reflect that the changes in RendererServices break ABI. This is only in main, it obviously cannot be backported to a release branch.
All tests pass for scalar, batch, and OptiX.
I added a new simple reparam test and renamed the old reparam test to reparam-array. Oddly, the reparam-array test doesn't work properly on OptiX (it had never been tried there before), but it also fails on OptiX at main, so it's not related to this patch! I'm putting that on my list of other oddities to investigate later. It may just be a quirk of testshade; I'm not really sure yet.
Added to BackendLLVM (and batched) a llvm_ptr_type(TypeSpec) method that returns the LLVM type of a pointer to the specified type.