gkjohnson / three-gpu-pathtracer

Path tracing renderer and utilities for three.js built on top of three-mesh-bvh.
https://gkjohnson.github.io/three-gpu-pathtracer/example/bundle/index.html
MIT License

Add denoiser #85

Open gkjohnson opened 2 years ago

gkjohnson commented 2 years ago

https://github.com/BrutPitt/glslSmartDeNoise

http://blog.gregzaal.com/2015/11/24/new-pixel-filter-type-blackman-harris/

DennisSmolek commented 2 years ago

The other day I was trying to learn more about simulating clouds and I saw a promotion/release from Nvidia about their raytracing; it showed how they used denoising to cut the sampling by at least half, even for huge, complex scenes.

And for most workflows it was basically the last node in the pipeline for super smooth results.

I love your raytracer but I'm on a potato... I can get to about 1000 passes but it still looks grainy, so I had shelved being able to use it personally any time soon.

I tried to see if there is a standard set of algorithms like there is for GENERATING noise, and it seems like there are some old-school fundamentals, but there are also a lot of crazy innovative things coming out all the time. I tried limiting my search to things published within the last 3 years...

Of course I found a big zero for the WebGL space, but Nvidia and others are pushing each other.

This paper came up a few times in discussions: https://cs.dartmouth.edu/wjarosz/publications/mara17towards.pdf

But when my eyes hit the math on page two:

Screen Shot 2022-04-09 at 06 31 04

I figured I'd reach out first haha.

Right before posting I found: https://alain.xyz/blog/ray-tracing-denoising

Which I didn't want to dive into if you had a solution in mind or strong opinions or something...

After our tweets I'm going to go over that article, I pulled some others too.

Don't search "Denoise" on ShaderToy... it repeatedly crashed my Mac 100% of the time, not just the browser.

There are 20-30 denoising shaders there too that hopefully are implementations I can try.

gkjohnson commented 2 years ago

I love your raytracer but I'm on a potato.. I can get to about 1000 passes but it still looks grainy...

Ha some day I'd like this to work well even on a potato but it's still early. There's still a lot of work to do to improve the speed of convergence including multiple importance sampling, direct light sampling, and denoising. But I'd like help getting all of it going so I appreciate you taking a look into this a bit!

Which I didn't want to dive into if you had a solution in mind or strong opinions or something...

I don't have a particular solution in mind. I think ideally a simple solution would come first and not require things like roughness / normal / depth texture output to aid in the noise reduction but I'm not sure how feasible that would be. I'm unfamiliar with the techniques used for this. That alain blog post looks pretty good. No rush on any of this, of course! Just getting an idea for what's possible is an awesome start.

In terms of API I was imagining something like a DenoiseMaterial that would be run just before rendering to the canvas. Effectively replacing the use of MeshBasicMaterial with the final full screen quad pass here:

https://github.com/gkjohnson/three-gpu-pathtracer/blob/99fc07a0ed53d8db3f50f857a562dcebbfeee5d2/example/materialBall.js#L90-L92

const fsQuad = new FullScreenQuad( new DenoiseMaterial( {
  map: ptRenderer.target.texture,
  // denoise parameters
} ) );

// ...

fsQuad.render( renderer );

DennisSmolek commented 2 years ago

There are a few complex though well-written-out approaches/solutions as well, such as:

"Caffeine" by Rye Terrell Demo

where he uses [some kind] of multisample approach.

You may already be doing something similar; this is where I have to admit I haven't dived through your full flow yet.

Here's an older version of the Nvidia AI approach too

thomasaull commented 2 years ago

Just to throw it in: Maybe it's possible to use the Intel Open Image Denoiser via wasm? https://www.openimagedenoise.org/

DennisSmolek commented 2 years ago

Just to throw it in: Maybe it's possible to use the Intel Open Image Denoiser via wasm? https://www.openimagedenoise.org/

I'm not sure if jumping into WASM first is a good idea, but if one exists why not!

I've been meaning to break out the denoiser from the LGL Raytracer, which is pretty legit and fast: https://lgltracer.com/cornellBox/index.html

Paul recently showed off a demo and actually made connectors for it in React. Demo: https://codesandbox.io/s/lgl-raytracer-rnuve Library: https://github.com/pmndrs/react-three-lgl

Oddly though, the creator of LGL has decided to no longer update it on NPM: https://www.npmjs.com/package/lgl-tracer

But I think it wouldn't be too hard to pull the denoiser out and try it at least.

gkjohnson commented 2 years ago

I agree I'd like to skip using WASM for now but there's probably a lot to learn from the open image denoise project. The source for it is here:

https://github.com/OpenImageDenoise/oidn

And it looks like the GLSL-PathTracer project uses it, as well.

I've been meaning to break out the denoiser from the LGL Raytracer

As far as I understand that project is closed source, right? Probably best to not try to take code from a project that doesn't have an open license. But perhaps there's insight into which techniques were used.

gkjohnson commented 2 years ago

Here's a few documents on SVGF which might be promising for this use case:

https://teamwisp.github.io/research/svfg.html

https://cg.ivd.kit.edu/publications/2017/svgf/svgf_preprint.pdf

DennisSmolek commented 2 years ago

The "current" leading methods, according to resources like that denoising blog, have you doing crazy stuff with machine learning etc.

There's lots we can do before that. My current focus is this paper: https://webpages.tuni.fi/foi/papers/Koskela-TOG-2019-Blockwise_Multi_Order_Feature_Regression_for_Real_Time_Path_Tracing_Reconstruction.pdf which calls its method Blockwise Multi-Order Feature Regression (BMFR) and claims a 1.8x performance gain over the previous generation, notably SVGF.

SVGF is a common method and the one used by LGL.

SVGF paper: https://cg.ivd.kit.edu/publications/2017/svgf/svgf_preprint.pdf

Basically we mix lighting and raster by splitting things.

This was a WIP for the existing three.js ray-tracing-renderer:

https://github.com/hoverinc/ray-tracing-renderer/issues/2

And it actually had 1 or 2 branches in progress before the project was stopped:

https://github.com/hoverinc/ray-tracing-renderer/tree/demodulate-albedo

Here's LGL's API using temporal and spatial options

DennisSmolek commented 2 years ago

CUDA implementation of SVGF with good notes:

https://github.com/ZheyuanXie/CUDA-Path-Tracer-Denoising

fraguada commented 2 years ago

Hello! All I can add is my query in the open image denoiser repo back in January about a wasm compilation target: https://github.com/OpenImageDenoise/oidn/issues/129

While I work on a project where we compile a C library to wasm, I'm not sure I'd be able to handle the OIDN source!

Great that there are other alternatives to try!

repalash commented 2 years ago

How about starting with a simple bilateral shader?
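
Something along these lines as a starting point. This is only a sketch: the 9x9 kernel and the sigmaSpatial / sigmaColor values are arbitrary, and it assumes the path traced result is bound as `map` (it could be dropped into the FullScreenQuad pass sketched above in place of the hypothetical DenoiseMaterial):

```js
import * as THREE from 'three';

// Edge-preserving bilateral blur: weights fall off with both screen-space
// distance and color difference, so smooth regions get averaged while edges
// are mostly left alone.
const bilateralMaterial = new THREE.ShaderMaterial( {
	uniforms: {
		map: { value: null },                       // noisy path traced texture
		resolution: { value: new THREE.Vector2() }, // render target size in pixels
		sigmaSpatial: { value: 3.0 },
		sigmaColor: { value: 0.1 },
	},
	vertexShader: /* glsl */`
		varying vec2 vUv;
		void main() {
			vUv = uv;
			gl_Position = projectionMatrix * modelViewMatrix * vec4( position, 1.0 );
		}
	`,
	fragmentShader: /* glsl */`
		uniform sampler2D map;
		uniform vec2 resolution;
		uniform float sigmaSpatial;
		uniform float sigmaColor;
		varying vec2 vUv;
		void main() {
			vec3 center = texture2D( map, vUv ).rgb;
			vec3 sum = vec3( 0.0 );
			float totalWeight = 0.0;
			for ( int x = -4; x <= 4; x ++ ) {
				for ( int y = -4; y <= 4; y ++ ) {
					vec2 offset = vec2( float( x ), float( y ) );
					vec3 tap = texture2D( map, vUv + offset / resolution ).rgb;
					float spatial = exp( - dot( offset, offset ) / ( 2.0 * sigmaSpatial * sigmaSpatial ) );
					vec3 diff = tap - center;
					float range = exp( - dot( diff, diff ) / ( 2.0 * sigmaColor * sigmaColor ) );
					float w = spatial * range;
					sum += tap * w;
					totalWeight += w;
				}
			}
			gl_FragColor = vec4( sum / totalWeight, 1.0 );
		}
	`,
} );
```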

bhouston commented 2 years ago

I can offer a bounty of $500 USD for a merged PR that implements something akin to the denoiser used by Blender, which is the Intel Open Image Denoiser.

richardassar commented 2 years ago

This is certainly doable, and I do have deep learning experience, having worked for a games-industry middleware company that applied deep learning to real-time speech animation synthesis from raw audio; there was often a lot of raw network surgery required there. I've also been actively researching and engineering in that space for many years, much more so recently, so I could definitely lend some help here.

The model is just a UNet, a fairly simple architecture that shouldn't present too many problems to port over to WebGL shaders. I think this issue should cover only that, leaving the support for generating our own training data as a separate issue.

An approach like this would work well, generating shader code parameterised by the layer configurations, possibly with some kernel fusion as things like ReLU can be fused into a previous convolution, for example.

OIDN does provide pretrained weights but these have likely been trained on a dataset that isn't representative of our output, also this misses the opportunity to improve the results by providing auxiliary input channels.

Sampling a distribution of random scenes with various object placements, camera poses, material parameters etc. shouldn't be too tricky, and those can be POSTed off to a local endpoint which saves them out to disk for use in the training script. Alternatively the shaders in this project could be pulled into a native path tracer that invokes the main shader function from a compute shader, which is likely to fare better in terms of GPU occupancy and also isn't locked to VSync. This would allow us to scale to larger datasets and again is likely best split out into a separate issue.

If we want to provide extra input channels to the network we'll have to output them with multiple render targets. This could present an opportunity to experiment with the precise selection of what values to use in addition to things like first bounce, surface normals, etc., such as a per-pixel vector of material parameters and intermediate values from the whole path tracing computation that could be separately integrated/accumulated.

I propose that we break it down as follows:

Those are my thoughts, I'm happy to start working on Phase 1.

bhouston commented 2 years ago

Interesting. If we could start with pre-trained data, I think that is enough. I would much prefer to avoid having to train up our own data here, as we are not unique in terms of needs. Given that the Intel Open Image Denoiser is open source, couldn't we just use their weights?

benzsuankularb commented 2 years ago

RESTIR is another great option for this. https://research.nvidia.com/publication/2021-06_restir-gi-path-resampling-real-time-path-tracing

richardassar commented 2 years ago

Interesting. If we could start with pre-trained data, I think that is enough. I would much prefer to avoid having to train up our own data here, as we are not unique in terms of needs. Given that the Intel Open Image Denoiser is open source, couldn't we just use their weights?

That's what I proposed in Phase 1, using their pre-trained weights, but unless their training data is similar to ours it will produce unpredictable results. They have a bunch of pre-trained models available, for LDR and HDR input images, with various options for additional channels, for example rt_hdr_alb_nrm, which presumably takes an un-tonemapped "beauty" image, albedo, and normals; we can provide these easily.

As @benzsuankularb suggested, something like ReSTIR could be interesting, though it would need to fit into the path tracing code. It would be interesting to compare the efficiency of neural and standard denoising approaches.

Without knowing the inference overhead it's hard to say, but OIDN seems to work with a very small number of samples so it may be very competitive with ReSTIR, especially since it is just a pass over the output.

The least complicated approaches are obviously things like https://github.com/BrutPitt/glslSmartDeNoise and the ShaderToy denoising examples linked here.

Since you placed the bounty specifically for OIDN, I'll keep looking into it unless there's consensus that another approach is preferred.

fraguada commented 2 years ago

FWIW, I use the Intel OIDN implementation in Cycles for Blender and Rhino, and I don't perceive the pre-trained models that ship with it as lacking. That being said, I am not rendering out very complex scenes. My colleagues do render out more complex stuff and I would say the same. We've never thought to ourselves, "if only we had better pre-trained models."

richardassar commented 2 years ago

That's good to know, but as the saying goes "garbage in, garbage out". When a model is presented with inputs not within the training distribution you'll get pathological results. Assuming we are doing everything correctly, using the same functions, approximations and so on, then it should be fine.

richardassar commented 2 years ago

When we talk about input distribution, this can also impact unexpected things like camera projection: since OIDN is likely not trained on equirectangular projections, this would be an example where the inputs lie on a manifold far from the one sampled to produce the training data.

Since the network is learning a set of filter banks in the downsampling and upsampling convolution layers these end up specialized to the data. For example, in raw audio networks you see gammatone filterbanks emerge for speech and very different filterbanks emerge when trained on non-speech. This is my concern, and things may still work fine, but as you push things you'll start to see unacceptable artifacts.

On the topic of neural denoising, there are things like Diffusion Models and Inverse Autoregressive Flows (used in parallel Wavenet) that are also worth trying. One justification for putting infrastructure in place for importing models of various architectures and a synthetic data collection pipeline is that we can conduct research and exploration, important if we choose to send smaller models to mobile devices.

richardassar commented 2 years ago

This trick might speed things up nicely, as it takes convolutions from time complexity O(n^2) to O(n*log(n)).
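
If the trick in question is FFT-based convolution (which the O(n log n) figure suggests), the relevant identity is the convolution theorem:

$$
f * g = \mathcal{F}^{-1}\big( \mathcal{F}(f) \cdot \mathcal{F}(g) \big)
$$

i.e. two transforms and a pointwise product, each O(n log n), instead of the direct O(n^2) sum.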

bhouston commented 2 years ago

Very interesting. If you want to add https://github.com/BrutPitt/glslSmartDeNoise @richardassar, which seems incredibly straightforward and with a high ROI, I can offer a bounty of $250 USD for that. It seems like a good first step. :)

richardassar commented 2 years ago

Alright, that sounds good, plus it will add the base functionality for these fullscreen denoising passes as @gkjohnson laid out here.

I'll do glslSmartDeNoise first and then start on getting OIDN integrated; OIDN should only take a couple of days.

So, just to confirm, glslSmartDeNoise for a $250 bounty and OIDN (Phase 1) for $500.

richardassar commented 2 years ago

Investigating the feasibility of @benzsuankularb's proposal of implementing ReSTIR.

See https://github.com/lukedan/ReSTIR-Vulkan/blob/master/src/shaders/spatialReuse.comp

The lack of SSBOs (outside of WebGL2-compute) would mean replacing reservoirs with textures / a texture array and resultReservoirs with enough framebuffers to fit all of the Reservoir struct defined in https://github.com/lukedan/ReSTIR-Vulkan/blob/master/src/shaders/include/structs/restirStructs.glsl#L31

The number of framebuffers required is RESERVOIR_SIZE * 4 + 1 with UNBIASED_MIS enabled and RESERVOIR_SIZE * 3 + 1 without, which isn't terrible. unbiasedReuse.glsl also makes use of SSBOs but could be implemented with the same method.
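
As a rough sketch of the texture-based replacement (biased variant, so RESERVOIR_SIZE * 3 + 1 targets), something like this could stand in for the SSBO; the names and sizes are placeholders:

```js
import * as THREE from 'three';

const width = 1024, height = 1024;  // match the path tracer target size
const RESERVOIR_SIZE = 4;           // placeholder
const targetCount = RESERVOIR_SIZE * 3 + 1;

// One float RGBA target per reservoir "field page". Note that WebGL2 caps
// MRT color attachments (typically at 8), so writing all of these out may
// need to be split across more than one pass.
const reservoirTargets = [];
for ( let i = 0; i < targetCount; i ++ ) {
	reservoirTargets.push( new THREE.WebGLRenderTarget( width, height, {
		type: THREE.FloatType,
		minFilter: THREE.NearestFilter,
		magFilter: THREE.NearestFilter,
		depthBuffer: false,
	} ) );
}

// The spatial reuse pass would then sample last frame's targets as uniforms
// and write updated reservoirs into a second set, ping-pong style.
```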

This leaves the remaining challenge of figuring out how to integrate this with the existing three-gpu-pathtracer code.

richardassar commented 2 years ago

Submitted a PR for this, https://github.com/gkjohnson/three-gpu-pathtracer/pull/194.

The results seem reasonable but it requires a bit of parameter tuning to balance the tradeoff between denoising and excessive blur.

Screenshot 2022-06-25 at 06-44-54 Material Orb Path Tracing

Give it a try.

richardassar commented 2 years ago

I saw improved results with the following settings, about double the sigma value and about one third of the threshold.

sigma = 5.32
threshold = 0.03
kSigma = 1.18

(updated the PR with these)

Screenshot 2022-06-25 at 07-32-55 Material Orb Path Tracing

I also think that radius in the denoiser could be scaled by resolutionScale, either directly or by modifying the effective sigma and kSigma by sqrt( resolutionScale ), just to normalize things a bit.
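
Concretely, something along these lines, however the material ends up exposing these parameters:

```js
// Normalize the filter footprint against the render resolution so the same
// settings behave similarly at resolutionScale 0.5 and 1.0.
const scale = Math.sqrt( resolutionScale );
denoiseMaterial.sigma = 5.32 * scale;
denoiseMaterial.kSigma = 1.18 * scale;
denoiseMaterial.threshold = 0.03;
```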

DennisSmolek commented 2 years ago

SSBOs but could be implemented with the same method.

I haven’t dived into your work (great job btw!!)

Do you foresee a downside to using the "reservoir" method over SSBOs?

From my casual glance it looks like a DataTexture-type setup, which I've seen many modern/great libraries use even when attributes/buffers are available,

although it does feel like a hack.

richardassar commented 2 years ago

It's either a case of splitting things out over multiple render targets or one large one using a stride to split things into pages. I still need to take a look at what would be involved to integrate it into the rest of the path tracer. Next I'll try the OIDN (neural) denoiser and, if people are interested, then attempt ReSTIR at some point provided it looks feasible.
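
For the single-large-texture variant, the paging is just index math, e.g. stacking pages vertically (names and sizes here are illustrative):

```js
import * as THREE from 'three';

const width = 1024, height = 1024;  // per-page size, matching the path tracer target
const PAGE_COUNT = 13;              // e.g. RESERVOIR_SIZE * 3 + 1

// One tall target holding every page, page i occupying rows [ i * height, ( i + 1 ) * height ).
const packedReservoirs = new THREE.WebGLRenderTarget( width, height * PAGE_COUNT, {
	type: THREE.FloatType,
	depthBuffer: false,
} );

// In the shader, reading field `page` for the current pixel becomes:
//   vec2 uvPage = vec2( vUv.x, ( vUv.y + float( page ) ) / float( PAGE_COUNT ) );
//   vec4 value = texture2D( packedReservoirs, uvPage );
```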

bhouston commented 2 years ago

I love this! We only need something to start. Let's get this first one merged; yes it is finicky, but at least we have something. We can then iterate and do another one. But we do not want to let the quest for perfection prevent incremental forward motion. :) I also think getting this first technique merged lets others try their hand at improving it as well.

gkjohnson commented 2 years ago

Some more new denoiser updates and implementations:

DennisSmolek commented 5 months ago

@gkjohnson What would be the best format to output from the denoiser?

I'm working on an OIDN-to-TensorFlow.js setup right now and trying to figure out the best way to take the image data in/out.

TensorFlow will auto-convert all sorts of HTML image types, but the data off the path tracer is probably already a plain buffer.

I'm wondering too if I should return a NEW buffer or edit the buffer in place.

thomasaull commented 5 months ago

@DennisSmolek Just want to provide a pointer to https://x.com/pissang1, who has implemented OIDN in (I think) three.js as well and mentioned somewhere he might open source it.

DennisSmolek commented 5 months ago

@DennisSmolek Just want to provide a pointer to https://x.com/pissang1, who has implemented OIDN in (I think) three.js as well and mentioned somewhere he might open source it.

Thanks! We actually talked about it.

He confirmed some of my thoughts on execution but he doesn't seem ready to release just yet.

I also discussed with the OIDN Rust wrapper dev whether to use TensorFlow.js vs the CPU wrapper.

With the WebGPU backend being so fast and the ability to keep the buffer on the GPU, I'm inclined to agree.

thomasaull commented 5 months ago

@DennisSmolek I see, way ahead of me :) Would definitely be a game changing feature for three-gpu-pathtracer :)

gkjohnson commented 5 months ago

@DennisSmolek Really excited to hear you're looking into this. I was hoping someone might be interested in it since I saw Max Liani's work.

I'm less familiar with Tensorflow and how it works so feel free to correct any of my assumptions.

I'm working on an OIDN to Tensorflow.js setup right now and trying to figure out the best way to take in/out the image data.

I'm not sure what other restrictions there might be but I see that Tensorflow supports both a WebGL and WebGPU backend. Is the TF interface backend agnostic? Ie can you easily run the same model with either backend without too many code changes? Without knowing too much in depth I'm imagining passing the pathtraced WebGL frame buffer to a function and model outputs the denoised result into a different WebGL framebuffer.

If it requires the WebGPU backend, though, then we'll have to read the WebGL framebuffer to the CPU and upload it to the WebGPU context and display it afterward. Some day this project will move to WebGPU, too.

I'm wondering too if I should return a NEW buffer or edit the buffer in place.

Let's allow for writing into a separate buffer since I think a common case would be progressively updating the "noisy" path traced buffer every frame and then denoising into a separate buffer for display. But I think whatever works for getting a first implementation going is best and then we can adjust afterward.
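
Roughly the flow I'm picturing, with denoiser.execute as a stand-in for whatever API the denoiser ends up exposing:

```js
// Accumulate noisy samples into the path tracer's own target every frame...
ptRenderer.update();

// ...then denoise into a separate texture so the noisy accumulation buffer
// is never modified in place (denoiser.execute is hypothetical).
const denoisedTexture = await denoiser.execute( ptRenderer.target.texture );

// Draw the denoised result to the canvas with a full screen quad.
fsQuad.material.map = denoisedTexture;
fsQuad.render( renderer );
```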

Let me know if there's anything else I can do to help with this! Excited to see any updates, as well

DennisSmolek commented 5 months ago

I'm not sure what other restrictions there might be but I see that Tensorflow supports both a WebGL and WebGPU backend. Is the TF interface backend agnostic? Ie can you easily run the same model with either backend without too many code changes? Without knowing too much in depth I'm imagining passing the pathtraced WebGL frame buffer to a function and model outputs the denoised result into a different WebGL framebuffer.

If it requires the WebGPU backend, though, then we'll have to read the WebGL framebuffer to the CPU and upload it to the WebGPU context and display it afterward. Some day this project will move to WebGPU, too.

In general, TensorFlow is very backend agnostic, allowing you to fall back based on system support or usage. On the JavaScript side the code for all backends is identical, so no issues switching around there. Handling the output is where it matters.

In this section of the docs they actually show how to use both the WebGL and WebGPU backends and pass the buffers to however you want to process it.

There are examples as well

The WebGPU backend is supposed to be even faster than the WebGL one; however, since the path tracer is using WebGL, it may be worth locking into the WebGL backend if it can keep the resulting buffer on the GPU. (I'm not sure that's possible; I'm a little weak on compute with WebGL2.)

The other thing is it shows how to keep the data OUTPUT on the GPU but not the data INPUT. So I think regardless of backend we will have to take the framebuffer and input it into the denoiser on the CPU side.

Then the other optimization functions like upscaling can run (I noticed the "render size" option, so I'm assuming that's what is happening).

I think with the current WebGL backend the ideal flow would be:

  1. Render the noisy framebuffer (smallest resolution setting) and return it to the CPU
  2. Send the framebuffer data to the denoiser using the WebGL backend
  3. The denoiser runs and passes back the GPUData reference
  4. The next steps in the render chain take place (upscale, tonemap, whatever)

I'm not sure what the limits are exactly with WebGL; however, I know that with WebGPU the buffer can persist on the GPU between draw calls and compute executions, so in the future there will be even fewer CPU round trips. I've already asked about the possibility of executing the TensorFlow initialization on the compute side too, but it's just a feature request at the moment.
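
For reference, the GPU-output path I'm describing looks roughly like this with the WebGL backend (model and noisyPixels are placeholders, and the exact shapes are from memory rather than tested code):

```js
import * as tf from '@tensorflow/tfjs';

await tf.setBackend( 'webgl' );
await tf.ready();

// The input still comes from a CPU-side read of the path traced frame...
const input = tf.tensor( noisyPixels, [ height, width, 4 ] );
const output = model.predict( input.expandDims( 0 ) ).squeeze();

// ...but the result can stay on the GPU: dataToGPU() hands back the WebGL
// texture backing the tensor instead of downloading the values.
const gpuData = output.dataToGPU();
// gpuData.texture is a WebGLTexture on tensorflow's own context;
// call gpuData.tensorRef.dispose() when finished with it.
```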


For now, I have a basic autoencoder denoiser running to help me set up the TensorFlow core. I'm working on loading/converting the OIDN weights and setting up their much more complex UNet model now. I hope to be done by the end of the week with a basic version. I'll probably use the path tracer as my real-world test! I'll be sure to update with my results.

gkjohnson commented 5 months ago
  • Render the noisy framebuffer (smallest resolution setting) and return it to the CPU
  • Send the framebuffer data to the denoiser using the WebGL backend
  • The denoiser runs and passes back the GPUData reference
  • The next steps in the render chain take place (upscale, tonemap, whatever)

Thanks for the detailed response! I can't dig in too deeply right now so I'll have to rely on your expertise on this but this seems good to me. Is there a specific "upscale" step you're referring to from OIDN, though? Or is this just an example?

it may be worth locking into the WebGL backend if it can keep the resulting buffer on the GPU.

If it's not significantly different then this seems best to me - then we can just draw the resulting framebuffer to the canvas.

DennisSmolek commented 5 months ago

Is there a specific "upscale" step you're referring to from OIDN, though? Or is this just an example?

I saw some of the pathtracer demos have a "render" parameter that is usually ~0.5. It doesn't change the scale of the canvas but ups the load if you set it to 1. So I assumed this was some sort of upscaling/supersampling you were already doing.

If you're doing some upsampling you'd want the denoise pass before that, which would mean a CPU round trip.

gkjohnson commented 4 months ago

I saw some of the pathtracer demos have a "render" parameter that is usually ~0.5. It doesn't change the scale of the canvas but ups the load if you set it to 1.

Oh right - yeah this is just a basic linear interpolation upscale when drawing to the canvas. There's nothing fancy happening there.

DennisSmolek commented 4 months ago

So, something small to add: reading the TensorFlow docs I found the section on creating a tensor from GPU data. According to this, we could indeed keep the data completely on the GPU as inputs.

We'd still need to return to the CPU to initialize the call, but the data itself can stay on the GPU as a gl.texture. In WebGPU it's much nicer and cleaner as it would be a plain buffer.

The only problem with this method at the moment is that, to get the best results from OIDN, you would pass the standard color input (the noisy image) and also pass albedo and normal images.

Right now I handle this using TensorFlow to concat the data as part of the model execution. I'm not sure if this operation is CPU bound or not...

If it were a single input buffer (just the color pass) I wouldn't have to concat it, but it's said the extra data greatly improves the quality. Something I'll have to test.
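
The concat itself is tiny, something like this for the 9-channel (color + albedo + normal) input, assuming each arrives as an H x W x 3 tensor:

```js
// Stack color, albedo, and normal along the channel axis to form the
// 9-channel input the albedo/normal OIDN variants expect.
const input = tf.concat( [ colorTensor, albedoTensor, normalTensor ], 2 );
const denoised = model.predict( input.expandDims( 0 ) ).squeeze();
```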

Status-wise I'm going slowly, as it's a huge pain to convert both the Python and C++ logic to TypeScript. I have the models loading and have the weights, but I haven't gotten a running test yet.

Max Liani's work is actually using a 1.X branch and is out of date, plus he never actually says how he got the model working (if he even did?)

I'm sticking with plain image data for the moment; once I have that working I'll clean up the library and then start messing with the GPU buffers.

gkjohnson commented 4 months ago

If it were a single input buffer (just the color pass) I don't have to concat it, but it's said the extra data greatly improves the quality. Something I'll have to test.

I'd be curious what Max Liani and Yi Shen used for their output. It's a very different approach, but the SVGF denoiser required these extra buffers and seemed like it wouldn't behave well with transparent objects. OIDN is probably better in this case. If needed we can look into writing these buffers out, as well, using MRT. But as you mention, let's start with just the noisy path traced buffer.

gkjohnson commented 4 months ago

Wanted to add this here so I don't forget it - but it sounds like using white noise during path tracing will provide a nicer result with OIDN than the blue-noise stratified samples used currently:

https://x.com/pissang1/status/1803019554177843354

image

DennisSmolek commented 4 months ago

Wanted to add this here so I don't forget it - but it sounds like using white noise during path tracing will provide a nicer result with OIDN than the blue-noise stratified samples used currently:

https://x.com/pissang1/status/1803019554177843354

Is that something easy to change/adjust?

Those results look great! We need to make sure people understand that for that level of clarity you likely need the albedo and normals. The docs push this as giving the best results, and Yi Shen confirmed that all of his results use those layers as well. A slight amount of MRT work shouldn't be too bad.

I haven't tested it yet but my setup is already designed to take those layers automatically.

Oh and in further news, I got OIDN working! https://twitter.com/DennisSmolek/status/1807814362851401828

Screenshot 2024-07-02 020325

I haven't shared this on Twitter yet, but talking with the TensorFlow folks I learned that the first pass through the WebGL backend should be discarded, as it takes that pass to set up the shaders/backend.

For a 720 x 1280 image, predict-and-display time went from 2,000ms-15,000ms down to 4ms-7ms with WebGL.

The reason WebGPU was initially faster was that it doesn't have as long of a shader/setup phase.

They actually said WebGPU so far is usually the same speed or slower in many cases and faster in only a few. Definitely something to test.

Right now the library is taking in any ImageData, Array, or HTMLImageObject as input and can output in the same ways. Today I'm working on the buffer stuff which I don't anticipate a lot of problems with.

I'm going to use the lego example as I actually have a few of the sets so it'll be interesting.

As soon as I have the buffer working I'll publish the repo for feedback.

gkjohnson commented 4 months ago

Is that something easy to change/adjust?

Yeah, it's fairly easy to change. Setting the PathTracingMaterial "RANDOM_TYPE" value to 0 makes it use "PCG" randomness, which is a white noise.
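
i.e. something along these lines, though the exact define plumbing may differ between versions:

```js
// Switch the sampler from the stratified blue-noise sampling to PCG white
// noise, which reportedly works better as OIDN input.
ptRenderer.material.defines.RANDOM_TYPE = 0;
ptRenderer.material.needsUpdate = true;
```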

you likely need the Albedo and Normals ... Slight MRT work shouldn't be too bad.

Yeah it looks like OIDN handles transparency well. How normals should be handled is described in more detail in the OIDN docs. Sounds like the normal values are just blended like anything else.

I got OIDN working! ... For a 720 X 1280 image I went from 2,000ms - 15,000ms to predict and display to 4ms-7ms with WebGL.

Amazing work! And incredibly fast progress. Regarding the initial 2000-15000ms compilation time - I suspect you're on Windows since the path tracer shader takes a long time to compile, as well. I'm wondering if it's possible to use async compilation to help alleviate the pain of the startup time, as I added for the path tracer material in #650.

DennisSmolek commented 4 months ago

So something to note...

I have my denoiser library able to take in an external WebGL context to use for the tensorflow.js WebGL backend, so the texture output can be shared with other applications (like three.js).

The problem is, even connecting to a running three.js context produces conflicts between the two applications, I think because of location binding. This part is getting into the GPU weeds a bit for me, but I think for at least 1.0 of the denoiser, texture syncing isn't going to work until the conflict can be resolved.

image

I spent a day setting up gl in/out and I'm not sure how to fix this.

Also, three.js REALLY doesn't want you to pass your own WebGLTexture into texture objects, but I wasn't able to finish testing this as the two systems don't like sharing the same context at all.

This also means direct GPU input isn't going to work either, as that also requires GPU context sharing.

I'm more hopeful for WebGPU in this regard, but as the path tracer is WebGL, it doesn't help much.

For now, I'm going to back-burner GPU input/output and just do a CPU sync.

The path tracer renders to a framebuffer, which is passed to the denoiser; the denoiser outputs an ImageData buffer that I push into a texture and back onto the three.js backend.
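
In three.js terms the round trip looks roughly like this (denoiser.execute is a placeholder for my library's call):

```js
// 1. Read the noisy path traced frame back to the CPU.
const { width, height } = ptRenderer.target;
const pixels = new Float32Array( width * height * 4 );
renderer.readRenderTargetPixels( ptRenderer.target, 0, 0, width, height, pixels );

// 2. Denoise on the tensorflow.js side (placeholder call).
const denoisedPixels = await denoiser.execute( pixels, width, height );

// 3. Upload the result as a texture and hand it back to three.js.
const denoisedTex = new THREE.DataTexture( denoisedPixels, width, height, THREE.RGBAFormat, THREE.FloatType );
denoisedTex.needsUpdate = true;
fsQuad.material.map = denoisedTex;
fsQuad.render( renderer );
```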

Some reading says I might be able to speed this up with an OffscreenCanvas...

repalash commented 4 months ago

Sorry to jump in the middle here, but isn't it possible to render to a canvas and use it in three.js as a CanvasTexture?

DennisSmolek commented 4 months ago

Sorry to jump in the middle here, but isn't it possible to render to a canvas and use it in three.js as a CanvasTexture?

Yeah, that's totally possible, and might be what I do if I use the OffscreenCanvas.

The thing is, to render onto the canvas it has to take the data off the GPU and convert it on the CPU to a buffer. Then we draw it onto the canvas. Then three.js takes that canvas, makes it into ANOTHER buffer, and sends it back to the GPU in a material, which then of course comes back off the GPU for the final canvas render.

TensorFlow outputs a texture, meaning (if there were no conflicts) the data would never leave the GPU, skipping all the CPU buffers and going straight to the last step of rendering.

It's the same with the input. TensorFlow accepts a framebuffer/texture as input, so you could draw to the framebuffer and that data would stay on the GPU, get processed, and never leave. This saves a lot of time/bandwidth.

It's not a dealbreaker or the end of the world, but it sucks knowing that the capability is there, just not usable.

gkjohnson commented 4 months ago

cc @mrdoob

tl;dr: Dennis is working on integrating an ML denoiser using TensorFlow and we'd like to use the same WebGL context as three.js' WebGLRenderer to avoid copying data between contexts. However, three.js seems to break when sharing the context due to state inconsistency. It also doesn't seem possible to get or provide a framebuffer handle to/from three.js so it can be shared with TensorFlow. Are there any recommendations here? (see https://github.com/gkjohnson/three-gpu-pathtracer/issues/85#issuecomment-2205467554)

@DennisSmolek correct me if I've misunderstood anything. Calling WebGLRenderer.resetState before and after doing anything with TensorFlow may help but if TF requires that no state changes happen between multiple frames (ie running the model and reading) then it may still be an issue.
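
i.e. something like this on the three.js side, if resetting cached state turns out to be enough (denoiser.execute is a placeholder for the TF call):

```js
// Let tensorflow.js mutate the shared GL state, then tell three.js to drop
// its cached state so the next render starts from a known baseline.
renderer.resetState();
const denoisedTexture = await denoiser.execute( ptRenderer.target.texture );
renderer.resetState();
```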

For now, I'm going to back burner GPU Input/Output and just do a CPU Sync.

Seems like a good plan for now. I'm not as familiar with WebGPURenderer yet, so it's not clear to me whether these things will still be an issue with WebGPU, unfortunately.

mrdoob commented 4 months ago

@DennisSmolek correct me if I've misunderstood anything. Calling WebGLRenderer.resetState before and after doing anything with TensorFlow may help but if TF requires that no state changes happen between multiple frames (ie running the model and reading) then it may still be an issue.

I was going to say exactly this 👀

Twinklebear commented 4 months ago

I saw that Yi Shen just open sourced their web port of OIDN, which might provide some inspiration (or maybe it can just be integrated) @DennisSmolek: https://x.com/pissang1/status/1809119875258011671

DennisSmolek commented 4 months ago

I saw that Yi Shen just open sourced their web port of OIDN, which might provide some inspiration (or maybe it can just be integrated) @DennisSmolek: https://x.com/pissang1/status/1809119875258011671

Yeah... I'm a bit confused by that, as I asked about it and didn't think he was going to open source it. Glad it's out there though! I just wish I'd known that was his plan a few weeks ago when we first chatted.

Unfortunately his library doesn't support WebGL GPU handling, which is the current blocker.

Because we both pull from OIDN our UNets are the same, but much of the rest of our code is different.

I wanted to wait until I had WebGL working to release, but I don't want people to lose interest, so I'll be releasing my current state along with my WebGL testing app so hopefully other people can help.

I'll work on docs and an example and get it up tomorrow or Sunday.