Corrupt framebuffer? - GitHub Issues

paulmelis commented 4 years ago

I'm hitting an issue where the framebuffer as provided by ospray apparently gets corrupted, see this image:

[image: 0009]

It looks like some of the tiles used during rendering are not updated and/or written to the framebuffer. I haven't been able to reproduce it with a simple test case, but it happens in my render server when renders are canceled and restarted at a fairly high frequency, something like a couple of times per second, due to user viewpoint interaction.

On a high level the server works like this:

1. The client sends framebuffer resolution, number of samples per pixel, scene content and a camera, then starts the rendering.
2. If no interaction happens the requested number of samples per pixel is computed for the current view using individual calls to ospRenderFrame (each with spp = 1), with the same framebuffer read back after each sample and sent over a socket.
3. When the viewpoint is changed the current render is canceled with ospCancel() and the future released. The camera is updated and rendering is restarted with a new call to ospRenderFrame.
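The workflow above can be sketched as a small state machine. This is a toy model, not OSPRay code: `RenderPass` and `Server` are hypothetical names standing in for an `ospRenderFrame` call with its future and for the render server. One assumption goes beyond the description: the sketch waits for a canceled pass to stop before it changes the camera.

```cpp
#include <cassert>

// Hypothetical stand-in for one asynchronous 1-spp render pass.
struct RenderPass {
    bool running = false;
    bool canceled = false;

    void start() { running = true; canceled = false; }
    void cancel() { if (running) canceled = true; }
    // Models blocking until the pass has stopped.
    // Returns true only if the pass ran to completion.
    bool wait() { running = false; return !canceled; }
};

// Hypothetical model of the render server's per-view bookkeeping.
struct Server {
    int targetSpp;          // samples per pixel requested by the client
    int cameraVersion = 0;  // bumped on every camera update
    int samplesDone = 0;    // completed passes for the current camera
    int framesSent = 0;     // framebuffers forwarded to the client
    RenderPass pass;

    explicit Server(int spp) : targetSpp(spp) { assert(spp > 0); }

    // Start the next 1-spp pass if the current view still needs samples.
    bool startPassIfNeeded() {
        if (pass.running || samplesDone >= targetSpp) return false;
        pass.start();
        return true;
    }

    // No interaction happened: the pass completes and its result is sent.
    void onPassFinished() {
        if (!pass.running) return;
        if (pass.wait()) { ++samplesDone; ++framesSent; }
    }

    // A new camera arrived: cancel, wait, then restart accumulation.
    void onCameraChanged() {
        pass.cancel();
        pass.wait();        // assumption: never edit the camera mid-pass
        ++cameraVersion;
        samplesDone = 0;    // old samples belong to the old view
    }
};
```

The point of the model is that a canceled pass never increments `framesSent`, so only completed passes would reach the client.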

The initial sequence of frames until the first moment of interaction is correct, I can see the variance go down. But when I do some interaction and the render server receives new camera information things go pear-shaped.

Some more information:

- I'm allocating the framebuffer once and reuse it in calls to ospRenderFrame.
- The server is single-threaded; I use the new async rendering support in 2.0.
- Another symptom when this happens is that the variance returned by ospGetVariance is always inf.
- The framebuffer can have "exotic" resolutions as it is based on the user-chosen view size. As such, it might not have an even number of pixels in width or height. The framebuffer is 32-bit RGBA float.

I will have more time to dive into this in a few days, but wanted to ask if the above rings any bells as to why this might go wrong. Are there implicit limits to the use of cancel?

jeffamstutz commented 4 years ago

For local rendering ospCancel() will just stop rendering where it was, which could be a half-rendered new image. Furthermore, if you commit any object participating in a running render before the OSPFuture has completed, the result is undefined.

What we do in the tutorial examples is batch up all object handles that need to be committed, then commit them all once in between the currently finished frame and starting the next one. You can see that happen here (block of code between frames) and here (the loop that commits the handles).
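The batching idea can be sketched without OSPRay as follows. `FakeObject` and `DeferredCommitter` are hypothetical names, and the commit is reduced to a counter; in a real application the flush would call the library's commit function on each queued handle. Treat this as an illustration of the pattern, not as the tutorial's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an object handle; it only counts commits.
struct FakeObject {
    std::string name;
    int commits = 0;
};

class DeferredCommitter {
public:
    // Record that obj has pending parameter changes. Safe at any time,
    // because nothing is committed here.
    void markDirty(FakeObject *obj) {
        for (FakeObject *o : pending_)
            if (o == obj) return;   // already queued, avoid a double commit
        pending_.push_back(obj);
    }

    void frameStarted() { frameInFlight_ = true; }
    void frameFinished() { frameInFlight_ = false; }

    // Commit everything queued, but only between frames.
    // Returns the number of objects committed (0 while a frame is running).
    std::size_t flush() {
        if (frameInFlight_) return 0;   // keep the queue for later
        const std::size_t n = pending_.size();
        for (FakeObject *o : pending_) ++o->commits;
        pending_.clear();
        return n;
    }

private:
    std::vector<FakeObject *> pending_;
    bool frameInFlight_ = false;
};
```

A caller would invoke `markDirty` from its input handlers and call `flush` once after the current frame has finished and before the next one starts.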

jeffamstutz commented 4 years ago

FYI, objectsToCommit in the tutorial code could just be a plain std::vector<OSPObject>: the fact it is a mutex-protected vector (i.e. an ospcommon::containers::TransactionalBuffer<OSPObject>) is not an important detail here (and is probably left over from previous versions of that widget class).

paulmelis commented 4 years ago

Hmm, thanks for the extra info, but looking at this a bit more closely it seems I'm never reading the framebuffer of one of the canceled renders (as it should be), only those that finished their ospRenderFrame successfully.

I also have another indication that this is timing related: if I let the initial view render 32 samples all is well, and if I then switch to a different view in one discrete step the next 32 samples also look fine. But if I switch fast between views the corrupted framebuffer appears (and doesn't go away anymore). And I don't see how my code could be timing dependent other than reading the framebuffer before it is ready, as it is single-threaded. Strike that, no ospCancel is done in the case all 32 samples are rendered.

But I will need to look a bit more closely still to give a definite pointer of where things seem to go wrong, will let you know.

paulmelis commented 4 years ago

Strangely, I'm not able to reproduce the same issue when using the scivis renderer (same scene, etc.).

paulmelis commented 4 years ago

After your comments in #368 here's an updated example that (I think) reproduces the issue I'm seeing. But I'm starting to doubt my own abilities by now, with you pointing out my silly mistakes :)

code2.zip

This is firstFrame.ppm, which is saved after ospWaiting on the render to finish:

[image: firstFrame]

The camera is then updated (and committed), rendering gets restarted, but quickly ospCanceled and ospWaited on.

Rendering is then restarted a second time, ospWaited on and saved to secondFrame.ppm, which shows the artifacts (which only happen with the pathtracer):

[image: secondFrame]

paulmelis commented 4 years ago

@jeffamstutz One more observation: when using --osp:debug the problem goes away, but the final frame saved then does not use the updated camera position.

jeffamstutz commented 4 years ago

This is fixed now on release-2.0.x with 2c9c8c3c2 and demonstrated in ospExamples with 6325e2c3f.

paulmelis commented 4 years ago

When the cancel frame on interaction option is enabled with ospExamples should it show an intact framebuffer when interacting? Or is it to be expected that the framebuffer as rendered up to that point is shown (and thus "corrupt")?

jeffamstutz commented 4 years ago

> Or is it to be expected that the framebuffer as rendered up to that point is shown (and thus "corrupt")?

Yes, it is expected to look blocky: when a frame is canceled, all in-progress tiles are completed and the rest get skipped. The bug that had to be fixed is that the pathtracer wasn't resetting the "cancel" state, which means it always thought the frame was cancelled.
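To make the sticky cancel state concrete, here is a toy tile renderer (not OSPRay code; `ToyRenderer` and its members are hypothetical). With `resetCancelOnStart` set to false the cancel request survives into later frames, so they skip their tiles. Resetting the flag at frame start is only one plausible shape of the fix; the actual commit may do it differently.

```cpp
#include <cassert>
#include <vector>

// Toy single-threaded tile renderer with a cancel flag.
struct ToyRenderer {
    bool cancelRequested = false;
    bool resetCancelOnStart;   // false models the bug described above

    explicit ToyRenderer(bool reset) : resetCancelOnStart(reset) {}

    void cancel() { cancelRequested = true; }

    // Writes frameId into every tile it renders and returns how many
    // tiles were written. cancelAtTile >= 0 simulates a cancel request
    // arriving just before that tile starts; -1 means no cancel.
    int renderFrame(std::vector<int> &tiles, int frameId,
                    int cancelAtTile = -1) {
        if (resetCancelOnStart) cancelRequested = false;
        int written = 0;
        for (int i = 0; i < static_cast<int>(tiles.size()); ++i) {
            if (i == cancelAtTile) cancel();
            if (cancelRequested) break;   // remaining tiles are skipped
            tiles[i] = frameId;
            ++written;
        }
        return written;
    }
};
```

In the buggy configuration, a frame rendered after any canceled frame leaves stale tiles from earlier frames in the buffer, which would match the symptom that the artifacts do not go away once they appear.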

The logic in ospExamples always maps the frame buffer if the frame is finished, even if it was canceled, which was decided purely for implementation simplicity. "Real" applications, such as OSPRay Studio or BLOSPRAY, would instead re-render a full (cheap?) frame. I'm trying to keep GLFWOSPRayWindow as minimal as possible, but I may bring that back next time I'm working on ospExamples as it's straightforward to implement.
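The two display policies can be contrasted in a few lines. The names `FrameResult`, `Action` and `chooseAction` are hypothetical, and the sketch only models the decision, not the mapping or rendering itself.

```cpp
#include <cassert>

// Outcome of polling a frame: has it stopped, and was it canceled?
struct FrameResult {
    bool done;
    bool canceled;
};

enum class Action { Wait, Present, RerenderCheap };

// presentCanceledFrames == true models the simple policy described for
// ospExamples (always show a finished frame, even a canceled one).
// false models an application that re-renders a full cheap frame instead.
Action chooseAction(const FrameResult &r, bool presentCanceledFrames) {
    if (!r.done) return Action::Wait;
    if (r.canceled && !presentCanceledFrames) return Action::RerenderCheap;
    return Action::Present;
}
```

With the first policy a canceled frame looks blocky on screen; with the second the viewer never shows a partially updated buffer, at the cost of one extra render.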

RenderKit / ospray

Corrupt framebuffer? #367