darksylinc commented 3 years ago

Current behavior

Currently Gazebo is doing, for every camera (of any type):

this->dataPtr->ogreCompositorWorkspace->setEnabled(true);
auto engine = Ogre2RenderEngine::Instance();
engine->OgreRoot()->renderOneFrame();
this->dataPtr->ogreCompositorWorkspace->setEnabled(false);

This is inefficient. Because:

Ogre::SceneManager::updateSceneGraph can be called once before all cameras (unless objects are moved or added/deleted in between)
We force a GPU flush per camera, instead of flushing after all cameras (some exceptions apply)

Desired behavior

The code can be replaced with the following:

this->dataPtr->ogreCompositorWorkspace->_validateFinalTarget();
this->dataPtr->ogreCompositorWorkspace->_beginUpdate(false);
this->dataPtr->ogreCompositorWorkspace->_update();
this->dataPtr->ogreCompositorWorkspace->_endUpdate(false);

Ogre::vector<Ogre::RenderTarget*>::type swappedTargets;
swappedTargets.reserve( 2u );
this->dataPtr->ogreCompositorWorkspace->_swapFinalTarget( swappedTargets );

While Ogre::SceneManager::updateSceneGraph gets moved to Scene::PreRender and a GPU flush is moved to Scene::PostRenderGpuFlush (PostRenderGpuFlush currently doesn't exist)

Exceptions

Scene::PostRenderGpuFlush Flushes all buffered frame data and starts 'a new frame'.

CPUs queue up a bunch of rendering commands and then waits for the GPU to finish up. If we flush too often, the CPU will often have to wait for the GPU doing nothing. If we flush too infrequently, RAM consumption will rise due to queueing up unsubmitted work.

This function should be called after updating all cameras / once per frame, but it may be called more often if you're updating too many cameras and running out of memory.

Example:

Cubemap rendering w/ 3 probes and 5 shadowmaps can cause
a blow up of passes:

(5 shadow maps per face + 1 regular render) x 6 faces x 3 probes =
108 render_scene passes.

108 is way too much, causing out of memory situations; so flushing once per probe may make better sense in this case.

Note however that Gazebo is currently flushing once per cubemap face (i.e. flushing 6x3 = 18 times). While with my proposal (once per cubemap) we would flush 3 times.

Reason

This is merely a performance improvement.

This paves the way for an additional improvement. For example currently Depth Sensors are doing the following:

for every camera
  camera->Render();
  camera->PostRender(); // Downloads depth data

This negates our optimization proposed in this ticket because PostRender will force the CPU to wait for the GPU. The render loop can be improved to the following:

while( cameras.empty() )
  // Queue up some commands to GPU (N is arbitrary and user controlled)
  for N cameras
    camera[i]->Render();
    cameras.pop();
  // Dispatch the commands to GPU
  scene->PostRenderGpuFlush();

// Collect the depth data to CPU. By now some data should already be on
// CPU memory so the first PostRender calls will be quicker than the last
// ones
for every camera
  camera->PostRender();

This allows the CPU to do more work while the GPU is rendering the previous cameras in parallel

N is an arbitrary number, where it depends on available memory and scene type (large values of N queue up too much, low values cause too much stall).

Impact

This is an ABI and API breaking change.

Other components need to change so they also call PostRenderGpuFlush (usually looking for calls to Scene::PreRender is a good start).

e.g. ign-sensors needs to add a call to PostRenderGpuFlush

Users must call PostRenderGpuFlush at some point or else errors or out of memory conditions may occur.

Because of this, a boolean will be added for migrating users. This boolean will control whether the old behavior or the new behavior is used.

In the old behavior, Gazebo will issue one updateSceneGraph and one PostRenderGpuFlush per Camera (default) which is what Gazebo is currently doing. In the new behavior, Gazebo assumes the user properly calls PreRender and PostRenderGpuFlush when appropriate Samples need to be updated to reflect this change

Risks

As with the Sensors example, the road to optimum performance is one where no stalls occur. Ideally CPU and GPU are always busy. However a single sync operation that forces a stall (usually GPU -> CPU communication) can heavily negate the benefits of this ticket. This needs to be constantly monitored and profiled. Nonetheless Gazebo's behavior can be improved.
Failing to call updateSceneGraph after scene has changed between cameras will result in incorrect rendering (e.g. missing lights, objects in the wrong location) or a crash (e.g. if lights are deleted). Ogre has asserts in place to detect this problem; which is why I also proposed Gazebo includes a 'validation' where Ogre is run using a Debug build. Note that currently Gazebo is sometimes failing these asserts (that's for another ticket) but don't seem high risk (it seems to be related to ray queries used by mouse picking). Please also note if a call to updateSceneGraph is required, then it is also extremely likely that a call to Scene::PreRender is also needed (to perform internal Gazebo maintenance). Which would mean Ogre's assert would just be detecting incorrect gazebo usage

Implementation suggestion

I'm the one implementing this feature. This ticket is for tracking progress.

Progress

[x] Replace most renderOneFrame calls for per workspace calls
[x] Add PostRenderGpuFlush
[x] Add boolean setting to toggle behavior
[x] Document that Camera::Update is a utility function that shouldn't be called if you have many cameras
[x] Replace remaining renderOneFrame calls for per workspace calls
[x] Test in every Render() call that we're inside PreRender / PostRender
[x] Document SetCameraPassCountPerGpuFlush is an upper bound
[x] Rename functions for consistency
[ ] Modify samples to use new behavior
[x] Document scene should not be altered (e.g. creating/destroying lights) between Scene Pre/PostRender calls
[ ] Try to call PostRender as a fallback mechanism, but severely warn the user. Should we use caps locks? Warn every time (not just once)?

Progress can be tracked at:

Blocking problems:

With cameraPassCountPerGpuFlush = 6

[x] 69 - INTEGRATION_depth_camera (Failed)
[x] 71 - INTEGRATION_camera (SEGFAULT)
[x] 72 - check_INTEGRATION_camera (Failed)
[x] 73 - INTEGRATION_render_pass (Failed)

With cameraPassCountPerGpuFlush = 0

[x] 69 - INTEGRATION_depth_camera (Failed) - It's #332
[x] 73 - INTEGRATION_render_pass (Failed) - It's just #315

darksylinc commented 3 years ago

@iche033 Quick question. Camera::Update says:

/// \brief Renders a new frame.
/// This is a convenience function for single-camera scenes. It wraps the
/// pre-render, render, and post-render into a single
/// function. This should be used in applications with multiple cameras
/// or multiple consumers of a single camera's images.
public: virtual void Update() = 0;

I believe it should say "This shouldn't be used in applications with multiple cameras" ?

Because this function should be used for apps with a single camera, or multiple consumers of a single camera images. But it shouldn't be used in applications with multiple cameras.

darksylinc commented 3 years ago

Figured out a more user friendly solution to efficient loops of cameras i.e. the while( cameras.empty() ) thing:

All Cameras should call FlushSomething() (name pending) inside Camera::Render for every camera. This happens inside Gazebo so the user doesn't have to change its update loops. However the Flush function will only flush if it gets called every N times.

The user would still have to call Scene::PostRenderGpuFlush at the end of everything, but aside from that, the user can call: Scene::SetNumCameraPassesPerFlush( uint value ) where value must be >= 0

A value of 0 means we flush every camera (old behavior) and no need to call Scene::PostRenderGpuFlush at the end of everything. A value > 1 is new behavior, and user must call Scene::PostRenderGpuFlush at the end of everything.

With these changes, a user updating like this:

scene->PreRender();

for every camera
  camera->Render();

for every camera
  camera->PostRender();

scene->PostRenderGpuFlush();

should achieve optimum performance

iche033 commented 3 years ago

I believe it should say "This shouldn't be used in applications with multiple cameras" ?

yes :)

Figured out a more user friendly solution to efficient loops of cameras i.e. the while( cameras.empty() ) thing:

sounds good! I'm assuming the idea is to keep the value at 0 as default behavior and let advanced users call SetNumCameraPassesPerFlush with value > 0?

darksylinc commented 3 years ago

sounds good! I'm assuming the idea is to keep the value at 0 as default behavior and let advanced users call SetNumCameraPassesPerFlush with value > 0?

The idea is that default is 0 yes. Though also my idea is that most samples show this value not being set to 0 to show to how to use, and new users (rather than just advanced once) set this to non-0; since it's not really that advanced.

This value being 0 is to ease porting of existing users, where failing to call PostRenderGpuFlush results in leaks or other weird behavior (and the complexity of existing codebases is unknown but presumably large).

darksylinc commented 3 years ago

Update

With my recent changes in my branch, GpuRays/GpuRaysTest.RaysParticles test started failing so I took a deeper look to fix it.

Turns out this change will fix a subtle bug caused by Ogre2GpuRays; therefore this fix is no longer merely a "performance improvement" and now it is a bugfix too (only present when using ogre2):

Every cube face is rendered with renderOneFrame
As per Ogre, calling renderOneFrame means 'this is a new frame'
Particle FXs will therefore advance their simulation forward in time with each call
As a result, each face will render the particles in a different time state! This is hard to notice in simple scenes if the scene took very little time to render

Technically the problem is global e.g. rendering to a two regular cameras and then to all 6 faces of a Ogre2GpuRays means the particles can be in up to 8 different time-states instead of 1.

darksylinc commented 3 years ago

I have a question for you @iche033 about the default value of SetNumCameraPassesPerGpuFlush

I made a couple changes to the proposed ones, which is less intrusive than what was originally proposed.

The current changes are like the following:

Added Scene::SetNumCameraPassesPerGpuFlush. A value of 0 is the legacy (i.e. current) behavior. A value >= 1 controls memory vs performance tradeoffs
When SetNumCameraPassesPerGpuFlush > 0, Users must call the new Scene::PostRender function.
If Scene::PreRender is called twice in a row without calling Scene::PostRender, or PostRender gets called twice in a row without PreRender, Ignition will assert, which immediately notifies the user there's something wrong that needs their attention.

Given that we can warn the user what's wrong (i.e. they didn't update their code to call Scene::PostRender), it may be better to set a default value for SetNumCameraPassesPerGpuFlush somewhere between [1; 6] rather than defaulting to legacy value of 0

What do you think?

iche033 commented 3 years ago

The changes sound good to me. We should probably target these changes to main since it's a behavior / API usage change, and also update the Migration guide.

If Scene::PreRender is called twice in a row without calling Scene::PostRender, or PostRender gets called twice in a row without PreRender, Ignition will assert,

I am thinking of a way to ease transition for users. Instead of an assert, we could print out an error msg like you're already doing, and in addition, we either call PostRender for them (if PreRender is called twice) or ignore the duplicate PostRender call (if PostRender is called twice)? For the Camera::Update function, we could just call PostRender for them - I think you're already doing that in your branch.

Given that we can warn the user what's wrong (i.e. they didn't update their code to call Scene::PostRender), it may be better to set a default value for SetNumCameraPassesPerGpuFlush somewhere between [1; 6] rather than defaulting to legacy value of 0

yeah I think that's fine. How about a default value to 1? For users wanting to use more cameras, they'll then need to change this value. We'll likely need a small tutorial teaching users how to use these APIs later

Some other minor comments on APIs:

SetNumCameraPassesPerGpuFlush -> SetCameraPassCountPerGpuFlush. To be consistent with Ignition naming convention, we use Count in place of Num, e.g. here
GetLegacyAutoGpuFlush -> LegacyAutoGpuFlush. Ignition prefers leaving out the Get prefix in function name

darksylinc commented 3 years ago

I am thinking of a way to ease transition for users. Instead of an assert, we could print out an error msg like you're already doing, and in addition, we either call PostRender for them (if PreRender is called twice) or ignore the duplicate PostRender call (if PostRender is called twice)? For the Camera::Update function, we could just call PostRender for them - I think you're already doing that in your branch.

We can do that, but issuing some Ogre commands before PostRender (e.g. creating/destroying a light between PreRender/PostRender is a no-go*) might cause trouble, hence the message must put emphasis in the importance of doing it.

I should probably mention that in the doc.

yeah I think that's fine. How about a default value to 1? For users wanting to use more cameras, they'll then need to change this value. We'll likely need a small tutorial teaching users how to use these APIs later

I should clarify this better in the docs: If you set a value of e.g. 6, but you have a single camera and call PostRender, this will force a flush and hence having a value of 1 or 6 won't matter as the result will be exactly the same (in every term: performance, memory consumption).

A value of 6 is like an upper bound. We may queue up to 6 render passes or less; but never more.

But that does not mean a value of 6 unnecessarily wastes 6x more RAM than a value of 1 if you only have one regular camera.

Some other minor comments on APIs:

* `SetNumCameraPassesPerGpuFlush` -> `SetCameraPassCountPerGpuFlush`. To be consistent with Ignition naming convention, we use `Count` in place of `Num`, e.g. [here](https://github.com/ignitionrobotics/ign-rendering/blob/ign-rendering5/include/ignition/rendering/Scene.hh#L188)

* `GetLegacyAutoGpuFlush` -> `LegacyAutoGpuFlush`. Ignition prefers leaving out the `Get` prefix in function name

Got it.

TL;DR changes for me to implement

Document scene should not be altered (e.g. creating/destroying lights) between Scene Pre/PostRender calls
Document SetCameraPassCountPerGpuFlush is an upper bound
Try to call PostRender as a fallback mechanism, but severely warn the user. Should we use caps locks? Warn every time (not just once)?
Rename functions for consistency

iche033 commented 3 years ago

I should clarify this better in the docs: If you set a value of e.g. 6, but you have a single camera and call PostRender, this will force a flush and hence having a value of 1 or 6 won't matter as the result will be exactly the same (in every term: performance, memory consumption).

I see, thanks for the clarification, that makes sense.

darksylinc commented 3 years ago

OK!

So I did extensive testing(*) and it looks stable and working. Which surprised me.

It looks ready or almost ready to start merging.

But I do have a couple of questions on the merge:

Right now it's based off ign-rendering5. But it obviously goes should be merged into main branch. Do I create the PR to be merged into main? Or do I first merge my branch into main and submit the PR? This obviously means that all changes done so far to ign-rendering5 (including those unrelated with this feature) will be merged into main.
1. The same question goes for ign-gazebo and ign-sensors
The current default is SetCameraPassCountPerGpuFlush = 6 instead of 0 (legacy). Which is not what was proposed. The rationale is that:
1. The final method ended up being less invasive than I thought. All samples that I could test (more on this later) did not need modifications at all. They 'just worked' because they were using Camera::Update or Camera::Capture which handles all these details
2. When the rules are violated (e.g. Camera::Render was called outside PreRender/PostRender calls, PreRender was called twice in a row, etc) we inform the user via asserts and the fix is extremely easy: Either add the missing calls, move them around, or SetCameraPassCountPerGpuFlush to 0. What goes wrong should be easy to identify and fix.
3. I tried looking into 'recovering', i.e. instead of asserting to do the cleanup ourselves; but it just won't work. Doing this can either work fine.... or mask serious bugs, from subtle ones like performance problems or hard-to-diagnose crashes (because it'd crash deep inside Ogre and the user will be clueless on what went wrong). So it's better to just stop the whole engine via assert and report what went wrong and how it needs to be fixed.
I tested all samples except these ones:
1. custom_shaders_uniforms (crashes in Ogre2, seems to be known issue)
2. gazebo_scene_viewer (needs 'gazebo'). There's a higher chance this one needs updating. I'd need help on how to set it up (because it couldn't find 'gazebo'. It looks like it depends on the old gazebo engine instead of ign-gazebo?)
3. heightmap (known issue: not implemented)
4. text_geom (known issue: not implemented)

Thus probably gazebo_scene_viewer demo is the only one I should be able to get working to see if it works, and then the code should be merged to main (repos to be merged are ign-rendering, -gazebo, -sensors)

(*)which of course means another user will encounter an obvious-unmissable problem the second they try it out

iche033 commented 3 years ago

yes you can target main branch and then we don't need to worry about ABI changes. Let me merge ign-rendering5 forward to main first. Then you should be able to just merge your branch with main and create the PR. That way, there won't be able unrelated changes in your PR. For other repos, I can also do the same but it can take some time depending on the changes that need to be merged forward. Alternatively, you can rebase your commits on top of main and the create the PR so you don't have to wait for the merge forward.
ok lets give SetCameraPassCountPerGpuFlush = 6 a try. i. yes sounds about right. That makes me think that once the changes are merged in, we should probably have a small sample demo showing this new API usage. ii. and iii. Ok I'll take a look at the code when the PR is up, and test it with ign-gazebo.
for the samples: i. yes, ogre2 doesn't work yet ii. there is an issue with the gazebo_scene_viewer sample on ign-rendering5: https://github.com/ignitionrobotics/ign-rendering/issues/290. I think that needs to be fixed first before we can test your changes. I would hold off on it for now. Maybe comment on the ticket and add a reminder to port the changes to the sample once its fixed. iii. yes, ogre2 does not work yet iv. yes, ogre2 does not work yet

darksylinc commented 3 years ago

Great!

I probably will have then to merge with SetCameraPassCountPerGpuFlush = 0 as default at least until ign-gazebo and ign-sensors get updated so that these modules are PostFrame-aware

iche033 commented 3 years ago

Let me merge ign-rendering5 forward to main first

https://github.com/ignitionrobotics/ign-rendering/pull/350

azeey commented 5 months ago

@iche033 is this fixed?

iche033 commented 5 months ago

yes we can consider this as complete.

gazebosim / gz-rendering

Replace renderOneFrame for per-workspace update calls #323

Current behavior

Desired behavior

Exceptions

Reason

Impact

Risks

Implementation suggestion

Progress

Blocking problems:

With cameraPassCountPerGpuFlush = 6

With cameraPassCountPerGpuFlush = 0

Update

TL;DR changes for me to implement