Non-deterministic image render/frame publication times between runs

osrf-migration commented 7 years ago

Original report (archived issue) by Kenny Sharma (Bitbucket: kennysharma).

Gazebo does not always render/publish the same scene image frames between runs on the same test platform/hardware (even with the RNG seed set on the command line). This appears to be caused by non-deterministic rendering times for each camera image in OGRE.

This may be somewhat related to issue #1748, but I believe it is a distinct issue related to rendering rather than timestamps.

We (the Neurorobotics Subproject of the EU Human Brain Project) are using Gazebo 7 (a slightly modified fork of gazebo7_7.2.0) and running headless using gzserver, with gzweb for the frontend.

Some of our experiments rely on processing Gazebo-generated images to periodically stimulate spiking neural networks. For fully deterministic reproducibility, these images would need to be exactly the same view of the scene during each simulation run on any hardware platform. Some of the experiments use very fine-grained, pixel-level values to generate stimulus.

While debugging this issue, the following was observed:

- The scene time for the first frame rendered in sensors/CameraSensor.cc can vary by 1 simulation tick, so the first frame is not guaranteed to be exactly the same rendered scene.
- The actual rendering time for the Ogre::RenderTarget in rendering/Camera.cc is non-deterministic even on the same hardware in subsequent runs, which impacts the actual scene time / render time of the subsequent frames.

From my understanding and debugging in rendering/Scene.cc, the rendering scene is only timestamped/updated when the blocking rendering events are completed, while the physics data is correctly updated every tick. This means that while the physics side is deterministic at each tick, the rendering side is, by design, not deterministic, and identical frames cannot be guaranteed.
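The effect described above can be illustrated with a minimal toy model in Python. This is a sketch under stated assumptions, not Gazebo code: the function name `free_running_stamps` and its parameters are hypothetical, render latency is expressed directly in physics ticks, and the next frame is assumed to become due one period after the previous frame was stamped.

```python
def free_running_stamps(period_ticks, render_latencies):
    """Sim ticks at which frames get stamped when physics never waits.

    Hypothetical toy model (not a Gazebo API): each entry of
    render_latencies is how many physics ticks elapse while that
    frame is being rendered.
    """
    stamps = []
    tick = 0
    next_due = period_ticks
    for latency in render_latencies:
        tick = max(tick, next_due)  # physics runs until the frame is due
        tick += latency             # physics keeps ticking during the render
        stamps.append(tick)         # frame is stamped when the render completes
        next_due = tick + period_ticks
    return stamps


# Same world, same camera rate, different render jitter between runs.
run_a = free_running_stamps(10, [0, 0, 0])
run_b = free_running_stamps(10, [1, 0, 2])
print(run_a)  # [10, 20, 30]
print(run_b)  # [11, 21, 33]
```

In this model a one-tick difference in the first render shifts every later frame as well, which matches the observation that the physics state is reproducible while the captured scene times are not.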

Is there any conceivable way to guarantee frame-perfect image production in simulations? We are setting a defined rate in the SDF for camera sources (which led me to issue #1748 initially). Am I misunderstanding the issue (or have I failed to make it clear what the issue is from our perspective)?

I imagine a solution would require the simulation control loop to be exactly aware of the scene times at which a Camera sensor should publish data, pausing the physics control loop until rendering is complete, and then resuming. This would guarantee exact replication on different hardware at the cost of a smoothly ticking physics simulation.
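A minimal sketch of that lockstep idea, written as a toy Python loop rather than Gazebo code. Everything here is an assumption for illustration: `run_lockstep`, `render_latency`, and the stand-in physics update are hypothetical, and the "render" simply snapshots the physics state.

```python
def run_lockstep(total_ticks, period_ticks, render_latency):
    """Step physics tick by tick, pausing it whenever a frame is due.

    Hypothetical toy model (not a Gazebo API). render_latency is a
    callable taking the frame index and returning how long that render
    stalls the loop, in arbitrary wall-clock units.
    """
    position = 0   # stand-in for the deterministic physics state
    frames = []    # (sim tick, state snapshot) per published frame
    stalled = 0    # total wall-clock cost paid for waiting on renders
    for tick in range(1, total_ticks + 1):
        position += tick % 3  # stand-in for a deterministic physics update
        if tick % period_ticks == 0:
            # Physics is paused here: sim time does not advance
            # until the render call returns.
            stalled += render_latency(len(frames))
            frames.append((tick, position))
    return frames, stalled


fast_frames, fast_cost = run_lockstep(6, 3, lambda i: 0)
slow_frames, slow_cost = run_lockstep(6, 3, lambda i: 5 + i)
print(fast_frames == slow_frames)  # True
print(fast_cost, slow_cost)        # 0 11
```

The frames are identical regardless of render latency; the slower renderer only increases the stall time, which is the cost to smooth physics ticking mentioned above.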

I understand our use case is extremely specific, so I'm open to discussion if this is not technically a Gazebo issue but something we could potentially fix down the road in our own fork.

osrf-migration commented 7 years ago

Original comment by Silvio Traversaro (Bitbucket: traversaro).

(Disclaimer: I am not an OSRF developer).

I think this kind of behavior is due to #1721 (https://osrf-migration.github.io/gazebo-gh-pages/#!/osrf/gazebo/issues/1721/making-physics-wait-for-sensor-updates).

A proper synchronization mechanism between the sensor thread and the physics thread should solve this.

Other related issues/comments:

osrf-migration commented 7 years ago

Original comment by Kenny Sharma (Bitbucket: kennysharma).

Thanks for the insightful links, Silvio. This definitely seems related to #1721, and #1966 is also likely very relevant, since we step/pause the simulation in a similar manner.

osrf-migration commented 7 years ago

Original comment by Ian Chen (Bitbucket: Ian Chen, GitHub: iche033).

There is a pull request that I think may address this issue. I have not had time to test it yet.

osrf-migration commented 7 years ago

Original comment by Kenny Sharma (Bitbucket: kennysharma).

Thanks for the quick response, Ian. I'll try to test that pull request on a few different hardware platforms over the next couple of days.

We don't build any of the ignition libraries locally, so I will likely just replace any references to them with explicit tolerance comparisons, as mentioned in the pull request. We do build sdformat, but I don't see a need for our platform to include backwards compatibility (yet).

osrf-migration commented 7 years ago

Original comment by Kenny Sharma (Bitbucket: kennysharma).

Following up here: I backported the pull request with minimal changes to our Gazebo 7 build (the ignition changes were already merged, so we are using them by default). After debugging, it does indeed seem that the frames are being published at deterministic times, but unfortunately there is still some other non-determinism in our setup. I'll continue to investigate and let you know if there's anything Gazebo related.
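One way to separate timing differences from remaining pixel-level differences is to record, per frame, the sim timestamp plus a hash of the raw image bytes, and then compare two runs. The following Python sketch assumes the frames have already been captured as `(sim_time_ns, bytes)` pairs; `frame_record` and `first_divergence` are hypothetical helper names, not part of Gazebo.

```python
import hashlib


def frame_record(sim_time_ns, pixels):
    """Pair a frame's sim timestamp with a SHA-256 digest of its raw bytes."""
    return (sim_time_ns, hashlib.sha256(pixels).hexdigest())


def first_divergence(run_a, run_b):
    """Return (frame index, reason) for the first mismatch, or None.

    reason is "timestamp" if the frames were published at different sim
    times, "pixels" if the times agree but the image bytes differ, and
    "length" if one run simply has more frames than the other.
    """
    for index, (a, b) in enumerate(zip(run_a, run_b)):
        if a[0] != b[0]:
            return index, "timestamp"
        if a[1] != b[1]:
            return index, "pixels"
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b)), "length"
    return None


run_a = [frame_record(1000, b"\x00\x01"), frame_record(2000, b"\x02\x03")]
run_b = [frame_record(1000, b"\x00\x01"), frame_record(2000, b"\x02\x04")]
print(first_divergence(run_a, run_b))  # (1, 'pixels')
```

A "pixels" result with matching timestamps would point at non-determinism outside the publication timing, which is the situation described above.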

osrf-migration commented 7 years ago

Original comment by Ian Chen (Bitbucket: Ian Chen, GitHub: iche033).

Great, thanks for reporting back. If that PR works, we should try to merge it. It needs tests first.

gazebosim / gazebo-classic

Non-deterministic image render/frame publication times between runs #2293