Expose the Entity Component Manager

diegoferigo commented 4 years ago

I remember a @chapulina's comment long time ago in BitBucket stating that in the long term the pointer to the EntityComponentManager could have been exposed by the Server (if you find where, please link it here, I couldn't find it).

Now, in https://github.com/ignitionrobotics/ign-gazebo/issues/51#issuecomment-656913429, there's some discussion going on about the design of APIs that keep the ECS hidden in view of the fact that it's not exposed to the user yet.

This being said, what I want to discuss is the possibility to provide to the users the possibility to operate on the ECM. This is pivotal when using the C++ APIs of the simulator, since downstream users in this way can read and write with great freedom all simulated data without the need to pass by the official APIs, that sometimes could be limited (and I'm thinking of the Model, Link, Joints, ... classes). With great power comes great responsibility, as usual.

The current workaround we're using for 1+ year now is a plugin that stores the pointer in a singleton. Useless to tell that I don't like this dirty workaround, is like entering a house from the fireplace (and is not even winter :santa:). Having the following would really be great:

ignition::gazebo::ServerConfig config;
config.SetSdfFile("empty.sdf");
config.SetUpdateRate(1000);

ignition::gazebo::Server server(config);
ignition::gazebo::EntityComponentManager* ecm = server.Ecm(/*_worldIndex=*/0);

I'm not a big fan of raw pointers, but in this case I don't think that sharing the ownership would be a great idea. Though, in my experience I sometimes had issues during termination when a plugin in its destructor has to access the ecm through this pointer and the ecm had already been deleted (consequence of the limitation that prevents plugins to be unloaded #113). In this case the raw pointer should be used with care.

chapulina commented 4 years ago

I remember a @chapulina's comment long time ago in BitBucket stating that in the long term the pointer to the EntityComponentManager could have been exposed by the Server (if you find where, please link it here, I couldn't find it).

I know what you're talking about, but I also don't remember where it was.

With great power comes great responsibility, as usual.

Right. Currently, manipulation of the ECM is done primarily from systems in the PreUpdate - Update - PostUpdate cycle. The ECM is mutable in the first 2 callbacks, but it is const on the 3rd one. Currently, all systems' PreUpdate / Update calls are run in series, so we're sure that 2 systems can't be modifying the ECM at the same time. While the read-only PostUpdate call is run in parallel.

This is done in to enforce protection of the ECM data through the API itself. But we're also guilty of breaking that API contract and keeping raw pointers to the ECM :see_no_evil: , for example:

https://github.com/ignitionrobotics/ign-gazebo/blob/6a3f5dd95e2bad9d82af60ca6c91a0456caa0666/src/systems/user_commands/UserCommands.cc#L214

:point_up: Don't do this at home.

This is the main reason why we haven't made it a singleton or exposed it in any other way. Before we do, ideally we'd have a way to prevent that downstream users misuse the API.

It would be great if we could lock the ECM for writing at specific times during simulation and assure that only one object, be it a system or otherwise, can hold that lock at a time. We've also talked about only exposing ECM APIs to queue changes, and let the changes themselves only be performed internally by the ECM.

I'm open to ideas on how to safely expose that pointer.

diegoferigo commented 4 years ago

Thanks @chapulina for providing this exhaustive zoom-out, it's very helpful!

This is done in to enforce protection of the ECM data through the API itself. But we're also guilty of breaking that API contract and keeping raw pointers to the ECM

I actually faced this problem few months ago when I tried to implement a non-deterministic stepping of the Python APIs of a downstream project https://github.com/robotology/gym-ignition/pull/158#issue-385050065.

The deterministic stepping, instead, does not suffer of this problem. The concurrent execution of the PostUpdate phase, as you pointed out, could create problems if the pointer to the ECM is used also to modify the underlying data. It never occurred in my experience, but I totally agree that these problems could arise depending on how downstream users implement their own plugins.

Note that, in case of need, even in the PostUpdate phase, no one prevents downstream users to do:

void MyPlugin::PostUpdate(
    const ignition::gazebo::UpdateInfo& /*info*/,
    const ignition::gazebo::EntityComponentManager& ecmRO)
{
    // Woops :)
    auto& ecmRW = const_cast<ignition::gazebo::EntityComponentManager&>(ecmRO);
    ecmRW.CreateEntity();
    // ...
}

Click to expand an example

I have a very concrete case to bring as an example that requires a similar workaround: applying a delay to a components, e.g. to implement a delay to a PID reference. In this context, the user stores the reference in a component, then in the `PreUpdate` phase this new reference is queued somehow and replaced by a previous value (since it has to be delayed), and finally in the `PostUpdate` phase the original reference has to be restored (because the user does not provide a reference every time, and the previous reference should be kept valid). As you can see, the `PostUpdate` phase has to change the component.

This is the main reason why we haven't made it a singleton or exposed it in any other way. Before we do, ideally we'd have a way to prevent that downstream users misuse the API.

Due to these tests, I already tried to think how this could be handled, but I couldn't find any transparent solution from downstream point of view (transparent = don't need to know anything about synchronization logic). I think what you wrote were the only two solution that I already had in mind:

Add new methods that optionally lock the ECM with RAII-style (similar to the with statement in Python)
Implement a queueing logic under the hood (I guess it would require a substantial work compared to the previous)

This is the main reason why we haven't made it a singleton or exposed it in any other way.

A final comment on this: please don't go with singletons :) Singletons + static libraries + plugins = dragons.

diegoferigo commented 3 years ago

Could be fixed by https://github.com/ignitionrobotics/ign-gazebo/pull/793

chapulina commented 3 years ago

The TestFixture class introduced in #926 may address the needs of this issue. The ECM is exposed not as a pointer, but through update callbacks.

diegoferigo commented 3 years ago

xref https://github.com/ignitionrobotics/ign-gazebo/pull/869#issuecomment-958116341

peci1 commented 3 years ago

I think that exposing the ECM as a method of Server is not such an issue:

If you develop a Gazebo plugin, you do not have access to the Server instance, so you cannot go wrong. Moreover, nothing stops you from saving the mutable reference to ECM in PreUpdate and misusing it later.
On the other hand (and here my view can be limited), if you have access to the Server instance, it is probably from a test or from another specific use case where you control the stepping. And if you control the stepping, it should not be such an issue to know when systems can be modifying the ECM and when it should be safe to access it. If I understood correctly the above mentioned problem in gym-ignition with async stepping, the update callbacks would probably be a better solution than direct manipulation with ECM from the (unsynchronized) outside.

Another idea is returning the reference to a thread-local pointer. That would make sure the users can do whatever they want from the same thread which runs the server (no async access there), and would prevent users from using it asynchronously. But It might be a pretty unexpected behavior for users that just see some getECM() function and use it without reading the docs. And I'm not sure what is the plan when there are (will be) multiple worlds in one server - if each of them would still be run in series or if it is expected to have a processing thread for each world (that would make the thread-local idea complicated).

gazebosim / gz-sim

Expose the Entity Component Manager #248