chapulina opened 3 years ago
This reminds me of some of the discussions in ros2 about serialization and native data types. Is there any understanding we can draw from that experience?
Let's start by removing the usage of SDF types inside components in favor of Ignition Messages.
This should help with the scripting effort too, so we can postpone adding scripting interfaces to SDFormat classes.
Let's start by removing the usage of SDF types inside components in favor of Ignition Messages.
I don't think that's what we want to do. Generally, I think we should avoid making protobufs part of any of our public APIs if it can be avoided.
I would like to see a common struct
as our actual component with appropriate conversion functions to both sdf
and ignition::msgs
.
we should avoid making protobufs part of any of our public APIs if it can be avoided.
That's because of https://github.com/ignitionrobotics/ign-msgs/issues/113 (also pointed out above), right? Yeah, that's a good point.
I would like to see a common struct as our actual component with appropriate conversion functions to both sdf and ignition::msgs.
My main concern with this is that adding another type feels counterproductive when the goal is to reduce the number of types we have.
I'm more inclined to figuring out a way in which we can make protobuf work for us instead of putting a lot of effort into designing yet-another-collection-of-types.
I think based on the recommended guidance from the Protobuf C++ Tutorial, we're going to end up with another collection of objects one way or another.
Protocol Buffers and Object Oriented Design

Protocol buffer classes are basically dumb data holders (like structs in C); they don't make good first class citizens in an object model. If you want to add richer behavior to a generated class, the best way to do this is to wrap the generated protocol buffer class in an application-specific class. Wrapping protocol buffers is also a good idea if you don't have control over the design of the .proto file (if, say, you're reusing one from another project). In that case, you can use the wrapper class to craft an interface better suited to the unique environment of your application: hiding some data and methods, exposing convenience functions, etc. You should never add behaviour to the generated classes by inheriting from them. This will break internal mechanisms and is not good object-oriented practice anyway.
This makes me think that the ign-msgs types themselves should be some wrapper around the protobuf types, rather than the protobuf types themselves.
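A minimal sketch of the wrapping pattern the tutorial recommends, using a hand-written stand-in for a protoc-generated message (the real gz-msgs classes are generated; `proto::Pose`, the wrapper `Pose`, and all member names here are illustrative, not existing API):

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a protoc-generated message. Real generated
// classes have the same "dumb data holder" shape (set_*/accessor pairs).
namespace proto {
class Pose {
 public:
  double x() const { return x_; }
  void set_x(double v) { x_ = v; }
  const std::string &frame() const { return frame_; }
  void set_frame(std::string f) { frame_ = std::move(f); }
 private:
  double x_{0.0};
  std::string frame_;
};
}  // namespace proto

// Application-level wrapper: composition, never inheritance, per the
// protobuf C++ tutorial. Richer behavior lives here; the generated type
// stays an implementation detail of the wrapper.
class Pose {
 public:
  double X() const { return msg_.x(); }
  Pose &SetX(double v) { msg_.set_x(v); return *this; }
  bool HasFrame() const { return !msg_.frame().empty(); }
  // Expose the raw message only at the serialization boundary.
  const proto::Pose &Msg() const { return msg_; }
 private:
  proto::Pose msg_;
};
```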
Another issue is that ign-msgs that contain other ign-msg types are heap allocated, and so defeat the purpose of storing components in contiguous memory in the ECS.
Another issue is that ign-msgs that contain other ign-msg types are heap allocated, and so defeat the purpose of storing components in contiguous memory in the ECS.
This shouldn't be an issue from Fortress on, because we're no longer trying to allocate components contiguously in memory since #856.
It would be nice to revisit the topic of storing components in contiguous memory, that was one of the big motivations for moving to ECS. If using messages would block us from achieving that goal, then I'd suggest against their use in components.
Based on my reading of this thread, the reason to use protobuf in components is serialization performance. The reason to use SDF is to avoid ABI issues, follow guidance from the protobuf developers, and leave open the door for improving our cache performance.
It seems that SDF is the safer option. Open questions I have are:
Is SDF serialization performance significant?
If it's being serialized to XML and back, I would say that the performance is not great.
Will the serialization issue be resolved for the majority of users by the single process GUI+server work?
As we still intend to support both use cases, it's hard to predict which portion of users will be impacted. People who are doing analysis or simulation on a single computer probably won't incur a ton of serialization, but any interaction with a remote or cloud-based simulation instance would have issues.
The reason to use SDF is to avoid ABI issues, follow guidance from the protobuf developers, and leave open the door for improving our cache performance.
Unfortunately, most of the SDF DOM objects also make use of the PIMPL pattern which likely will block any sort of cache performance improvement. The best strategy would be to come up with another very basic (read: c-struct) representation of the data that could be either serialized to SDF DOM objects or protobuf messages. This would allow the ECS to operate on the most efficient in-memory version of the data.
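A sketch of what such a c-struct representation could look like; `PoseData`, `SdfPoseStandIn`, and the conversion helpers are hypothetical names, and the stand-in replaces the real SDF DOM type only to keep the example self-contained:

```cpp
#include <type_traits>

// Hypothetical plain-struct representation of a pose component:
// trivially copyable, no PIMPL, no heap, so an ECS could pack it freely.
struct PoseData {
  double x{0}, y{0}, z{0};
  double roll{0}, pitch{0}, yaw{0};
};
static_assert(std::is_trivially_copyable_v<PoseData>,
              "PoseData must stay a dumb data holder");

// Stand-in for an sdf DOM pose type (the real one is PIMPL'ed).
struct SdfPoseStandIn { double x, y, z, roll, pitch, yaw; };

// Conversion functions (sketch): the ECS operates on PoseData, and only
// the boundaries pay the cost of DOM / protobuf objects.
SdfPoseStandIn toSdf(const PoseData &d) {
  return {d.x, d.y, d.z, d.roll, d.pitch, d.yaw};
}
PoseData fromSdf(const SdfPoseStandIn &p) {
  return {p.x, p.y, p.z, p.roll, p.pitch, p.yaw};
}
```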
Another issue is that ign-msgs that contain other ign-msg types are heap allocated, and so defeat the purpose of storing components in contiguous memory in the ECS.
I know I brought this up, but as of https://github.com/ignitionrobotics/ign-gazebo/pull/856, we don't store components in contiguous storage, so this may not be relevant anymore.
It would be nice to revisit the topic of storing components in contiguous memory, that was one of the big motivations for moving to ECS.
I know it sounds backwards not to use contiguous storage anymore, but as detailed on #856, that had been hurting performance, rather than helping it.
It seems that SDF is the safer option.
Most SDF classes were originally designed to be "read-only" and part of a tree. As documented in most of them: Typical usage of the SDF DOM is through the Root object. Some of this may be changing, and there's an ongoing effort to document this here. But ultimately, it doesn't look like SDF classes were designed to be used as independent components that store plain data efficiently.
follow guidance from the protobuf developers
I believe the guidance is that we shouldn't be packaging the generated C++ code. It may be possible to solve this while still using them in components, see https://github.com/ignitionrobotics/ign-msgs/issues/113.
I know it sounds backwards not to use contiguous storage anymore, but as detailed on https://github.com/gazebosim/gz-sim/pull/856, that had been hurting performance, rather than helping it.
I think you could argue that we were never really using contiguous storage correctly because many of our components were using the PIMPL pattern as well. The speedups in contiguous storage would come from using POD types as components.
it doesn't look like SDF classes were designed to be used as independent components that store plain data efficiently.
I agree with this as well. It would be interesting to experiment with POD components at some point, but I don't think it's a high priority at the moment.
I believe the guidance is that we shouldn't be packaging the generated C++ code. It may be possible to solve this while still using them in components, see https://github.com/gazebosim/gz-msgs/issues/113.
This is my interpretation as well.
Some thoughts on using simple POD / structs:

- They have an advantage over gz-msgs, because they aren't PIMPLized
- They could work like gz-math types, and as such, be used by SDFormat, gz-common, etc.

I've allowed myself to dream a little and think about what would be the ultimate solution to this situation if we had no baggage or backwards compatibility to maintain. The goals are:
Our API across the stack has always been grounded in the SDF spec. We have model, link, collision, visual, etc. entities and components because these are SDF concepts. Our messages mimic SDF types, and our sensors, physics and rendering support parameters exposed through SDF.
The canonical SDF description is the XML spec. So here's an idea:

- New message classes are generated from the spec, named gz::proto to differentiate from the gz::msgs that we currently have.
- All SDF DOM classes are replaced by these gz::proto classes for all set/get operations.
- All other SDF functionality is moved to free functions that take in gz::proto, using the Interface Principle. For example:
  - sdf::load(const sdf::ElementPtr &_sdf, gz::proto::World &_worldMsg);
  - sdf::toElement(const gz::proto::Link &_linkMsg)
- Data uses std types; if not possible, gz::math types; if not possible, gz::proto types; if not possible, a custom struct.
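The free-function idea could look roughly like the sketch below; `gz::proto::Link` is mocked with a plain struct and the parsing is stubbed out, so this is only a shape sketch of the Interface Principle, not real SDFormat code:

```cpp
#include <string>

// Stand-in for a would-be generated gz::proto message: a dumb data holder.
namespace gz::proto {
struct Link {
  std::string name;
  double mass{0.0};
};
}  // namespace gz::proto

namespace sdf {
// Free function instead of a Link::Load() member: behavior lives in the
// sdf namespace, the data type stays plain. Parsing is stubbed here.
bool load(const std::string &_xmlSnippet, gz::proto::Link &_linkMsg) {
  _linkMsg.name = _xmlSnippet;  // real code would parse via the SDF DOM
  return !_xmlSnippet.empty();
}

// Free function instead of a Link::ToElement() member.
std::string toElement(const gz::proto::Link &_linkMsg) {
  return "<link name='" + _linkMsg.name + "'/>";
}
}  // namespace sdf
```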
I believe the guidance is that we shouldn't be packaging the generated C++ code. It may be possible to solve this while still using them in components, see https://github.com/gazebosim/gz-msgs/issues/113.
If we use gz-msgs as components, wouldn't that mean the generated C++ code is now part of our public API? If the goal is to move away from shipping generated protobuf code, I don't think we would be going in the right direction if we use them as components.
Consider a user application that generates its own gz-msgs-based C++ code. Even though this application is using the same proto files, there is no guarantee that the generated classes would be ABI compatible with the generated code used by gz-sim (they could be using completely different versions of protoc). Also, when linking against gz-sim to get access to EntityComponentManager, it would probably have link time errors because there would be multiple definitions of the gz-msgs generated classes.
If we use gz-msgs as components, wouldn't that mean the generated C++ code is now part of our public API? If the goal is to move away from shipping generated protobuf code, I don't think we would be going in the right direction if we use them as components.
I agree, it seems like a step in the wrong direction. Fundamentally, protobuf is a serialization format, it's generally going to be sub-optimal to use for anything other than serialization.
I know there is hesitance to add another set of structures, but I think about it the way that ROS 2 works. You have a set of structures that the user interacts with (the rosidl structs), and you have a set of structures that are used for serialization/deserialization (in some RMW implementations). These two things are generated from the same set of rosidl files, as well as all of the code to migrate between the two.
I think it would make the most sense to have a POD representation plus protobuf generation and conversion code, all generated from the SDF spec. The POD objects could be stored as components, but when you went to publish with ign-transport, it would JIT convert to the protobuf object, serialize, and publish. Additionally, you could have generated code to convert from the SDF DOM to these POD structs.
This is also probably closer to capnproto's design philosophy.
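A sketch of that split, with hypothetical `Twist`/`WireTwist` types standing in for a real POD component and its generated protobuf counterpart; the "serialization" is faked with a string so the example stays self-contained:

```cpp
#include <string>
#include <vector>

// POD component: what systems iterate over, trivially copyable,
// storable contiguously by the ECS.
struct Twist {
  double linear{0}, angular{0};
};

// Stand-in for a generated protobuf message; only exists at the
// transport boundary.
struct WireTwist {
  std::string payload;  // pretend serialized bytes
};

// JIT conversion at publish time, analogous to an RMW serializing
// rosidl structs only when a message actually leaves the process.
WireTwist toWire(const Twist &t) {
  return {std::to_string(t.linear) + "," + std::to_string(t.angular)};
}

// The hot loop never touches the wire type.
double totalLinear(const std::vector<Twist> &components) {
  double sum = 0;
  for (const auto &c : components) sum += c.linear;
  return sum;
}
```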
there is no guarantee that the generated classes would be ABI compatible with the generated code used by gz-sim (they could be using completely different versions of protoc)
Maybe this can be worked around by enforcing a protoc + gz-proto version combination?
when linking against gz-sim to get access to EntityComponentManager, it would probably have link time errors because there would be multiple definitions of the gz-msg generated classes.
Yeah good point, this may be a deal breaker.
have a POD representation
The problem with POD types is that they can't be extended without breaking ABI. On the flip side, they're better to store in contiguous memory in the ECM. I lean in favor of prioritizing extensibility without breaking ABI rather than in-memory optimization, because the former is something we're actively dealing with all the time, while the latter has a potential theoretical benefit.
have a POD representation and a protobuf generation and conversion code, all generated from the SDF spec.
I like the idea of having another type generated from the SDF spec though, I'd just like it to be PIMPL'ed. The result may look very similar to our current SDF DOM classes.
The problem with POD types is that they can't be extended without breaking ABI. On the flip side, they're better to store in contiguous memory in the ECM. I lean in favor of prioritizing extensibility without breaking ABI rather than in-memory optimization, because the former is something we're actively dealing with all the time, while the latter has a potential theoretical benefit.
If we design this so that components can only be created by a factory (i.e., private constructors), maybe we can use a custom memory allocator to create the dataPtr objects in a way that's optimal. We don't have to do that right now, but it could be something we do in the future if we see that PIMPL memory allocation is hurting performance.
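A rough sketch of the factory idea, with hypothetical `PoseComponent`/`ComponentFactory` names (not real gz-sim API). The pool here just centralizes ownership of the impls, so the allocation strategy could later be swapped for a real arena without touching the component's public ABI:

```cpp
#include <memory>
#include <vector>

// Component keeps a PIMPL dataPtr for ABI stability, but its constructor
// is private: only the factory can create instances.
class PoseComponent {
 public:
  double X() const { return dataPtr->x; }
  void SetX(double v) { dataPtr->x = v; }

 private:
  struct Impl { double x{0}; };
  friend class ComponentFactory;
  explicit PoseComponent(Impl *impl) : dataPtr(impl) {}
  Impl *dataPtr;  // owned by the factory's pool, not the component
};

class ComponentFactory {
 public:
  PoseComponent Create() {
    // Each impl is individually heap-allocated here for brevity; because
    // allocation is funneled through this one place, a future version
    // could use a slab/arena allocator instead, transparently.
    pool_.push_back(std::make_unique<PoseComponent::Impl>());
    return PoseComponent(pool_.back().get());
  }

 private:
  std::vector<std::unique_ptr<PoseComponent::Impl>> pool_;
};
```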
Maybe this can be worked around by enforcing a protoc + gz-proto version combination?
I think this is hard without vendoring protobuf. I think that protobuf, like many other Google libraries, is pretty aggressive about moving forward with the "live at HEAD" philosophy. Platforms that have rolling package managers (Windows and macOS) will quickly get out of sync with what is available in Ubuntu.
Platforms that have rolling package managers (Windows and macOS) will quickly get out of sync with what is available in Ubuntu.
Yeah, that's a good point, but I think the supported combination can be decided per platform, since users won't be sharing binaries across those platforms. We can say that we only support the upstream protobuf version on all platforms. That would be whatever comes with Ubuntu. On macOS and Windows, whenever the version changes, we recompile all our bottles / conda binaries to use the latest version, like we already do.
What we wouldn't support are custom protobuf versions coming from a PPA, built from source, or anything like that. Which we already don't anyway.
The problem with POD types is that they can't be extended without breaking ABI.
Breaking ABI is only a problem if the POD structs are part of the public API.
POD types could fit as part of a collection of interfaces to the same data. SDFormat could be the interface to read and write data to and from files, .proto could be the interface to serialize/deserialize over a network, POD types could be the interface internal to Gazebo, and PIMPL wrappers of the POD types could be given to users in Gazebo's public APIs and the public APIs of the other gz/ignition libraries.
Another thing to keep in mind for this discussion is that we have started supporting custom data in SDFormat files through custom XML namespaces. A map of std::any values (similar to gazebo::physics::Preset) could store arbitrary data, for example.
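A minimal sketch of such a type-erased map, assuming string keys and `std::any` values as suggested; `CustomData` and its methods are illustrative, not an existing API:

```cpp
#include <any>
#include <string>
#include <unordered_map>
#include <utility>

// Type-erased key/value store for data coming from custom XML namespaces,
// similar in spirit to gazebo::physics::Preset.
class CustomData {
 public:
  void Set(const std::string &key, std::any value) {
    data_[key] = std::move(value);
  }

  // Returns nullptr if the key is missing OR the stored type doesn't
  // match T, so callers must handle both cases.
  template <typename T>
  const T *Get(const std::string &key) const {
    auto it = data_.find(key);
    if (it == data_.end()) return nullptr;
    return std::any_cast<T>(&it->second);
  }

 private:
  std::unordered_map<std::string, std::any> data_;
};
```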
One of the challenges with Ignition's modular architecture is to create data structures and convert between them across all libraries.
Types everywhere
For example, Ignition Math is used by pretty much all libraries, from SDFormat to Rendering and Physics, to store math data types like vectors. But that's not the only place where those types exist. For most Ignition Math types, we also have equivalent Ignition Msgs types, so that the data can be transmitted over the wire.
Sure, that's not a terrible situation, and each class has a different use case. But let's look at something more complex, like a 3D mesh.
We start from sdf::Mesh, which stores some basic metadata like file path and scale. That's pretty much the same as what ignition::msgs::MeshGeom holds. But the actual 3D data, like vertices and normals, is held by ignition::common::Mesh, which is used by both Physics and Rendering. By itself, though, it doesn't include all the data that those libraries need. Ignition Rendering has a wrapper that adds extra information, ignition::rendering::MeshDescriptor, and Ignition Physics needs more information to be passed alongside common::Mesh to its ignition::physics::mesh::AttachMeshFeature.

While implementing heightmaps, I created sdf::Heightmap (https://github.com/osrf/sdformat/pull/388) to hold metadata, which is similar to ignition::msgs::HeightmapGeom. I made use of common::HeightmapData to store the actual height data, but that doesn't hold any material information. So for Ignition Rendering, I created the ignition::rendering::HeightmapDescriptor class (https://github.com/ignitionrobotics/ign-rendering/pull/180), of which 80% is a direct copy from sdf::Heightmap, with an extra field for common::HeightmapData. The physics implementation will probably need to combine the SDF and Common types too.

With so many equivalent types, we also need lots of conversion functions between them.
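These conversions typically follow an overloaded `convert()` pattern; a toy version of it is sketched below, with plain structs standing in for the real `sdf::Mesh` and `msgs::MeshGeom` types:

```cpp
#include <string>

// Minimal stand-ins: each equivalent type holds the same metadata.
struct SdfMesh { std::string uri; double scale{1.0}; };
struct MeshMsg { std::string uri; double scale{1.0}; };

// One convert() overload per direction; every pair of equivalent types
// in the stack needs a pair like this, which is the maintenance burden
// described above.
MeshMsg convert(const SdfMesh &in) { return {in.uri, in.scale}; }
SdfMesh convert(const MeshMsg &in) { return {in.uri, in.scale}; }
```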
Implications
All these different types affect usability, maintenance effort, and performance across the stack.
Moreover, there's just too much boilerplate involved. The classes on SDF and Rendering for example use the PIMPL pattern to be extensible, so they need setters and getters for each property, besides the rule of 5 (until we have something like https://github.com/ignitionrobotics/ign-cmake/pull/123). This is all created manually, which causes inconsistencies. All this just to hold some data.
Moving forward
I'm opening this issue to gather some ideas. I think that the situation we have with the math types is pretty good: there are 2 classes to store data, one for operations (ignition::math) and one for transmission (ignition::msgs). How could we achieve an equivalent situation for the other types?