Intermediate types in serialization

In order to properly support serialization for types that contain an Entity, we need a way to create an intermediate type that converts entities and other non-stable data into a serialization-safe form. The idea is that any type that can be directly serialized has itself as the intermediate type, but types that need preprocessing before being serialized can specify a different type to use for serialization, and can be converted to and from that intermediate representation. This expands on the ideas present in saveload, but should provide a more flexible approach that doesn't require all types to be known at compile time.

Background and Motivation

The initial motivation for this idea is that Entity is not Serialize. Not allowing entities to be serialized directly is a deliberate design decision by the specs developers, and one that I think is entirely reasonable. Entities are meant to be ephemeral, and neither specs nor Amethyst make any guarantees about consistency in the actual entity IDs generated at runtime; That is to say, the same logic object in your game world may have a different entity ID each time you run your game. For cases like networking and prefab saving/loading, it's not practical to try to serialize the entity ID directly, since the entity ID won't necessarily be the same between client and server (in the case of networking).

For the purposes of a read-only editor, it's actually fine for us to serialize the entity directly. Since the editor only needs to know the entity values in order to display components by entity, and it will never try to reuse an entity ID across sessions, there's nothing that can break as a result of using entity IDs directly. In fact, for an editor, it's desirable to be able to see the actual entity IDs being used within an active session of a game for debugging purposes.

However, even with a read-only editor we quickly run into limitations resulting from Entity not being Serialize. The biggest of such issues is that we can't serialize components that have an Entity as a member (and therefore we can't view such components in the editor). As a short-term workaround for this we provide the SerializableEntity type, but that solution is very specific to this crate and only serves to obfuscate the larger problem of finding stable IDs to use in place of entity IDs for serialization.

Current Solution: `saveload`

The currently recommended solution is is the saveload module that specs provides. saveload provides functionality specifying stable "marker" values to be used instead of entity IDs when serializing a group of components. This core concept of allowing the user (or some external system) to provide stable marker values that are specific to the current context works well, however the specific implementation for saveload has some drawbacks that make it unsuitable in my estimation:

Only addresses Entity values for component grouping but still doesn't support replacing Entity values that are used within a component. This means that you still can't use saveload to serialize a component that has an Entity as one of its members.
All component types must be known at compile time because the implementation is provided over a tuple of component types, e.g. you must specify something like (Transform, Light, MyFoo, MyBar) and only those specified component types will be serialized.
Component data is grouped around entities rather than all components of a given type being serialized into a flat array. This setup is useful for things that are meant to be human-facing since grouping an entity's components together is easier to read, but it's an inherently inefficient solution for serializing and storing large amounts of component data.

The proposed solution builds on the approach taken by saveload, but attempts to solve some of the ergonomic and functional issues that it has.

Proposed Solution

Rather than directly using the Serialize impl on a component, we should introduce an intermediate trait SerializeIntermediate which is able to produce a serialization-safe intermediate value:

pub trait SerializeIntermediate {
    type Intermediate: Serialize;

    fn to(&self) -> Self::Intermediate;
    fn from(from: Self::Intermediate) -> Self;
}

Any type that can't be directly serialized (e.g. because it has an Entity member) could instead specify an alternate type that can be used, and the internal serialization mechanism would first convert each instance to the intermediate representation before actually serializing the data.

This solution can work for deserialization as well: As long as the component knows how to convert from the intermediate representation back into the concrete type, then we can handle deserialization by deserializing into the intermediate representation first, then converting back to the main type.

Example: `Parent` Component

The Parent component is a commonly-used component that can't be serialized (and therefore can't be viewed in the editor) today. Using SerializeIntermediate, we can instead convert to a second type that uses a stable marker to describe the hierarchy:

pub struct ParentIntermediate {
    pub entity: EntityMarker,
}

impl SerializeIntermediate for Parent {
    type Intermediate = ParentIntermediate;

    fn to(&self) -> Self::Intermediate {
        let marker = ...; // Lookup marker corresponding to entity.
        ParentIntermediate {
            entity: marker,
        }
    }

    fn from(from: Self::Intermediate) -> Self {
        let entity = ...; // Lookup entity corresponding to the marker.
        Parent { entity }
    }
}

Note that this example does not demonstrate a proper mechanism for how marker values would be generated or mapped to/from entities. See the next section for discussion on how this can be handled.

Generating Intermediate Values

The question remains of how to handle generating marker values and map them to/from entity IDs. I don't think this is something that should be handled by this crate or the serialization library, since different use cases call for different approaches (i.e. prefabs use the index within the file to determine the entity, game networking may require the server to generate unique IDs and send them to the client, the editor may want to use entity IDs directly, etc.).

Instead, I think it would be enough to allow the SerializeIntermediate trait to provide a SystemData type that can be passed in as a parameter to the conversion functions. This would allow the implementation for each type to perform any case-specific conversion that it cares to do. I suspect that this approach is not going to handle all the necessary use cases, but it simple enough to get us started.

The adjusted SerializeIntermediate would be as follows:

pub trait SerializeIntermediate {
    type Intermediate: Serialize;
    type Data: SystemData;

    fn to(&self, data: Self::Data) -> Self::Intermediate;
    fn from(from: Self::Intermediate, data: Self::Data) -> Self;
}

I think it is plausible that there could exist components that cannot be converted into a serializable format without some data loss and therefore cannot implement the from method (e.g. a component might contain a function pointer). So maybe the Intermediate type should only contain the fields that can be shown/edited in the editor. As I see it, that would mean that from is replaced by two methods: an update method that edits an existing component and a new method that is used if a new component was created in the editor. The new method should probably be allowed to fail if the component cannot be created from serializable data.

So it would look like this instead (You can disregard the naming changes, its just what I would call it)

pub trait EditableComponent {
    type EditData: Serialize;

    fn edit_data(&self) -> Self::EditData;
    fn update(&mut self, data: Self::EditData);
    fn new(data: Self::EditData) -> Result<Self, ???>;
}

I'm also not entirely convinced that including SystemData in the trait is necessary. As far as I can tell, just using the entity ids directly should be enough. Prefabs are already serializable and it would probably be good enough to use an auto-implementation for those. Regarding networking and other things, I think we should wait and see whether it actually becomes necessary. It seems like a fairly easy thing to add later that only complicates things now.

I think it is plausible that there could exist components that cannot be converted into a serializable format without some data loss

Agreed! I totally expect that there are going to be types that can only be partially serialized, and therefore can't be reconstructed from serialized data alone. The idea is that you would use the SystemData parameter to access whatever world state is needed to reconstruct the component from its serialized part. The intermediate data should replace any non-serializable data with some kind of serializable placeholder for that data, and then that placeholder can be used to reconstruct the non-serializable data when deserializing.

I'm also not entirely convinced that including SystemData in the trait is necessary. As far as I can tell, just using the entity ids directly should be enough.

The hope is that this solution will be general purpose enough to work for cases other than just serializing structs with entities in them (e.g. it would be nice to be able to see in the editor which entities have a MeshHandle component attached), so I don't think it's sufficient to just use the entity IDs (if I'm understanding your suggestion correctly @mvesterli).

So maybe the Intermediate type should only contain the fields that can be shown/edited in the editor.

This is a good point that brings up an important question: What data needs to be serialized in what cases? For the editor, we want as much data as possible; Even private members that are only implementation details should ideally be accessible for debugging purposes, and any data that's not directly serializable should be represented in some form for the same reason. But what about for networking? For a networked game, it's generally important to send the absolute minimum amount of data, which means potentially having a different intermediate representation that discards non-essential data. I also wonder if the situation is similar for prefabs (and other serialized-to-disk game data), where there's some need for having a different subset of the data serialized.

Having an intermediate representation at least gives us the option to serialize different data than what's in the struct definition, but it's unclear to me if it's sufficient to only have a single intermediate type 🤔

It occurs to me that the design in the draft, where the SerializeIntermediate impl is in charge of converting non-serializable types such as Entity to/from their serializable representation, may be incorrect. In different cases we may well have different intermediate representations, and therefore need different logic for handling the conversion. Using Parent and its Entity member as an example:

In a prefab, the intermediate representation is the index of the parent entity within the prefab file.
For the editor, the intermediate representation is simply the entity ID.
For networking, the intermediate representation is the server-defined stable ID for the parent entity.

We may want to make SerializeIntermediate generic over some "factory" type that can apply different conversion logic depending on the scenario. Unless someone can come up with a really good solution right now, I'll probably punt on this use case and continue with the initial proposal as a first pass. I'll have to think about it some more to figure out what the more robust solution would look like, though.

So there are two cases that the initial proposal fails to address:

Needing different intermediate representations for different use cases (e.g. wanting all the data for the editor but only some for the data for runtime networking).
Needing different approaches for generating the intermediate values for different use cases (e.g. entities are replaced by indices in prefabs, but entity IDs are used directly in the editor).

I'm going to punt on the first issue because I don't have a good understand of what all the potential cases are. The only example I can think of (wanting to only send a subset of the data over the network) could probably be solved by splitting the component into two different component types, so I'm not yet sure this is actually a case the serialization system needs to handle.

For the second case, though, I think I have a solution. We can use a pattern similar to what serde does and pass a "serializer" to the from/to methods, using a visitor pattern to give the serializer a chance to generate the intermediate values based on the context. I'm imagining an IntermediateSerializer trait, where a &dyn IntermediateSerializer trait object gets provided when generating the intermediate values. A rough sketch of the updated API goes as follows:

pub trait IntermediateSerializer {
    map_entity(entity: &Entity) -> IntermediateEntity;
    map_handle<T>(handle: &Handle<T>) -> IntermediateHandle<T>;
}

pub trait IntermediateDeserializer {
    map_entity(intermediate: &IntermediateEntity) -> Entity;
    map_handle<T>(handle: &IntermediateHandle<T>) -> Handle<T>;
}

pub trait SerializeIntermediate {
    type Intermediate: Serialize;

    fn to(&self, serializer: &dyn IntermediateSerializer) -> Self::Intermediate;
    fn from(from: Self::Intermediate, deserializer: &dyn IntermediateDeserializer) -> Self;
}

// Example implementation for `Parent`.
// ------------------------------------

pub struct ParentIntermediate {
    pub entity: IntermediateEntity,
}

impl SerializeIntermediate for Parent {
    type Intermediate = ParentIntermediate;

    fn to(&self, serializer: &dyn IntermediateSerializer) -> Self::Intermediate {
        let entity = serializer.map_entity(&self.entity);
        ParentIntermediate { entity }
    }

    fn from(from: Self::Intermediate, deserializer: &dyn IntermediateDeserializer) -> Self {
        let entity = deserializer.map_entity(&from.entity);
        Parent { entity }
    }
}

The biggest problem I have with this approach at this point is that it limits us to a set of intermediate "primitives", and wouldn't be able to support a user-defined type that needs a custom intermediate conversion. Right now the only types I can think of that need an intermediate representation are relatively fundamental types for the engine, e.g. Entity and resource handles, but I'd be curious to hear example of user-defined types that would need similar treatment.

Maybe I'm not seeing the problem, but wouldn't it be possible to simply use a marker type in the SerializeIntermediate trait and have different implementations for Editor, Network etc.? Then there would be a SyncSystem for each synced marker type. Im not sure the Serializer traits would be necessary in that case either since each of the marker types could implement any necessary conversion helper functions.

trait SerializeIntermediate<'a, T> {
    type Intermediate: Serialize;
    type Data: SystemData<'a>;

    fn to(&self, data: Self::Data) -> Self::Intermediate;
    fn from(Self::Intermediate, data: Self::Data) -> Self;
}

struct EditorSync;

impl EditorSync {
    fn to_intermediate_entity(entity: &Entity) -> IntermediateEntity { ... }
    fn from_intermediate_entity(intermediate: &IntermediateEntity) -> Entity { ... }
}

pub struct EditorParentIntermediate {
    pub entity: IntermediateEntity,
}

impl<'a> SerializeIntermediate<'a, EditorSync> for Parent {
    type Intermediate = EditorParentIntermediate;
    type Data = ();

    fn to(&self, data: Self::Data) -> Self::Intermediate {
        let entity = EditorSync::to_intermediate_entity(self.entity);
        EditorParentIntermediate { entity }
    }

    fn from(from: Self::Intermediate, data: Self::Data) -> Self {
        let entity = EditorSync::from_intermediate_entity(&from.entity);
        Parent { entity }
    }
}

// Similar for other use cases
struct NetworkSync;
impl<'a> SerializeIntermediate<'a, NetworkSync> for Parent { ... }

@mvesterli if I'm understanding your suggestion correctly, then that would require a separate implementation of SerializeIntermediate for each potential serialization use case (e.g. one for the editor, one for prefabs, one for networking, etc.). That seems like a lot of boilerplate, especially since I can already think of three serialization targets and don't know how many more there might be.

Something I do like about that potential approach is that it would allow us specify different intermediate types for different situations, which may end up being necessary anyway. I'll for sure keep this approach in mind :+1:

I saw that you might be considering alternatives to synchronizing this way. But if it is still relevant, I thought of a more flexible way of doing this:

// Like above, using a generic parameter to represent the synchronization use case.
trait SerializeIntermediate<'a, T> {
    type Intermediate: Serialize;
    type Data: SystemData<'a>;

    fn to(&self, data: Self::Data) -> Self::Intermediate;
    fn from(Self::Intermediate, data: Self::Data) -> Self;
}

// The parameter would implement the following trait, which allows `SerializeIntermediate` to
// be implemented for multiple use cases at once, removing the excessive boilerplate.
// This essentially has the same purpose as the visitor pattern in your proposal, 
// except that its passed as a type instead.
trait SyncType {
    type IntermediateEntity: Serialize;
    // No need for the handle to be generic as its going to be serialized anyway.
    type IntermediateHandle: Serialize;

    fn to_intermediate_entity(entity: &Entity) -> Self::IntermediateEntity;
    fn from_intermediate_entity(entity: Self::IntermediateEntity) -> Entity;
    fn to_intermediate_handle<T>(handle: &Handle<T>) -> Self::IntermediateHandle;
    fn from_intermediate_handle<T>(handle: Self::IntermediateHandle) -> Handle<T>;
}

That way its easy to implement SerializeIntermediate for multiple use cases. However the trait can also be implemented for each type individually if specialized behavior is needed.

impl<'a, T: SyncType> SerializeIntermediate<'a, T> for Parent {
    ...
}

This also addresses some of the issues you raised.

Adding new use cases with their own intermediate representations is easy as it only requires adding a type that implements SyncType. Most types will automatically work for this new use case. This crate can be oblivious to it as it does not need to instantiate the visitor.
The user can add their own serialization primitives by making their own trait that extends SyncType and implement it for the existing use cases. All types that do not need the new primitives will continue to work as before.

After doing the initial work in setting up amethyst-dynamic-prefab, I've learned that this pattern is already being used by Amethyst in loading assets. It exists in two forms that I saw specifically:

The Asset trait has an associated type Data that acts as an intermediate format when loading the asset. The loader automatically deserializes the raw asset into an instance of Asset::Data (using the Format specified), and then a system has to call process on the asset storage to convert the intermediate data into its final format.
The PrefabData trait is used to define an intermediate value to when defining a component in a prefab. The raw data is deserialized into the PrefabData instance, which know how to add the corresponding component data to an entity at runtime.

Interestingly, the definition of PrefabData and how it's used is almost identical to the definition of SerializeIntermediate I proposed above.

amethyst / editor-core