OpenSimulationInterface / open-simulation-interface

A generic interface for the environmental perception of automated driving functions in virtual scenarios.

Delta-encoding and time models #439

Open doganulus opened 3 years ago

doganulus commented 3 years ago

Describe the feature

This issue discusses the delta-encoding feature mentioned here. The idea is to encode only the fields that change over time in an OSI trace file. This would reduce the file size and improve the performance of any simulation tool that uses delta encoding.

This issue illustrates the basic concept and aims to start a discussion/documentation/formalization for this important feature.

Describe the solution you would like

An OSI trace file represents a temporal behavior that maps points in time to fully populated OSI messages. Below I walk through examples of how to represent such a behavior, using JSON syntax for simplicity.

  1. Uncompressed (golden) representation

Below is a discrete-time behavior: the time field is implicit and increases by one time unit per message, and each message is fully populated. This is the golden representation of a temporal behavior, but it is very inefficient when the time unit is small (e.g. nanoseconds, as currently set).

{"A": "a1", "B": "b1", "C": "c1"}   //time: 0
{"A": "a2", "B": "b1", "C": "c1"}   //time: 1
{"A": "a2", "B": "b1", "C": "c1"}   //time: 2
{"A": "a2", "B": "b2", "C": "c1"}   //time: 3
{"A": "a2", "B": "b2", "C": "c1"}   //time: 4
{"A": "a2", "B": "b2", "C": "c1"}   //time: 5
{"A": "a2", "B": "b2", "C": "c1"}   //time: 6
{"A": "a2", "B": "b2", "C": "c1"}   //time: 7
{"A": "a1", "B": "b2", "C": "c2"}   //time: 8
{"A": "a1", "B": "b2", "C": "c2"}   //time: 9
  2. Compressing in time

We often want a small time unit because complex systems and their components operate on many different timescales. A small time unit, however, produces many repeated observations. These can be handled by compressing the behavior in time, recording only the time points at which something changed, as follows:

{"time":0, "A": "a1", "B": "b1", "C": "c1"}   //time: 0
{"time":1, "A": "a2", "B": "b1", "C": "c1"}   //time: 1
{"time":3, "A": "a2", "B": "b2", "C": "c1"}   //time: 3
{"time":8, "A": "a1", "B": "b2", "C": "c2"}   //time: 8
{"time":9, "A": "a1", "B": "b2", "C": "c2"}   //time: 9

This is what I would call a dense-time behavior: repeated messages are skipped, and a time field (stamp) is added so that we do not lose track of where we are in time. Each new message can therefore jump ahead by an arbitrary amount of time. In practice, this is the level we start from, and it is how things are currently done in OSI and elsewhere.
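As a minimal sketch of this step (not OSI tooling; the function name `compress_in_time` is made up for illustration), the following Python takes the golden trace from section (1), where the list index is the time, and keeps a message only when it differs from its predecessor. Keeping the final message so that the end time of the trace is not lost is an assumption of this sketch.

```python
# Golden trace: one fully populated message per time unit (index == time).
golden = [
    {"A": "a1", "B": "b1", "C": "c1"},  # time 0
    {"A": "a2", "B": "b1", "C": "c1"},  # time 1
    {"A": "a2", "B": "b1", "C": "c1"},  # time 2
    {"A": "a2", "B": "b2", "C": "c1"},  # time 3
    {"A": "a2", "B": "b2", "C": "c1"},  # time 4
    {"A": "a2", "B": "b2", "C": "c1"},  # time 5
    {"A": "a2", "B": "b2", "C": "c1"},  # time 6
    {"A": "a2", "B": "b2", "C": "c1"},  # time 7
    {"A": "a1", "B": "b2", "C": "c2"},  # time 8
    {"A": "a1", "B": "b2", "C": "c2"},  # time 9
]


def compress_in_time(messages):
    """Keep a message only if it differs from the previous one.

    The last message is always kept so the end time of the trace survives
    (an assumption of this sketch, not something OSI prescribes).
    """
    dense = []
    previous = None
    last = len(messages) - 1
    for t, msg in enumerate(messages):
        if msg != previous or t == last:
            dense.append({"time": t, **msg})
        previous = msg
    return dense


dense = compress_in_time(golden)
for msg in dense:
    print(msg)
```

With the ten golden messages this keeps the time points 0, 1, 3, 8 and 9; each kept message is still fully populated.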

So far this was just an introduction, but I also want to show that it is only a half-way measure: we can compress further using delta encoding.

  3. Compress in time and value (desired feature)

Finally, here is the desired delta-encoded representation, compressed both in time and in value.

{"time":0, "A": "a1", "B": "b1", "C": "c1"}
{"time":1, "A": "a2"}                         
{"time":3,            "B": "b2"}                                              
{"time":8, "A": "a1",            "C": "c2"}    
{"time":9} 

Importantly, the meaning of the temporal behavior does not change from (1) to (3), while the size would be significantly smaller in practice. A smaller trace is easier to read, write, copy, and transmit.
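The claim that the meaning is preserved can be checked with a round trip. The sketch below is illustrative only: `delta_encode` and `delta_decode` are hypothetical names, and messages are modeled as flat Python dicts rather than real OSI messages. It encodes a time-compressed trace into deltas and then rebuilds the fully populated messages.

```python
def delta_encode(dense):
    """Drop every field whose value equals the last value seen for it."""
    state = {}
    encoded = []
    for msg in dense:
        delta = {"time": msg["time"]}
        for key, value in msg.items():
            if key == "time":
                continue
            if key not in state or state[key] != value:
                delta[key] = value
                state[key] = value
        encoded.append(delta)
    return encoded


def delta_decode(encoded):
    """Rebuild fully populated messages by carrying values forward."""
    state = {}
    decoded = []
    for delta in encoded:
        state.update({k: v for k, v in delta.items() if k != "time"})
        decoded.append({"time": delta["time"], **state})
    return decoded


# Time-compressed input: one message per time point at which something
# changed, plus the final time point.
dense_trace = [
    {"time": 0, "A": "a1", "B": "b1", "C": "c1"},
    {"time": 1, "A": "a2", "B": "b1", "C": "c1"},
    {"time": 3, "A": "a2", "B": "b2", "C": "c1"},
    {"time": 8, "A": "a1", "B": "b2", "C": "c2"},
    {"time": 9, "A": "a1", "B": "b2", "C": "c2"},
]

encoded = delta_encode(dense_trace)
for delta in encoded:
    print(delta)

# Decoding restores exactly the messages we started from.
assert delta_decode(encoded) == dense_trace
```

The encoded trace carries the same five messages as the delta-encoded listing, and the final assertion confirms that decoding loses nothing.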

Describe alternatives you have considered

The example above applies the concept of forward persistence: once a field is set at some point in time, it is interpreted as holding its value until it is set again. Backward persistence, where a field value is interpreted as having held since the previous time point, is also possible. The former is chosen because it matches the standard behavior of variable assignment in programming languages.
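To make the difference concrete, here is a small sketch of how one might query a field at an arbitrary time under each interpretation. The function names are hypothetical, and the backward variant reflects my reading of the sentence above: a value recorded at a time point is taken to have held since the previous time point.

```python
delta_trace = [
    {"time": 0, "A": "a1", "B": "b1", "C": "c1"},
    {"time": 1, "A": "a2"},
    {"time": 3, "B": "b2"},
    {"time": 8, "A": "a1", "C": "c2"},
    {"time": 9},
]


def value_forward(trace, field, t):
    """Forward persistence: the latest assignment at or before time t."""
    value = None
    for msg in trace:
        if msg["time"] > t:
            break
        if field in msg:
            value = msg[field]
    return value


def value_backward(trace, field, t):
    """Backward persistence: the earliest assignment at or after time t."""
    for msg in trace:
        if msg["time"] >= t and field in msg:
            return msg[field]
    return None


# At time 2, field B sits between its assignments at time 0 and time 3.
print(value_forward(delta_trace, "B", 2))   # b1
print(value_backward(delta_trace, "B", 2))  # b2
```

Only the forward reading agrees with the golden representation, where B is still "b1" at time 2.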

Describe the backwards compatibility

This feature alters the meaning of missing fields in OSI messages for all field types: a missing field would mean "same as before", whereas previously it meant "default value or null". Please see the topic of Field Presence for protobuf and Nullable Scalars for FlatBuffers. While proto2 supports explicit field presence for scalars, we need to handle it explicitly for proto3 and FlatBuffers. This may create a backward compatibility issue.
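The following sketch illustrates why explicit presence matters. It does not use the protobuf library: `serialize_implicit` is a hypothetical stand-in that mimics implicit presence by not writing a field whose value equals its default, and the `speed` field is invented for the example.

```python
DEFAULTS = {"speed": 0.0}  # hypothetical field and its default value


def serialize_implicit(msg):
    """Mimic implicit presence: fields equal to their default are not written."""
    return {k: v for k, v in msg.items() if k == "time" or v != DEFAULTS.get(k)}


def decode_unchanged(wire_trace):
    """Decode with delta semantics: a missing field means 'same as before'."""
    state = {}
    out = []
    for wire in wire_trace:
        state.update(wire)
        out.append(dict(state))
    return out


# The sender intends the speed to drop from 5.0 to 0.0 at time 1.
intended = [{"time": 0, "speed": 5.0}, {"time": 1, "speed": 0.0}]

wire = [serialize_implicit(m) for m in intended]
decoded = decode_unchanged(wire)

print(wire[1])     # {'time': 1}
print(decoded[1])  # {'time': 1, 'speed': 5.0}
```

Because the explicit assignment to the default value never reaches the wire, the receiver keeps the old speed. With explicit presence, "set to 0.0" and "not set" stay distinguishable and the delta semantics become safe.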

Additional context

pmai commented 3 years ago

I don't want to discourage people from exploring this. However, while I am probably the person who originally brought up delta-encoding as an alternative in the context of OSI, I brought it up more as a last-resort alternative, not as something to positively pursue:

If those avenues do not provide sufficient performance, then I think more radical proposals are needed, with different delta-encoding schemes (and even compression schemes) being on the table.

doganulus commented 3 years ago

@pmai Thank you for your comments. That's great, because I think the opposite: this is a conceptual issue (together with other temporal aspects) that should be considered early. Hence I am here to help the standardization efforts reach a solid foundation for handling, checking, and generating temporal sequences regardless of the message content.

As you have described, there is a risk that tools, simulators, and other implementations will be hit by performance problems as things get more and more complex. Since I don't see any way to contain everything inside monolithic applications, these messages will be transmitted from one place to another, from one tool to another, no matter what. At the core of the simulator, we may not want delta-encoded messages. That's fine; then we don't use them. But I think OSI, being an interface to other applications at the periphery, must support delta encoding by default.

This is also a great way to represent static information. Indeed, a static field is just a regular field that is never updated during the simulation. There is no need to filter or to make a distinction between static and dynamic. Just set it in the first message and don't update it. And who knows, maybe I need to update static info on the fly...

That said, this is not even the first use case. Delta encoding is widely used to reduce the size of log files; see, for example, the Value Change Dump (.vcd) format for digital circuit simulators. We have the same use case here for driving simulations: whether we do post-simulation analysis or replay from log files, we want them compact.

Below is how I would approach the solution in the context of OSI.

Implementation

First of all, the bulk of the task is already done by Protocol Buffers and FlatBuffers. From the OSI perspective (as an interface), the remaining things we should do are:

  1. Ensure explicit field presence in message definitions for all field types
  2. Define a missing field to mean "same as before" (no change)

Item (1) was the default in proto2, was changed in proto3, and was then added back later here. It was also added to FlatBuffers recently here. Item (2) is not the business of the serialization format, so it keeps treating a missing field as null.

The rest is up to tools, simulators, and implementations. They do not have to use missing fields, but we make the standard ready for when they choose to implement delta encoding.
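On the tool side, a reader could keep one running state and merge each incoming delta into it. The sketch below is an assumption-laden illustration: `apply_delta` and `DeltaReader` are hypothetical names, messages are modeled as nested dicts rather than real OSI messages, the field names are invented, and any non-dict value (lists included) is simply replaced wholesale.

```python
import copy


def apply_delta(state, delta):
    """Merge a delta into state in place; a missing key means 'no change'."""
    for key, value in delta.items():
        if isinstance(value, dict) and isinstance(state.get(key), dict):
            # Recurse so that unchanged sibling fields of a submessage survive.
            apply_delta(state[key], value)
        else:
            state[key] = copy.deepcopy(value)
    return state


class DeltaReader:
    """Keeps the running state and returns a full message for each delta."""

    def __init__(self):
        self._state = {}

    def feed(self, delta):
        apply_delta(self._state, delta)
        return copy.deepcopy(self._state)


reader = DeltaReader()
first = reader.feed(
    {"time": 0, "vehicle": {"position": {"x": 0.0, "y": 0.0}, "length": 4.5}}
)
second = reader.feed({"time": 1, "vehicle": {"position": {"x": 1.2}}})
print(second)
```

The second message updates only `x`. The `y` coordinate and the static `length`, set once in the first message, are carried forward without being sent again.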