DigitalArsenal / spacedatastandards.org

Data Standards For Space Data Systems

To NULL or not to NULL #16

Closed TJKoury closed 2 years ago

TJKoury commented 2 years ago

Several messages in draft are considering using null as default when a value is not present. Some issues with using null:

Recommend removing it as a default from all IDLs.

Thoughts? @maberry18 @tskelso

TSKelso commented 2 years ago

Removing the null values is fine, as long as there is a clear way to handle missing data. We handle a variety of data products, such as the SATCAT and space weather data, where numeric data may not be available. The fields are typically doubles/floats, and putting in a value of 0.0 is not the same thing as no data.

maberry18 commented 2 years ago

The FlatBuffer doc on schemas (https://google.github.io/flatbuffers/flatbuffers_guide_writing_schema.html) references using null as the default for optional scalars. Ideally that's what we would do for the optional fields in the standards. Per the doc, for non-scalar types like strings the default value is already null, so those are covered.

In the formats specified by the CCSDS standards, KVN and XML, one can clearly distinguish between "not specified" and "0.0". I don't think we want to lose information/flexibility by using FlatBuffers instead of those text formats.

To be fair, for many (maybe most) of the optional fields a value of 0 would have no useful meaning, and could be considered equivalent to not providing it. Mass, drag area, and conjunction probability are examples. But I'm not sure if that is universally true, and I would rather not make assumptions. For optional sets of fields, like position vectors and orbital elements, an individual value could legitimately be zero. You could assume that an individual zero means zero, but that if they are all zero the set was not provided. That does add complexity, though. Supporting the null default is cleaner.
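To make the trade-off concrete, here is a minimal Python sketch of the two approaches. The function names are hypothetical and not part of any standard or of the FlatBuffers API; it only illustrates why the "all zeros means not provided" rule is a heuristic, while an explicit null is unambiguous.

```python
from typing import Optional, Sequence


def provided_by_zero_heuristic(vector: Sequence[float]) -> bool:
    """Heuristic: an all-zero vector is read as 'not provided'.

    A single zero component still counts as real data, but a vector that
    is genuinely all zeros can no longer be represented.
    """
    return any(component != 0.0 for component in vector)


def provided_by_null(vector: Optional[Sequence[float]]) -> bool:
    """Explicit null: only None means 'not provided'; zeros are real data."""
    return vector is not None


# One zero component is fine under both rules.
partly_zero = [0.0, 7000.0, 0.0]
# All zeros: the heuristic discards it, the null approach keeps it.
all_zero = [0.0, 0.0, 0.0]
```

With the heuristic, `all_zero` is silently treated as missing; with the null approach, the producer has to pass `None` to say so.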

Which languages don't support it? Are the only options to either not use the nulls or not support those languages?

TJKoury commented 2 years ago

Do want to state at the outset here that the actual serialization is not as important as getting the issue right, and although the IDLs are being parsed into code for use with FlatBuffers, it does not limit the user to only that serialization format.

Here's the thread about this issue. There is a list of the languages that "support" it.

Technically, in FlatBuffers all fields are optional, and the absence of a field should be interpreted as null, thus giving strings and other primitives the default value of null if they are not included. Making scalars technically / functionally optional (and allowing their default to be null as well) is discussed in the above thread.

Agree with you and @TSKelso that the value 0 does not necessarily mean missing data.

Implementation, however, is very language-specific. For example, C++ used expressions that evaluate to the integer literal 0 prior to C++17, which added the class template std::optional.

TJKoury commented 2 years ago

> Removing the null values is fine, as long as there is a clear way to handle missing data.

Recommended approach is to omit fields if their value is missing.
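As a rough sketch of what "omit the field" can look like in practice, the Python below uses JSON rather than FlatBuffers, since the serialization is not the point here. The record type and its field names (`SatRecord`, `mass_kg`, `drag_area_m2`) are made up for illustration and do not come from any of the IDLs.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SatRecord:
    """Hypothetical record; None marks a value that is missing."""
    object_name: str
    mass_kg: Optional[float] = None
    drag_area_m2: Optional[float] = None


def encode(record: SatRecord) -> str:
    """Serialize, leaving out every field whose value is missing."""
    present = {k: v for k, v in asdict(record).items() if v is not None}
    return json.dumps(present, sort_keys=True)


def decode(text: str) -> SatRecord:
    """Deserialize; fields absent from the input come back as None."""
    return SatRecord(**json.loads(text))


# A real 0.0 is written out, while the unknown drag area is simply absent.
record = SatRecord("EXAMPLE", mass_kg=0.0)
encoded = encode(record)
decoded = decode(encoded)
```

Here `encoded` contains `mass_kg` but no `drag_area_m2` key, and `decoded.drag_area_m2` is `None`, so a reader can still tell 0.0 apart from no data without a null default in the schema.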

maberry18 commented 2 years ago

OK, thanks for the background. We can remove the nulls then.