arviz-devs / InferenceObjects.jl

Storage for results of Bayesian inference
https://julia.arviz.org/InferenceObjects
MIT License

Handling boolean data correctly with netCDF #25

Closed (sethaxen closed this 2 years ago)

sethaxen commented 2 years ago

The netCDF file format has no boolean data type, so every boolean array must be converted to an integer array before it is saved to netCDF, and it is then loaded back as an integer array. One example is the diverging array in the sample_stats group, which some functions in Python and Julia ArviZ expect to be boolean.

xarray handles this by serializing the boolean data as integer values and adding a dtype='bool' attribute to the variable. Upon deserializing, it checks for this attribute and, if present, converts the loaded data back to boolean. As a result, all examples in https://github.com/arviz-devs/arviz_example_data store boolean data this way. However, this convention is not part of the netCDF spec and may be xarray-specific. Perhaps this approach for boolean data should be added to the InferenceData spec and used here, to guarantee that downstream functions always receive boolean data where they expect it.
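
To make the round trip concrete, here is a minimal, library-free sketch of the convention described above. It is not xarray's or InferenceObjects.jl's actual implementation: the function names `encode_variable` and `decode_variable` are hypothetical, and plain Python lists and dicts stand in for arrays and variable attributes.

```python
# Library-free sketch of the bool <-> integer round trip described above.
# `encode_variable` and `decode_variable` are hypothetical names; they are
# not part of xarray, ArviZ, or InferenceObjects.jl.


def encode_variable(values, attrs=None):
    """Return (data, attrs) suitable for a format without a boolean type."""
    attrs = dict(attrs or {})
    if values and all(isinstance(v, bool) for v in values):
        # Store booleans as 0/1 integers and record the original type
        # in a marker attribute so it can be restored on load.
        return [int(v) for v in values], {**attrs, "dtype": "bool"}
    return list(values), attrs


def decode_variable(data, attrs):
    """Invert encode_variable using the "dtype" marker attribute."""
    attrs = dict(attrs)
    if attrs.get("dtype") == "bool":
        del attrs["dtype"]  # the marker is only a serialization detail
        return [bool(v) for v in data], attrs
    return list(data), attrs


# Example with a stand-in for the `diverging` variable in sample_stats.
diverging = [False, True, False, False]
stored, stored_attrs = encode_variable(diverging)
restored, restored_attrs = decode_variable(stored, stored_attrs)
```

A reader that ignores the marker attribute still gets valid integer data, which is why files written this way stay readable by tools that do not know the convention; they just lose the boolean type.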

Edit: I guess this may not technically be part of the spec, but rather rules for how objects that implement the spec are to be (de)serialized. Should that be part of the spec?