NOAA-CSL / MELODIES-MONET

MELODIES MONET - diagnostic tool for evaluating models against a variety of observations including surface, aircraft, and satellite data all within a common framework
https://melodies-monet.readthedocs.io
Apache License 2.0

Update NaN value from -1 to -999 #204

Open rschwant opened 11 months ago

rschwant commented 11 months ago

Dependency: Move all observation readers to command line interface: https://github.com/NOAA-CSL/MELODIES-MONET/issues/203

1) Update the NaN value from -1 to -999

2) Then remake all of the observational datasets with this updated NaN value.

3) Record the MELODIES MONET version used to create these observation files in all of the observational readers and in the observational output. Add an error stating that old datasets without a version number, and therefore with a -1 NaN value, cannot be read into MELODIES MONET and need to be recreated.
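The check described in step 3 could look roughly like the sketch below. The attribute name `melodies_monet_obs_version`, the function `check_obs_version`, and the exception class `ObsVersionError` are hypothetical names chosen for illustration; none of them are existing MELODIES MONET identifiers.

```python
class ObsVersionError(ValueError):
    """Raised when an observation file predates versioned output (hypothetical)."""


# Hypothetical global-attribute name; the real name is not specified in this issue.
VERSION_ATTR = "melodies_monet_obs_version"


def check_obs_version(global_attrs):
    """Return the version string, or raise if the file has none.

    `global_attrs` is a plain mapping of the file's global attributes,
    such as the dataset-level attributes a netCDF reader exposes.
    """
    version = global_attrs.get(VERSION_ATTR)
    if version is None:
        raise ObsVersionError(
            "Observation file has no version attribute, so it was written "
            "with the old -1 NaN value. Please recreate the file."
        )
    return version
```

Calling `check_obs_version` right after opening a file would make old datasets fail early with a clear message instead of silently treating -1 as missing.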

zmoon commented 3 days ago

With this approach the recovery of NaN could be automatic when the netCDF is read, so that the YAML file doesn't have to duplicate this info.

rschwant commented 3 days ago

If you can update the code to just save the data as NaN and then read it back in as NaN, without a negative number as an intermediary, then this seems best and least error-prone. So I like this plan.

zmoon commented 3 days ago

> save the data as NaN

Well, with the int packing you can't exactly do this, since NaN exists only for floats. But having it automated, without the user needing to do anything, does seem like a step up.
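To make the constraint concrete, here is a minimal pure-Python sketch of int packing with a sentinel. The helper names `pack` and `unpack` and the `attrs` dictionary are illustrative assumptions, not MELODIES MONET code; the keys `scale_factor` and `_FillValue` follow the CF-convention attribute names for packed data. Because the packing parameters are stored alongside the data, the reader can restore NaN without the user repeating the value anywhere else.

```python
import math


def pack(values, scale_factor, fill_value):
    """Pack floats into ints; NaN becomes the integer sentinel `fill_value`."""
    return [
        fill_value if math.isnan(v) else round(v / scale_factor)
        for v in values
    ]


def unpack(packed, scale_factor, fill_value):
    """Invert `pack`, restoring NaN wherever the sentinel is found."""
    return [
        math.nan if p == fill_value else p * scale_factor
        for p in packed
    ]


# The packing parameters travel with the data, as file attributes would.
attrs = {"scale_factor": 0.1, "_FillValue": -999}
obs = [12.3, math.nan, -1.0, 0.0]  # -1.0 is a real measurement here

packed = pack(obs, attrs["scale_factor"], attrs["_FillValue"])
restored = unpack(packed, attrs["scale_factor"], attrs["_FillValue"])
```

Note that the sentinel is compared in packed integer space, so it must not coincide with any packed real value.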

But of course we could instead just do zlib compression without the int packing, which still helps a lot for the obs, since the obs files usually have a lot of missing data, which is easily compressed.
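The compression claim can be illustrated with the standard-library `zlib` module, which implements the same DEFLATE algorithm. This is only a sketch on synthetic data (random values, about 90% marked missing, both assumptions made for illustration); real netCDF files are compressed per chunk and may use additional filters, so actual sizes will differ.

```python
import array
import math
import random
import zlib

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 10_000

dense = [rng.uniform(0.0, 100.0) for _ in range(n)]
# Mark roughly 90% of the values as missing.
sparse = [v if rng.random() < 0.1 else math.nan for v in dense]


def compressed_size(values):
    """Size in bytes of the float64 buffer after zlib (DEFLATE) compression."""
    raw = array.array("d", values).tobytes()
    return len(zlib.compress(raw, 4))


raw_size = n * 8  # 8 bytes per float64
dense_size = compressed_size(dense)
sparse_size = compressed_size(sparse)
print(f"raw: {raw_size} B, dense: {dense_size} B, mostly-NaN: {sparse_size} B")
```

Long runs of identical NaN bit patterns are what make the mostly-missing array compress so much better than the dense one.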

rschwant commented 3 days ago

I see. Can we still have the intermediate NaN value be -999? Some PTR instruments, and likely others, have real values at -1, and I don't want to accidentally NaN them.

zmoon commented 2 days ago

> Some PTR instruments and likely others have real values at -1 and I don't want to accidentally NaN them.

Not sure how it could happen accidentally, since we would have to create a reader and a reader-to-nc tool first, before any possible NaN filling in the dataset, and we could be careful to avoid such an issue. And in the YAML, the setting is pretty clear and controllable by the user.

I think at the least we can make the intermediate NaN value used for the int packing configurable. Or it could even be picked algorithmically, e.g. slightly less than the actual min of the data.

rschwant commented 2 days ago

I like that idea. How about we pick the intermediate NaN value algorithmically, e.g. slightly less than the actual min of the data? Then we do not have to carefully check this for every variable, and we know that we are not NaNing values that are meant to be negative.
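One way the algorithmic choice could work is sketched below. The function `choose_fill_value` is a hypothetical helper, and the rule "one integer step below the smallest packed value" is just one reading of "slightly less than the actual min"; the sketch also assumes the data contain no infinities.

```python
import math


def choose_fill_value(values, scale_factor):
    """Pick a packed-integer sentinel one step below the smallest packed value."""
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return -1  # nothing to collide with when every value is missing
    return round(min(finite) / scale_factor) - 1


obs = [-1.0, 3.2, math.nan, -0.5]  # -1.0 is a real measurement here
scale_factor = 0.1
fill = choose_fill_value(obs, scale_factor)

# Only the NaN entry receives the sentinel; the real -1.0 keeps its own code.
packed = [fill if math.isnan(v) else round(v / scale_factor) for v in obs]
```

A real implementation would also need to confirm that the chosen sentinel fits within the range of the target integer type, and store it with the variable so the reader can undo the packing.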