OpenSenseAction / OS_data_format_conventions

Code and example files to illustrate standard data formats and conventions derived from the OpenSense Action
BSD 3-Clause "New" or "Revised" License

Error in PWS data NetCDF due to using int16 with too high precision #14

Closed cchwala closed 6 months ago

cchwala commented 6 months ago

While working on https://github.com/OpenSenseAction/pypwsqc/pull/16, @lepetersson and I found a problem with the NetCDF for PWS data in this repo. Some of the high values of the original PWS data are no longer there, and the data from the NetCDF now contains negative values instead 🙈.

Apparently the problem stems from how the data is saved to NetCDF in this notebook, with this code:


encoding = {
    "rainfall": {
        "dtype": "int16",
        "scale_factor": 0.001,
        "zlib": True,
        "_FillValue": -9999,
        "complevel": 3,
    }
}

The problem is that the int16 overflows for values above 32.767, because with a precision of 0.001 the largest packed integer (32767) corresponds to only 32.767 mm. As far as I understand, we need the precision of 0.001 because the original values come in steps of 0.101. Hence, I suggest using int32 or uint32. Disk space is not a problem here anyway.
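To make the wrap-around concrete, here is a minimal NumPy sketch. It only mimics the packing step (divide by `scale_factor`, round, cast to int16) and is not the actual xarray/netCDF4 code path; the rainfall values are made up for illustration.

```python
import numpy as np

SCALE_FACTOR = 0.001  # same packing resolution as in the encoding above

# Hypothetical rainfall values in mm, all multiples of the 0.101 mm tip volume
rainfall = np.array([0.101, 10.1, 32.724, 40.4])

# Packing: divide by the scale factor and round to whole integers
packed = np.round(rainfall / SCALE_FACTOR).astype(np.int64)

# Casting to int16 silently wraps around for anything above 32767
as_int16 = packed.astype(np.int16)

# Unpacking: multiply by the scale factor again
decoded = as_int16 * SCALE_FACTOR

# Largest rainfall value (in mm) that int16 can hold at this resolution
int16_max_mm = np.iinfo(np.int16).max * SCALE_FACTOR

for original, restored in zip(rainfall, decoded):
    print(f"{original:8.3f} mm -> {restored:8.3f} mm")
```

In this sketch, every value up to 32.767 mm survives the round trip, while 40.4 mm comes back negative, which matches the symptom described above.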

See here for info on the value ranges of different integers.

I assume that @maxmargraf did the notebook. Hence, maybe he should fix that. Or maybe also @JochenSeidel?

maxmargraf commented 6 months ago

Wow, I did not assume such precision in PWS data. I'll check with @fenclmar because the OS_data_format_conventions repo is linked to the data convention publication and will change the encoding according to your suggestion.

lepetersson commented 6 months ago

> Wow, I did not assume such precision in PWS data. I'll check with @fenclmar because the OS_data_format_conventions repo is linked to the data convention publication and will change the encoding according to your suggestion.

I am not sure we can call it "precision", it is just that Netatmo gives rain observations as multiples of 0.101 mm, which is the default tipping bucket volume for this specific rain sensor. It could perhaps be argued that we just round to one decimal...

JochenSeidel commented 6 months ago

Keeping 3 trailing digits (I also would not call it precision...) reveals whether the Netatmo rain gauge has been calibrated or not. So I suggest keeping it this way.

maxmargraf commented 6 months ago

Thanks for the info and suggestions! We will then keep three digits and pack the data as int32.
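For reference, a sketch of what the adjusted encoding could look like. This is an assumption based on the suggestion in this thread (only `dtype` changed), not a copy of the committed version; `max_packable_value` is a hypothetical helper added here to check the headroom.

```python
import numpy as np

# Assumed int32 variant of the original encoding; the committed version may differ
encoding = {
    "rainfall": {
        "dtype": "int32",
        "scale_factor": 0.001,
        "zlib": True,
        "_FillValue": -9999,
        "complevel": 3,
    }
}


def max_packable_value(var_encoding):
    """Largest physical value that fits into the packed integer dtype."""
    info = np.iinfo(np.dtype(var_encoding["dtype"]))
    return info.max * var_encoding["scale_factor"]


max_mm = max_packable_value(encoding["rainfall"])
print(f"Largest storable rainfall value: {max_mm:.3f} mm")
```

With int32 and a scale factor of 0.001, the upper limit is in the range of two million mm, far beyond any plausible rainfall value.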

maxmargraf commented 6 months ago

Closing this as the compression was changed to int32 in d404cd36072fd65c87a735e730926cf6d890ccd3. Thanks again for raising this issue!

cchwala commented 6 months ago

@maxmargraf thanks for the quick fix 👏

JochenSeidel commented 6 months ago

> @maxmargraf thanks for the quick fix 👏

Thanks from my side as well!