MeteoSwiss / dvas

Data Visualization and Analysis Software for the UAII 2022
https://meteoswiss.github.io/dvas/
GNU General Public License v3.0

Non-integer timesteps in RS41 GDPs *after* ingestion by dvas #179

Closed fpavogt closed 2 years ago

fpavogt commented 2 years ago

Describe the bug

Some of the RS41 (at least) GDP timesteps are no longer exact integer seconds when we get them out of the DB.

This should not happen: where do those rounding errors come from?

@modolol: do you think this could be related to using those annoying timedelta64[ns] that save everything in nanoseconds?

We need to dig further to figure out what is going on here ...

fpavogt commented 2 years ago

Problem identified: RS41 GDPs store time info as "float", which results in a "float32" type in Python, whereas iMS-100 GDPs store time info as a "long", which results in an "int64" in Python. When converting to nanoseconds for ingestion by Pandas, the former leads to floating-point precision issues. The latter doesn't.
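To illustrate the difference between the two types, here is a minimal sketch. The time stamps are made-up values (not taken from any actual GDP file), and the variable names are purely illustrative:

```python
import numpy as np

# Hypothetical time stamps in seconds (illustrative values only).
t_float32 = np.array([4999.0, 5000.0, 5001.0], dtype="float32")  # RS41-like
t_int64 = np.array([4999, 5000, 5001], dtype="int64")            # iMS-100-like

# Convert to nanoseconds while keeping each array's own dtype.
ns_float32 = t_float32 * np.float32(1e9)
ns_int64 = t_int64 * 10**9

# Remainder with respect to a whole second, in nanoseconds.
# Zero means the time stamp is still an exact integer number of seconds.
resid_float32 = ns_float32.astype("float64") % 1e9
resid_int64 = ns_int64 % 10**9

print(resid_float32)
print(resid_int64)
```

The "int64" remainders are all zero, whereas the "float32" ones are not: values of order 1e12 cannot be stored exactly with a 24-bit significand.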

How do we fix this "cleanly"?

Edit: things are actually more complicated. The floating-point error becomes visible when the "float32" values get converted to "float64" sometime before/when they are ingested in the db (where exactly is not clear). But the real reason we see this problem is that we were inserting nanoseconds in the db, and as it turns out:

In [131]: np.array([5000e9], dtype='float32').astype('float64')
Out[131]: array([4.99999991e+12])

So we "fix" the problem by inserting seconds in the DB, for which the "float32" to "float64" conversion works fine:

In [132]: np.array([5000e0], dtype='float32').astype('float64')
Out[132]: array([5000.])
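The reason seconds survive the conversion while nanoseconds do not can be checked with np.spacing(), which returns the gap between a value and the next representable one. The sketch below is only an illustration; the 20000 s upper bound is an arbitrary assumption, not a dvas setting:

```python
import numpy as np

# Gap between adjacent float32 values near 5000 (seconds) and 5000e9 (nanoseconds).
gap_s = np.spacing(np.float32(5000.0))
gap_ns = np.spacing(np.float32(5000e9))
print(gap_s, gap_ns)

# Whole seconds up to an arbitrary 20000 s round-trip exactly through float32,
# since float32 represents every integer up to 2**24 exactly.
whole_seconds = np.arange(0, 20001, dtype="int64")
round_trip = whole_seconds.astype("float32").astype("float64")
print(np.array_equal(round_trip, whole_seconds))
```

Near 5000e9 the "float32" grid is far coarser than 1 ns, so a whole number of seconds expressed in nanoseconds generally falls between two representable values. Near 5000 the grid is much finer than 1 s, and whole seconds are stored exactly.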

Doing so, in turn, required using pd.to_datetime() when creating the Profile DataFrame, to correctly/safely handle the conversion to dtype='timedelta64[ns]', which is enforced for all time deltas in Pandas.

For now, this conversion via pd.to_datetime() is hardcoded to assume 's'. That specific point is discussed in #194.
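As a rough sketch of such a conversion from seconds: in pandas, the function that turns numeric values into time deltas is pd.to_timedelta(), so that is what is used here. This is an assumption made for illustration and not necessarily the exact call used in dvas; the input values are made up.

```python
import numpy as np
import pandas as pd

# Made-up time stamps in seconds, as they would come out of the DB as float64.
secs = np.array([0.0, 1.0, 5000.0], dtype="float32").astype("float64")

# Let pandas handle the conversion to time deltas, with the unit hardcoded
# to seconds (assumption: mirrors the 's' mentioned above).
tdt = pd.to_timedelta(secs, unit="s")

print(tdt)
print(tdt.total_seconds())
```

Passing the unit explicitly keeps the nanosecond scaling inside pandas, so the values never go through a "float32" representation in nanoseconds.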