DynamicGravitySystems / DGP

Dynamic Gravity Processor
http://dgp.readthedocs.io/en/develop/
Apache License 2.0
7 stars 4 forks source link

Bug: convert_gps_time float error #85

Open bradyzp opened 5 years ago

bradyzp commented 5 years ago

Bug Report

Issue: convert_gps_time function (dgp/lib/time_utils.py) is returning a pandas TimeDelta with a floating point rounding error when using the 'datetime' format of convert_gps_time.

Relevant Module Versions

Sample Output

df =  pd.read_csv('tests/sample_data/test_data.csv', engine='c')
df.columns = ['gravity', 'long_accel', ..., 'gps_week', 'gps_sow']
>>> df
          gravity  long_accel    ...     gps_week   gps_sow
0    27363.304837   30.766637    ...         1941  312030.8
1    31096.986723   30.953845    ...         1941  312030.9
2    35220.629838   31.162258    ...         1941  312031.0
3    39144.264894   31.388331    ...         1941  312031.1
4    42382.632122   31.624970    ...         1941  312031.2
5    44654.293734   31.850109    ...         1941  312031.3
6    45782.125209   32.048602    ...         1941  312031.4
7    45741.905658   32.216090    ...         1941  312031.5
>>> dt = convert_gps_time(df['gps_week'], df['gps_sow'], format='datetime')
>>> dt
0     2017-03-22 14:40:30.799999952
1     2017-03-22 14:40:30.900000095
2     2017-03-22 14:40:31.000000000
3     2017-03-22 14:40:31.099999905
4     2017-03-22 14:40:31.200000048
5     2017-03-22 14:40:31.299999952
6     2017-03-22 14:40:31.400000095
7     2017-03-22 14:40:31.500000000

It seems that for any sub-second value other than .0 or .5 a rounding error is being introduced by pandas to_timedelta function. The specific line where the issue is introduced appears to be:

 elif format == 'datetime':
        return datetime(1970, 1, 1) + pd.to_timedelta(timestamp, unit='s')

Resolution

A naive fix is to simply use .round('ns') on the pd.to_timedelta result, this seems to fix the issue, rounding off the ~50 ns of error on the timedelta result, testing will need to be done to verify this doesn't impact data ingestion. e.g.

 elif format == 'datetime':
        return datetime(1970, 1, 1) + pd.to_timedelta(timestamp, unit='s').apply(lambda td: td.round('ns'))
cbertinato commented 5 years ago

Interesting. These rounding issues keep popping up in pandas. I'll put up a PR for this shortly.

bradyzp commented 5 years ago

Excellent thanks Chris.

I noticed there was an error in AppVeyor with your recent push, it seems there's an unrelated installation issue with matplotlib 3.0 vs matplotlib 2.2.3 missing the freetype and png header/libraries. We're not really directly using mpl at the moment so I'm inclined to ignore this and investigate in depth later to resolve (or pin mpl to the last working version 2.2.3).

cbertinato commented 5 years ago

Sounds good re the Appveyor issue. Sorry this simple fix is taking so long. Turned out the fix was a little trickier than I thought, but I think the latest push, though not ideal, should do it for now. I'm going to submit an issue to the pandas repo as well and I've been looking at it there. The issue is in a function that is used fairly widely, but the fix should be fairly simple.