databrickslabs / tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation
https://pypi.org/project/dbl-tempo

Interpolation creates null values in resampled dataframe #320

Open lambertsbennett opened 1 year ago

lambertsbennett commented 1 year ago

I am having an issue with resampling/interpolation that I think must be a simple misunderstanding. I have a signal at 10 Hz and I want to upsample to 100 Hz and fill values with a linear interpolation. My current code is as follows:

interp_df = start_tsdf.resample(freq=f"{1/100} sec", func='mean').interpolate(target_cols=['value'], method='linear', show_interpolated=True)

However, this results in a value column that contains only nulls between the actual sensor readings. Is it not possible to upsample a signal and linearly interpolate to fill the missing values?
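For reference, here is a minimal pure-Python sketch of the result one would expect from upsampling a 10 Hz signal to 100 Hz with linear interpolation. It is not tempo's implementation and does not use Spark. The function name `upsample_linear`, the millisecond integer timestamps, and the three sample readings are all assumptions made up for illustration.

```python
from bisect import bisect_right


def upsample_linear(samples, step_ms):
    """Upsample sorted (t_ms, value) readings onto a regular grid.

    Hypothetical helper, not part of tempo. Returns a list of
    (t_ms, value, is_interpolated) tuples covering the first to the
    last reading, spaced step_ms apart.
    """
    times = [t for t, _ in samples]
    out = []
    t = times[0]
    while t <= times[-1]:
        # Index of the last real reading at or before t.
        i = bisect_right(times, t) - 1
        t0, v0 = samples[i]
        if t == t0:
            # Grid point coincides with a real reading: keep it as-is.
            out.append((t, v0, False))
        else:
            # Grid point falls between two readings: interpolate linearly.
            t1, v1 = samples[i + 1]
            frac = (t - t0) / (t1 - t0)
            out.append((t, v0 + frac * (v1 - v0), True))
        t += step_ms
    return out


# Made-up 10 Hz readings (100 ms apart), including a flat segment.
readings = [(0, 1.0), (100, 1.0), (200, 2.0)]

# Upsample to 100 Hz (10 ms grid).
result = upsample_linear(readings, 10)
```

Under these assumptions every grid point between two readings gets a value, including points in the flat segment where consecutive readings are identical, so an all-null result between readings is not what linear interpolation should produce.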

tnixon commented 1 year ago

You should be able to do this, @lambertsbennett; if not, this is definitely a bug. Could you provide some sample data we could test with?

lambertsbennett commented 1 year ago

@tnixon thanks for the response, I will take a small sample of the data when I'm next at work and provide it as soon as possible!

lambertsbennett commented 1 year ago

@tnixon - Hi, here is some sample data (sample.csv) that I can use to reproduce the error. In this case I am using the signal and the file_name columns as partition columns. One possible cause is that many of the values are identical; could that be an edge case the interpolation does not handle?

tnixon commented 1 year ago

Thanks @lambertsbennett - I will investigate ASAP!

lambertsbennett commented 1 year ago

@tnixon I have also been looking more into this, and things are even a bit stranger: a different signal in the data has a large time gap, and that large gap is interpolated properly, but the smaller gaps are filled with nulls.

lambertsbennett commented 1 year ago

Hi @tnixon, I was just wondering if there are any updates on this issue?

Thanks!

lambertsbennett commented 9 months ago

@tnixon we eventually completely switched up our interpolation process, but it would be nice in the future to use tempo. Did anything ever come out of this?

tnixon commented 9 months ago

Hi @lambertsbennett, thanks for providing the sample data. I'm sorry we haven't been able to follow up on this yet, but it is still in the queue, and figuring out what is going on here is valuable to us. I am setting aside some time to look at it in the near term.

jplavins commented 2 weeks ago

Hello! I have the same problem: I end up with null values as a result of resampling/interpolation. Some combinations of parameters work fine, while others return null values (func='ceil' and method='linear', freq = 60seconds). I've attached the sample data: data_sample.csv