Closed jdkworld closed 11 months ago
Hi @jdkworld ,
Thanks for your interest in ruptures.
This is because of the NaN values in your series! ruptures expects the user to have handled missing data on their own. If ruptures is given a series with missing data, its behaviour is unpredictable.
If you remove the missing data, the output "looks" fine.
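import matplotlib.pyplot as plt
import numpy as np
import ruptures as rpt

# `dataframe` is the pandas DataFrame holding the signal
series = dataframe.to_numpy(dtype='float', na_value=np.nan)
print(f"Raw data : shape is {series.shape}")
# Drop the missing values (boolean indexing also flattens the array to 1-D)
series = series[~np.isnan(series)]
print(f"After removing the nans : shape is {series.shape}")
# Binary segmentation with the "normal" cost model
algo = rpt.Binseg(model="normal", min_size=12*24*7, jump=12*24).fit(series)
result = algo.predict(pen=100)
rpt.display(series, result)
plt.show()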
which outputs
I hope this helps! Let us know!
Olivier
Hi Olivier,
Thanks a lot for your answer. The signal is a time series and I still want min_size and jump to correspond to the correct time period, so just removing the NaNs is not an option. If I understand you correctly, I should fill in all missing data so that no NaN values are left and the time interval between steps is constant?
Josien
If you want to keep the time series' structure along the time axis, then yes, you have to fill the missing values with something.
There are many strategies here (0.0, last known value, random draws from the series, mean or median over a particular time window, etc.), but the right one depends on your use case. It is a decision you have to make according to the underlying goal of the task you are trying to solve!
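As a rough illustration of a few of these strategies, here is a minimal sketch using pandas. The toy series, its 5-minute sampling interval, and all variable names are assumptions made up for the example, not taken from your data. It first puts the series back on a constant time grid, then fills the gaps in several different ways.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-minute series (assumed sampling rate, made-up values).
index = pd.date_range("2023-01-01", periods=12, freq="5min")
raw = pd.Series(np.arange(12, dtype=float), index=index)
raw.iloc[3] = np.nan        # a timestamp that is present but has no reading
raw = raw.drop(index[6:8])  # two timestamps that are absent altogether

# Restore a constant 5-minute grid; absent timestamps become NaN.
regular = raw.asfreq("5min")

# Several fill strategies, all of which keep the time axis intact.
filled_zero = regular.fillna(0.0)                   # constant value
filled_last = regular.ffill()                       # last known value
filled_interp = regular.interpolate(method="time")  # linear in time
rolling_median = regular.rolling(5, min_periods=1, center=True).median()
filled_median = regular.fillna(rolling_median)      # local median

# NaN-free array with one sample per time step.
signal = filled_last.to_numpy(dtype=float)
print(signal.shape, np.isnan(signal).any())
```

Because every time step is kept, min_size and jump still count samples on a regular grid, and the filled array can be passed to Binseg in the same way as in the snippet earlier in this thread. Note that forward fill leaves leading NaNs untouched if the series starts with a gap, so it is worth checking for remaining NaNs before fitting.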
Hope it helps!
Olivier
I have this signal. When I input it into Binseg as a pandas DataFrame, I get the correct breakpoints, but when I input it as a NumPy array, it does not find any breakpoints. Am I missing something? Why is the behaviour different? Could it be due to the way NaNs are handled in the two cases? Also, when I have two identical columns in my dataframe, into breakpoints are found.
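One way to check whether missing values are involved is to count the NaNs in both representations before calling Binseg. This is a minimal sketch only: the small DataFrame below is a hypothetical stand-in for the attached signal, and the column name is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the attached signal (not the real data).
df = pd.DataFrame({"value": [1.0, np.nan, 3.0, 4.0]})

# The same data as a float NumPy array, shape (n_samples, n_columns).
arr = df.to_numpy(dtype=float)

# Count missing values in each representation.
nan_per_column = df.isna().sum()
nan_in_array = int(np.isnan(arr).sum())

print(nan_per_column)
print(arr.shape, arr.dtype, nan_in_array)
```

If either count is non-zero, the input contains missing data.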
signal.csv