diku-dk / bfast

GPU Implementation for BFAST
GNU General Public License v3.0
37 stars 17 forks source link

use bfast for one dimentional time series data #42

Open TiaHao opened 2 years ago

TiaHao commented 2 years ago

Hi,

Thanks for the package, I am just wondering could I apply the algorithm on 1-d time series data? I checked the format of input data which should be (N,W,H) only for image data. Thank you!

mortvest commented 2 years ago

Yes, it is possible. You should create an "image" with only 1 pixel by re-shaping your dataset:

image_data = 1d_data.reshape(-1, 1, 1)

TiaHao commented 2 years ago

Yes, it is possible. You should create an "image" with only 1 pixel by re-shaping your dataset:

image_data = 1d_data.reshape(-1, 1, 1)

I just tried and it works! I would want to follow up on some other questions in case I understood wrongly:

  1. The history period is for training the model, right? and the returned index is numbered for the monitoring period, which means the first date in the monitoring period is indexed as 0.
  2. if the breaks returned only contain 1 non-negative value, does it means only one breakpoint is detected? how could I tune parameters to detect more breakpoints?
  3. are the breakpoints detected for the trend component only or for both trend and seasonal components or for the original time series?
  4. Thank you!

mortvest commented 2 years ago

1) Correct 2) The BFASTMonitor algorithm only detects one breakpoint, if you wish to detect more breakpoints, try using the BFAST algorithm. An experimental prototype for BFAST (python backend only) is available on bfast-experimental branch. There should be an 1-d example in the example directory. 3) In BFASTMonitor, there is no notion of trend/seasonal components. You need to use the BFAST algorithm.

TiaHao commented 2 years ago
  1. Correct
  2. The BFASTMonitor algorithm only detects one breakpoint, if you wish to detect more breakpoints, try using the BFAST algorithm. An experimental prototype for BFAST (python backend only) is available on bfast-experimental branch. There should be an 1-d example in the example directory.
  3. In BFASTMonitor, there is no notion of trend/seasonal components. You need to use the BFAST algorithm.

Thank you! I found the branch and want to test on my data set, but I kept meeting this error: ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).

my y is array([508104., 525865., 556608., ..., 538247., 534160., 553917.]), x is 0 2017-01-01 1 2017-01-02 2 2017-01-03 3 2017-01-04 4 2017-01-05 ...
1456 2020-12-27 1457 2020-12-28 1458 2020-12-29 1459 2020-12-30 1460 2020-12-31 Name: Date, Length: 1461, dtype: datetime64[ns]

Is the format of x not correct?

mortvest commented 2 years ago

x should be a vector of floating point time-steps, e.g. 1 January 2001 would be represented by 2001.0, while 31 December 2001 would be something like 2001.99.

You can use this function to generate such vector:

def r_style_interval(from_tuple, end_tuple, frequency):
    from_year, from_seg = from_tuple
    end_year, end_seg = end_tuple
    n = (end_year - from_year + 1) * frequency
    full_range = np.linspace(from_year, end_year + 1, num=n, endpoint=False)
    real_range = full_range[(from_seg - 1):n - (frequency - end_seg)]
    return real_range

x = r_style_interval((2015, 1), (2016, 24), 24)
TiaHao commented 2 years ago

x should be a vector of floating point time-steps, e.g. 1 January 2001 would be represented by 2001.0, while 31 December 2001 would be something like 2001.99.

You can use this function to generate such vector:

def r_style_interval(from_tuple, end_tuple, frequency):
    from_year, from_seg = from_tuple
    end_year, end_seg = end_tuple
    n = (end_year - from_year + 1) * frequency
    full_range = np.linspace(from_year, end_year + 1, num=n, endpoint=False)
    real_range = full_range[(from_seg - 1):n - (frequency - end_seg)]
    return real_range

x = r_style_interval((2015, 1), (2016, 24), 24)

Thank you so much, however this function can not generate 365 numbers of data points for daily data in a year since the maximum number np.linespace can generate between 2016 and 2017 is 100? would you have other suggestions to represent daily timestep? (would be 20170101,20170102...20201231 works?)

mortvest commented 2 years ago

I don't know if I understood your question completely, but this call:

arr = r_style_interval((2016, 1), (2016, 365), 365)

would return a vector of 365 values:

[2016. , 2016.00273973, ...., 2016.99726027]
TiaHao commented 2 years ago

I found the problem, I write np.set_printoptions(precision=2, linewidth=120) before, and did not restart the kernel, after restarting the kernel, it works exactly as it should be! Thank you so much!

Then I encounter this error again: ValueError: All-NaN slice encountered

sorry to ask so many questions since I am quite fresh to python and bfast...

mortvest commented 2 years ago

Do you have NaNs in your dataset? Can you include the whole error message and the code snippet that you have ran?

TiaHao commented 2 years ago

Hi,

I do not have NaNs in dataset.

I copy the code in bfast_1d_example.py in the bfast-experimental branch, and write the main as: if name == "main": plot("daily data", data, dates, 365, "harmonic")

data: an array contains daily data from 2017-2020, 1461 data points in total dates = np.linspace(2017, 2021, num=365*3+366, endpoint=False)

It first runs and shows: Plotting daily data Writing to 1 Writing to 2 Writing to 3 Writing to 4 Writing to 5 Writing to 6 Writing to 7 Writing to 8 Writing to 9 Writing to 10 Writing to 11 Writing to 12 Writing to 13 ... many matrix and some contain lots of NaNs

Then error information: ValueError Traceback (most recent call last) /var/folders/2j/n5q0yrd16tzd0m321v49wb500000gn/T/ipykernel_73317/2059085243.py in 114 115 if name == "main": --> 116 plot("daily data", data, dates, 365, "harmonic")

/var/folders/2j/n5q0yrd16tzd0m321v49wb500000gn/T/ipykernel_73317/2059085243.py in plot(name, y, x, f, season, level, h, max_iter, nan_clr) 37 y = y.reshape((y.shape[0], 1, 1)) 38 vo = bfast.BFAST(frequency=f, season_type=season, level=level, h=h, max_iter=max_iter) ---> 39 vo.fit(y, x) 40 nans = np.isnan(y).any() 41

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/models.py in fit(self, Yt, ti) 413 414 # fit BFASTMonitor models --> 415 self._model.fit(Yt=Yt, 416 ti=ti) 417 self._model_fitted = True

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/base.py in fit(self, Yt, ti) 119 pix_season_br, \ 120 pix_n_trend_br, \ --> 121 pix_n_season_br = self.fit_single(y, ti, smod) 122 123 trend_global[:, i, j] = pix_trend

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/base.py in fit_single(self, Yt, ti, smod) 209 print("Breakpoints in trend detected") 210 print("Finding breakpoints in trend") --> 211 bp_Vt = Breakpoints(sm.add_constant(ti1), 212 Vt1, 213 h=self.h,

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/breakpoints.py in init(self, X, y, h, max_breaks, use_mp, verbose) 63 64 ## breaks >= 2 ---> 65 SSR_table = self.extend_SSR_table(SSR_table, self.max_breaks) 66 67 opt = self.extract_breaks(SSR_table, self.max_breaks).astype(int)

~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/bfast/bfast/python/breakpoints.py in extend_SSR_table(self, SSR_table, breaks) 110 # map 111 break_SSR = np.vectorize(fun)(pot_index) --> 112 opt = np.nanargmin(break_SSR) 113 my_SSR_table[i - h + 1, np.array((2, 3))] = \ 114 np.array((pot_index[opt], break_SSR[opt]))

<__array_function__ internals> in nanargmin(*args, **kwargs) ~/opt/anaconda3/envs/Milan_env/lib/python3.9/site-packages/numpy/lib/nanfunctions.py in nanargmin(a, axis) 497 mask = np.all(mask, axis=axis) 498 if np.any(mask): --> 499 raise ValueError("All-NaN slice encountered") 500 return res 501 ValueError: All-NaN slice encountered Thank you so much!