AI4S2S / s2spy

A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting
https://ai4s2s.readthedocs.io/
Apache License 2.0

Enable the overlapping of intervals with `max_lag` #44

Closed geek-yang closed 2 years ago

geek-yang commented 2 years ago

Currently, the implementation of `max_lag` does not allow the intervals of different anchor years to overlap, which avoids information leakage. For instance, if the user defines an advent calendar in this way

import s2spy.time

calendar = s2spy.time.AdventCalendar(anchor_date=(11, 30), freq='180d', max_lag=3)
calendar

Then the anchor year 2019 will be skipped, because its intervals would overlap those of 2020 and leak information.

i_interval                          0                         1                          2                         3  
anchor_year                                                       
2020         (2020-06-03, 2020-11-30]  (2019-12-06, 2020-06-03]   (2019-06-09, 2019-12-06]  (2018-12-11, 2019-06-09]  
2018         (2018-06-03, 2018-11-30]  (2017-12-05, 2018-06-03]   (2017-06-08, 2017-12-05]  (2016-12-10, 2017-06-08]  
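The interval bounds in the table above can be reproduced with plain date arithmetic. The sketch below is not s2spy's implementation; `advent_intervals` is a hypothetical helper that assumes each interval is exactly `freq` days long and that intervals are counted backwards from the anchor date.

```python
from datetime import date, timedelta


def advent_intervals(anchor_year, anchor=(11, 30), freq_days=180, max_lag=3):
    """Return (left, right] bounds for intervals 0..max_lag, most recent first.

    Hypothetical helper for illustration only, not part of s2spy.
    """
    right = date(anchor_year, *anchor)
    step = timedelta(days=freq_days)
    intervals = []
    for _ in range(max_lag + 1):
        left = right - step
        intervals.append((left, right))
        right = left  # the next (older) interval ends where this one starts
    return intervals


for left, right in advent_intervals(2020):
    print(f"({left}, {right}]")
```

The oldest interval for 2020 starts on 2018-12-11, which is before the 2019 anchor date of 2019-11-30. That is why the intervals of 2019 would overlap those of 2020.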

However, in some use cases the user may want to allow the intervals to overlap.

For instance,

i_interval                          0                         1                          2                         3  
anchor_year                                                       
2020         (2020-06-03, 2020-11-30]  (2019-12-06, 2020-06-03]   (2019-06-09, 2019-12-06]  (2018-12-11, 2019-06-09]
2019         (2019-06-03, 2019-11-30]  (2018-12-05, 2019-06-03]   (2018-06-08, 2018-12-05]  (2017-12-10, 2018-06-08]  
2018         (2018-06-03, 2018-11-30]  (2017-12-05, 2018-06-03]   (2017-06-08, 2017-12-05]  (2016-12-10, 2017-06-08]  

We should provide an option that lets the user keep these overlapping intervals.
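A minimal sketch of how such an option could behave, assuming a hypothetical `allow_overlap` flag (this name is an illustration, not an existing s2spy parameter). Anchor years are visited from most recent to oldest, and a year is kept only if its anchor date does not fall inside the span already covered by a later kept year, unless overlap is allowed.

```python
from datetime import date, timedelta


def select_anchor_years(first_year, last_year, anchor=(11, 30),
                        freq_days=180, max_lag=3, allow_overlap=False):
    """Pick anchor years, optionally skipping those whose intervals overlap.

    Hypothetical helper for illustration only, not part of s2spy.
    """
    # Total time covered by intervals 0..max_lag of a single anchor year.
    span = timedelta(days=freq_days * (max_lag + 1))
    selected = []
    earliest_start = None  # left bound of the oldest interval kept so far
    for year in range(last_year, first_year - 1, -1):
        end = date(year, *anchor)
        if allow_overlap or earliest_start is None or end <= earliest_start:
            selected.append(year)
            earliest_start = end - span
    return selected


print(select_anchor_years(2018, 2020))                      # current behaviour
print(select_anchor_years(2018, 2020, allow_overlap=True))  # proposed option
```

With the default, 2019 is dropped as in the first table; with `allow_overlap=True`, all three anchor years are kept as in the second table.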