Check availability of legacy code train/test splitting methods

AI4S2S / s2spy

A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting

https://ai4s2s.readthedocs.io/

Apache License 2.0


Check availability of legacy code train/test splitting methods #52

Closed BSchilperoort closed 2 years ago

BSchilperoort commented 2 years ago

The new train/test implementation relies on sklearn for the splitter classes. These mostly correspond to the legacy code methods (see image below)

@semvijverberg could you describe here what the leave splitter is supposed to do?

semvijverberg commented 2 years ago

Leave-n-out is basically a k-fold with `shuffle=False` and k = (total number of years) / n.
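As a rough illustration of that equivalence, here is a minimal sketch using sklearn's `KFold` on an array of years. The helper name `leave_n_out_splits` and the example years are made up for illustration and are not part of s2spy or the legacy code; it also assumes the number of years divides evenly by n.

```python
import numpy as np
from sklearn.model_selection import KFold


def leave_n_out_splits(years, n):
    """Yield (train_years, test_years), holding out n consecutive years per fold.

    Hypothetical helper: leave-n-out expressed as an unshuffled k-fold.
    """
    years = np.asarray(years)
    n_splits = len(years) // n  # k = total number of years / n
    splitter = KFold(n_splits=n_splits, shuffle=False)
    for train_idx, test_idx in splitter.split(years):
        yield years[train_idx], years[test_idx]


# 12 hypothetical years, leaving out 3 at a time gives 4 folds
years = np.arange(2000, 2012)
folds = list(leave_n_out_splits(years, n=3))
for train_years, test_years in folds:
    print("test:", test_years, "train:", train_years)
```

Because `shuffle=False`, each test fold is a contiguous block of years, which is what leave-n-out is described as doing here.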

semvijverberg commented 2 years ago

At some point, it would also be nice to have the functionality to skip n training years that are adjacent to a test year. Highly autocorrelated time series may lead to information leakage from one year to the next. To reduce or negate this effect, you can remove the years adjacent to test years from the training datasets.

See the `gap_prior` and `gap_after` arguments in the legacy code: https://github.com/AI4S2S/proto/blob/77734930a40b8aaefcf1e390efe0e3ac93b40858/RGCPD/class_RGCPD.py#L312-L314
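To make the idea concrete, here is a minimal sketch of dropping training years next to test years. The function name `apply_gap` is hypothetical, and the meaning given to `gap_prior` and `gap_after` (number of years to drop before and after each test year) is an assumption for illustration; the linked legacy code may define them differently.

```python
def apply_gap(train_years, test_years, gap_prior=0, gap_after=0):
    """Drop training years that are too close to any test year.

    Assumed semantics (not verified against the legacy code):
    gap_prior = years to drop directly before each test year,
    gap_after = years to drop directly after each test year.
    """
    excluded = set()
    for t in test_years:
        excluded.update(range(t - gap_prior, t))          # years just before
        excluded.update(range(t + 1, t + gap_after + 1))  # years just after
    return [y for y in train_years if y not in excluded]


# Hypothetical example: 10 years, two of them held out as test years
years = list(range(2000, 2010))
test_years = [2004, 2005]
train_years = [y for y in years if y not in test_years]
trimmed = apply_gap(train_years, test_years, gap_prior=1, gap_after=1)
print(trimmed)
```

With a gap of one year on each side, 2003 and 2006 are removed from the training set, so no training year borders a test year.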

Peter9192 commented 2 years ago

Is this issue closed by #53 ?

BSchilperoort commented 2 years ago

> Is this issue closed by #53 ?

I believe #53 will close this issue, yes. It seems that all legacy splitting methods are indeed supported by the code in that PR.