AI4S2S / s2spy

A high-level Python package integrating expert knowledge and artificial intelligence to boost (sub)seasonal forecasting
https://ai4s2s.readthedocs.io/
Apache License 2.0

correlating across both intra-seasonal and inter-seasonal variability ('subseasonal mode') #151

Closed semvijverberg closed 1 year ago

semvijverberg commented 1 year ago

The current implementation forces the RGDR().fit method to correlate across the anchor_year dimension (1 datapoint per year). I would like to flatten the anchor_year and i_interval dimensions in order to correlate once using all datapoints (instead of looping over each target).

To solve this, I propose adding a 'corr_dim' argument to RGDR().fit, so that people can also use it more flexibly (even without creating a calendar).

This is what I want to do:

target = target_resampled['t2m'].sel(i_interval=slice(1, 6)).stack(time=['anchor_year', 'i_interval'])
field = field_resampled.sel(i_interval=slice(-1, 5)).stack(time=['anchor_year', 'i_interval'])

RGDR().fit(field, target, corr_dim='time')

Any other comments or suggestions? I know this is already supported by https://github.com/AI4S2S/s2spy/blob/825d359e9bc02313a97c222f72699993b611a3fb/s2spy/rgdr/rgdr.py#L274-L276, so it should be a very minor change.
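To make the difference between the two modes concrete, here is a minimal NumPy-only sketch. It does not use s2spy: the arrays are synthetic stand-ins (rows play the role of anchor_year, columns the role of i_interval), and the names `precursor`, `per_interval` and `pooled` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_years, n_intervals = 30, 6

# Synthetic stand-ins: rows are anchor years, columns are intervals.
precursor = rng.normal(size=(n_years, n_intervals))
target = 0.5 * precursor + rng.normal(size=(n_years, n_intervals))

# Per-target mode: one correlation per interval, each using only
# n_years samples (1 datapoint per year).
per_interval = np.array(
    [np.corrcoef(precursor[:, i], target[:, i])[0, 1] for i in range(n_intervals)]
)

# 'Subseasonal mode': flatten (anchor_year, i_interval) into a single
# sample axis and correlate once over n_years * n_intervals samples.
pooled = np.corrcoef(precursor.ravel(), target.ravel())[0, 1]

print(per_interval.shape, precursor.ravel().size, round(float(pooled), 2))
```

The pooled estimate mixes variability within a season with variability between years, which is the point of the proposal, but it also means consecutive intervals within one year are treated as separate samples.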

BSchilperoort commented 1 year ago

Hi Sem, I tried out code similar to your example, and I was indeed able to get the right correlation out using rgdr.correlation

Modified from rgdr_tutorial.ipynb:

target_data = target_resampled.sel(cluster=3).ts.sel(i_interval=(slice(1,6))).stack(anch_int=['anchor_year', 'i_interval'])
field_data = field_resampled.sst.sel(i_interval=slice(-1, 5)).stack(anch_int=['anchor_year', 'i_interval'])

field_data["anch_int"] = range(field_data["anch_int"].size)
target_data["anch_int"] = range(target_data["anch_int"].size)

corr, p_val = correlation(field_data, target_data, corr_dim="anch_int")
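A self-contained sketch of the same idea on synthetic data, for anyone who wants to try it without the tutorial files. Two assumptions to flag: `xr.corr` is used here as a stand-in for `rgdr.correlation` (so no p-values are returned), and the helper `stack_samples` is hypothetical. The stacked labels are dropped because the field and target select different i_interval ranges, so their stacked labels no longer match; xarray aligns on coordinate labels, and removing them makes the pairing purely positional. That is presumably also why the snippet above overwrites "anch_int" with a plain range.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
years = np.arange(2000, 2010)
intervals = [-1, 1, 2, 3, 4, 5, 6]  # interval labels without a 0, as in the thread

# Synthetic stand-ins for the resampled field and target (assumed shapes).
field = xr.DataArray(
    rng.normal(size=(len(years), len(intervals), 3, 4)),
    dims=("anchor_year", "i_interval", "latitude", "longitude"),
    coords={"anchor_year": years, "i_interval": intervals},
)
target = xr.DataArray(
    rng.normal(size=(len(years), len(intervals))),
    dims=("anchor_year", "i_interval"),
    coords={"anchor_year": years, "i_interval": intervals},
)


def stack_samples(da, interval_slice):
    """Select intervals, then flatten (anchor_year, i_interval) into one dim."""
    stacked = da.sel(i_interval=interval_slice).stack(
        anch_int=("anchor_year", "i_interval")
    )
    # Drop the stacked labels so that field and target pair up by position.
    return stacked.drop_vars(
        ["anch_int", "anchor_year", "i_interval"], errors="ignore"
    )


target_data = stack_samples(target, slice(1, 6))   # intervals 1..6
field_data = stack_samples(field, slice(-1, 5))    # intervals -1..5, one interval earlier

# One correlation map over all 10 * 6 samples.
corr = xr.corr(field_data, target_data, dim="anch_int")

# Cross-check a single grid cell against NumPy.
ref = np.corrcoef(
    field_data.isel(latitude=0, longitude=0).values, target_data.values
)[0, 1]
print(corr.dims, bool(np.isclose(float(corr[0, 0]), ref)))
```

Both selections must have the same number of intervals, otherwise the positional pairing has no meaning and the correlation call will fail on mismatched sizes.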

However, it is essential that:

Perhaps it would be better to modify RGDR rather than the .fit method, as .fit should not really take any input other than data. We could then put the interval handling and dim stacking inside RGDR. How about the following syntax:

rgdr = RGDR(
    target_intervals=[1, 2, 3],  #int or list
    lag=2  # cross correlation lag. Would make precursor_intervals=[-2, -1, 1]
)
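As a rough illustration of how the lag could translate target intervals into precursor intervals, here is a small sketch. `shift_intervals` is a hypothetical helper, not part of s2spy, and it assumes interval labels skip 0 (..., -2, -1, 1, 2, ...), as the [-2, -1, 1] in the comment above implies.

```python
def shift_intervals(target_intervals, lag):
    """Shift interval labels back by `lag`, skipping the unused label 0."""
    if isinstance(target_intervals, int):
        target_intervals = [target_intervals]

    shifted = []
    for i in target_intervals:
        if i == 0:
            raise ValueError("0 is not a valid interval label")
        # Map onto a gap-free axis (-2 -> -2, -1 -> -1, 1 -> 0, 2 -> 1, ...),
        # apply the lag there, then map back to the labels without 0.
        pos = i - 1 if i > 0 else i
        pos -= lag
        shifted.append(pos + 1 if pos >= 0 else pos)
    return shifted


print(shift_intervals([1, 2, 3], lag=2))  # [-2, -1, 1]
```

Accepting either an int or a list mirrors the "#int or list" note in the proposed signature.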

semvijverberg commented 1 year ago

Yes definitely agree with all suggestions! Would love that feature!