correlating across both intra-seasonal and inter-seasonal variability ('subseasonal mode')

semvijverberg commented 1 year ago

The current implementation forces the RGDR().fit method to correlate across the anchor_year dimension (1 datapoint per year). I would like to flatten the anchor_year and i_interval dimensions in order to correlate once using all datapoints (instead of looping over each target interval).

To solve this issue, I propose adding a 'corr_dim' argument to RGDR().fit, so that people can also use it more flexibly (even without creating a calendar).

This is what I want to do:

```python
target = target_resampled['t2m'].sel(i_interval=slice(1, 6)).stack(time=['anchor_year', 'i_interval'])
field = field_resampled.sel(i_interval=slice(-1, 5)).stack(time=['anchor_year', 'i_interval'])

RGDR().fit(field, target, corr_dim='time')
```

Any other comments or suggestions? Correlating over an arbitrary dimension is already supported by https://github.com/AI4S2S/s2spy/blob/825d359e9bc02313a97c222f72699993b611a3fb/s2spy/rgdr/rgdr.py#L274-L276, so this should be a very minor change.
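To make the flattening idea concrete, here is a minimal NumPy-only sketch on synthetic data. It is not the s2spy implementation: the array shapes, variable names, and the injected signal are all assumptions for illustration. It mimics what stacking anchor_year and i_interval does, then computes one Pearson correlation per grid cell over the flattened axis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_intervals, n_lat, n_lon = 10, 6, 3, 4  # assumed sizes

# Synthetic stand-ins for the resampled data (assumed layouts):
#   target: (anchor_year, i_interval)
#   field:  (anchor_year, i_interval, lat, lon)
target = rng.normal(size=(n_years, n_intervals))
field = rng.normal(size=(n_years, n_intervals, n_lat, n_lon))
# Inject a clear signal into one grid cell so the result is easy to check.
field[:, :, 0, 0] = 2.0 * target + 0.1 * rng.normal(size=target.shape)

# Flatten anchor_year and i_interval into a single "time" axis.
target_flat = target.reshape(-1)              # shape (60,)
field_flat = field.reshape(-1, n_lat, n_lon)  # shape (60, 3, 4)

# Pearson correlation along the flattened axis, for every grid cell at once.
t_anom = target_flat - target_flat.mean()
f_anom = field_flat - field_flat.mean(axis=0)
cov = np.tensordot(t_anom, f_anom, axes=(0, 0))  # shape (3, 4)
corr = cov / np.sqrt((t_anom**2).sum() * (f_anom**2).sum(axis=0))
```

In this sketch the field and target cover the same intervals. In the snippet above the field selection starts at an earlier interval than the target, so the two are paired by position along the flattened axis rather than by interval label.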

BSchilperoort commented 1 year ago

Hi Sem, I tried out code similar to your example, and I was indeed able to get the right correlation using rgdr.correlation.

Modified from rgdr_tutorial.ipynb:

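```python
# `correlation` is the rgdr.correlation function mentioned above.
target_data = target_resampled.sel(cluster=3).ts.sel(i_interval=slice(1, 6)).stack(anch_int=['anchor_year', 'i_interval'])
field_data = field_resampled.sst.sel(i_interval=slice(-1, 5)).stack(anch_int=['anchor_year', 'i_interval'])

# Replace the stacked index with plain integers so that field and target
# line up by position, even though their interval labels differ.
field_data["anch_int"] = range(field_data["anch_int"].size)
target_data["anch_int"] = range(target_data["anch_int"].size)

corr, p_val = correlation(field_data, target_data, corr_dim="anch_int")
```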

However, it is essential that:

- the number of intervals in the target and in the field is the same
- the target and field data are properly stacked
- users select the right intervals for both field and target
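The first and third requirements could be checked before stacking. The helper below is a hypothetical sketch (`pair_intervals` is not part of s2spy); it only verifies that the two selections have equal length and returns the positional pairing that stacking would produce.

```python
def pair_intervals(target_intervals, field_intervals):
    """Pair target and field intervals by position (hypothetical helper).

    Raises ValueError when the selections differ in length, because the
    stacked arrays could then not be correlated element by element.
    """
    target_intervals = list(target_intervals)
    field_intervals = list(field_intervals)
    if len(target_intervals) != len(field_intervals):
        raise ValueError(
            f"target has {len(target_intervals)} intervals, "
            f"field has {len(field_intervals)}"
        )
    return list(zip(field_intervals, target_intervals))


# Label-based selection includes both slice ends, so slice(1, 6) selects
# 1..6 and slice(-1, 5) selects -1, 1, ..., 5 (assuming no interval 0).
pairs = pair_intervals([1, 2, 3, 4, 5, 6], [-1, 1, 2, 3, 4, 5])
```

Each pair is (field interval, target interval), which makes it easy to see at a glance whether the intended lag was selected.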

Perhaps it would be better to modify RGDR rather than the .fit method, as .fit should not really take any input other than data. We can then put the interval handling and dimension stacking inside RGDR. How about the following syntax:

```python
rgdr = RGDR(
    target_intervals=[1, 2, 3],  # int or list
    lag=2  # cross correlation lag. Would make precursor_intervals=[-2, -1, 1]
)
```
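How the lag could translate target intervals into precursor intervals can be sketched as follows. This is only an illustration: `lagged_intervals` is a hypothetical helper, and it assumes interval indices skip 0 (running ..., -2, -1, 1, 2, ...), as the example in the comment above implies.

```python
def lagged_intervals(target_intervals, lag):
    """Shift interval indices back by `lag`, skipping the missing index 0.

    Hypothetical helper; accepts an int or a list, like the proposed
    `target_intervals` argument.
    """
    if isinstance(target_intervals, int):
        target_intervals = [target_intervals]
    shifted = []
    for i in target_intervals:
        if i == 0:
            raise ValueError("interval index 0 does not exist")
        j = i - lag
        # Indices jump from -1 to 1, so a shift that lands on or crosses 0
        # has to move one step further.
        if i > 0 and j <= 0:
            j -= 1
        elif i < 0 and j >= 0:
            j += 1
        shifted.append(j)
    return shifted


precursor_intervals = lagged_intervals([1, 2, 3], lag=2)
```

With this rule the field and target selections always have the same length, which satisfies the first requirement listed earlier without any user bookkeeping.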

semvijverberg commented 1 year ago

Yes definitely agree with all suggestions! Would love that feature!

correlating across both intra-seasonal and inter-seasonal variability ('subseasonal mode') #151