Closed semvijverberg closed 1 year ago
Hi Sem, I tried out code similar to your example, and I was indeed able to get the right correlation using rgdr.correlation. The code below is modified from rgdr_tutorial.ipynb:
target_data = target_resampled.sel(cluster=3).ts.sel(i_interval=(slice(1,6))).stack(anch_int=['anchor_year', 'i_interval'])
field_data = field_resampled.sst.sel(i_interval=slice(-1, 5)).stack(anch_int=['anchor_year', 'i_interval'])
field_data["anch_int"] = range(field_data["anch_int"].size)
target_data["anch_int"] = range(target_data["anch_int"].size)
corr, p_val = correlation(field_data, target_data, corr_dim="anch_int")
However, it is essential that:
Perhaps it would be better to modify RGDR rather than the .fit method, as .fit should not really take any input other than data. We could then put the interval handling and dimension stacking inside RGDR. How about the following syntax:
rgdr = RGDR(
    target_intervals=[1, 2, 3],  # int or list
    lag=2,  # cross-correlation lag; would make precursor_intervals=[-2, -1, 1]
)
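To make the lag comment concrete, here is a minimal sketch of how precursor intervals could be derived from target_intervals and lag. The helper name shift_intervals is hypothetical and not part of s2spy; it only assumes that interval indices skip 0 (..., -2, -1, 1, 2, ...), which is what the [-2, -1, 1] result in the comment implies.

```python
def shift_intervals(target_intervals, lag):
    """Shift interval indices back by `lag`, skipping the non-existent interval 0.

    Hypothetical helper for illustration only; not an s2spy function.
    """
    if isinstance(target_intervals, int):
        target_intervals = [target_intervals]
    shifted = []
    for i in target_intervals:
        if i == 0:
            raise ValueError("interval indices skip 0")
        # Map to a gap-free position: ..., -2, -1, 0, 1, ... (interval 1 -> position 0).
        pos = i - 1 if i > 0 else i
        pos -= lag
        # Map back to interval indices: positions >= 0 are the positive intervals.
        shifted.append(pos + 1 if pos >= 0 else pos)
    return shifted


print(shift_intervals([1, 2, 3], lag=2))  # [-2, -1, 1]
```

With lag=0 the helper returns the target intervals unchanged, so the precursor and target periods would coincide.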
Yes definitely agree with all suggestions! Would love that feature!
The current implementation forces the RGDR().fit method to correlate across the anchor_year dimension (1 datapoint per year). I would like to flatten the anchor_year and i_interval dimensions in order to correlate once using all datapoints (instead of looping over each target interval).
To solve this issue, I propose adding a 'corr_dim' argument to RGDR().fit, so that people can also use it more flexibly (even without creating a calendar).
This is what I want to do:
'''
target = target_resampled['t2m'].sel(i_interval=(slice(1,6))).stack(time=['anchor_year', 'i_interval'])
field = field_resampled.sel(i_interval=slice(-1, 5)).stack(time=['anchor_year', 'i_interval'])
RGDR().fit(field, target, corr_dim='time')
'''
Any other comments or suggestions? I know this is already supported by https://github.com/AI4S2S/s2spy/blob/825d359e9bc02313a97c222f72699993b611a3fb/s2spy/rgdr/rgdr.py#L274-L276, so it should be a very minor change.
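The flattening discussed in this thread can be illustrated with plain xarray. The sketch below uses synthetic data and xr.corr rather than the s2spy correlation function, so it returns no p-values; the dimension name sample, the array shapes and the helper flatten are assumptions made for the example. It also shows why the stacked coordinate is replaced by a positional index: the field and target carry different i_interval labels, so aligning on the original labels would pair up the wrong samples.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
years = np.arange(2000, 2010)

# Synthetic stand-ins for the resampled target and field (assumed shapes).
target = xr.DataArray(
    rng.normal(size=(years.size, 2)),
    dims=("anchor_year", "i_interval"),
    coords={"anchor_year": years, "i_interval": [1, 2]},
)
field = xr.DataArray(
    rng.normal(size=(years.size, 2, 3, 4)),
    dims=("anchor_year", "i_interval", "lat", "lon"),
    coords={"anchor_year": years, "i_interval": [-1, 1]},  # one interval earlier
)


def flatten(da):
    """Stack (anchor_year, i_interval) into one dimension with a positional index."""
    stacked = da.stack(sample=["anchor_year", "i_interval"])
    # Drop the MultiIndex so the differing i_interval labels do not drive alignment.
    stacked = stacked.reset_index("sample", drop=True)
    return stacked.assign_coords(sample=np.arange(stacked.sizes["sample"]))


field_flat = flatten(field)
target_flat = flatten(target)

# One Pearson correlation per grid cell, using every (year, interval) pair at once.
corr = xr.corr(field_flat, target_flat, dim="sample")
print(corr.sizes)
```

Each target interval is paired here with the field interval one step earlier, which corresponds to a lag of 1 over 20 samples instead of 10.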