AI4S2S / s2spy

A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting
https://ai4s2s.readthedocs.io/
Apache License 2.0

When/how should RGDR fail if a lag has no significant clusters #86

Closed BSchilperoort closed 2 years ago

BSchilperoort commented 2 years ago

In PR #85, support for analyzing multiple precursor lags w/ RGDR was added. When a lag does not contain any clusters, RGDR will raise an error. However, a discussion was started on when to raise this error: either immediately, once a certain lag has been analyzed and did not contain a cluster, or once all lags have been analyzed and one or more lags do not contain a cluster.

There are pros and cons to either.

Why raise an error immediately:

Why raise an error after processing all lags:

In the current implementation, RGDR will analyze all lags first, and only then raise an error.
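To make the two options concrete, here is a minimal sketch of both checks. It is not the actual s2spy code: the function names and the `clusters_per_lag` mapping (lag to number of significant clusters found) are hypothetical stand-ins for whatever RGDR tracks internally.

```python
def check_immediately(clusters_per_lag):
    """Option 1: raise as soon as the first lag without clusters is seen."""
    for lag, n_clusters in clusters_per_lag.items():
        if n_clusters == 0:
            raise ValueError(f"No significant clusters found for lag {lag}.")


def check_after_all_lags(clusters_per_lag):
    """Option 2: look at every lag first, then raise one summary error."""
    empty_lags = [lag for lag, n in clusters_per_lag.items() if n == 0]
    if empty_lags:
        raise ValueError(
            f"No significant clusters found for lag(s): {empty_lags}."
        )


# Hypothetical result: lags 1 and 3 contain no significant clusters.
example = {1: 0, 2: 4, 3: 0}
```

With `example`, the first function would only report lag 1, while the second would report lags 1 and 3 in a single message.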

semvijverberg commented 2 years ago

I would prefer to raise a single (summary) error after processing all lags.

Another important point is that, for practical use in a pipeline setting, only a warning should be printed.

For example, we have a pipeline (building upon proto), where we predict EU temperature across ~20 target clusters and 12 months (240 target timeseries), searching for precursors at 3 lags. Hence, RGDR is executed 720 times.

Soil moisture is a typical precursor that predominantly finds clusters in summer and not in other seasons. There is nothing wrong with not finding precursor regions; it is important that the pipeline never breaks (otherwise constructing such a pipeline becomes very tedious and time consuming). In 99% of the cases, when looping over multiple variables (e.g., SST, SM, z500), RGDR will find some precursors for some variable(s) and thus a prediction can be made. But even for this 1%, when absolutely nothing is found, the pipeline should not crash 🙅. Otherwise the idea of building scalable pipelines is lost.

geek-yang commented 2 years ago

Then I think we can simply change errors to warnings, to ensure that a heavy job will not be interrupted by the error.

Changes are made in PR #93.
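For illustration only (this is not the code from PR #93), a warning-based variant could look roughly like the sketch below. The function name and the `clusters_per_lag` mapping are hypothetical; only the standard-library `warnings` module is used.

```python
import warnings


def warn_on_empty_lags(clusters_per_lag):
    """Emit one summary warning listing all lags without clusters.

    `clusters_per_lag` is a hypothetical mapping of lag -> number of
    significant clusters found. Returns the list of empty lags so the
    caller can decide how to proceed.
    """
    empty_lags = [lag for lag, n in clusters_per_lag.items() if n == 0]
    if empty_lags:
        warnings.warn(
            f"No significant clusters found for lag(s): {empty_lags}.",
            UserWarning,
            stacklevel=2,
        )
    return empty_lags


# The loop over many targets keeps running; empty lags are only reported.
empty = warn_on_empty_lags({1: 0, 2: 4, 3: 0})
```

Users who do want a hard failure could still opt in on their side, e.g. with `warnings.simplefilter("error")`, which turns warnings into exceptions.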