AI4S2S / s2spy

A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting
https://ai4s2s.readthedocs.io/
Apache License 2.0

Reshape the RGDR returned values to be [samples, features] #92

Closed geek-yang closed 2 years ago

geek-yang commented 2 years ago

Currently, the RGDR module returns a DataArray with dimensions [cluster_labels, anchor_year] after calling transform (e.g. rgdr.transform(precursor_field)). In most popular machine learning packages, e.g. scikit-learn, the output of a dimensionality reduction method always has the shape [samples, features] (e.g. PCA in sklearn), and the models also expect their input to be organized this way (e.g. gradient boosting models in sklearn).

It would be nice for RGDR to return its output with the shape [anchor_year, cluster_labels], which is compatible with sklearn models.
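As a rough illustration of the requested layout, here is a minimal sketch using plain xarray. The `clustered` array is a hypothetical stand-in for what `rgdr.transform(precursor_field)` returns; its values, labels, and years are made up, and this is not the s2spy implementation.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the RGDR output: 3 clusters by 5 anchor years.
# The values, cluster labels, and years are dummy data for illustration only.
clustered = xr.DataArray(
    np.arange(15.0).reshape(3, 5),
    dims=("cluster_labels", "anchor_year"),
    coords={
        "cluster_labels": [-1, 1, 2],
        "anchor_year": [2015, 2016, 2017, 2018, 2019],
    },
)

# Reorder the dimensions to [samples, features], i.e. [anchor_year, cluster_labels].
samples_by_features = clustered.transpose("anchor_year", "cluster_labels")

# The underlying NumPy array now has one row per anchor year,
# which is the layout sklearn estimators expect for X.
X = samples_by_features.values
```

With this ordering, `X` can be passed to an sklearn estimator's `fit` without a manual transpose on the user's side.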

geek-yang commented 2 years ago

One more thing, spotted by @semvijverberg: the returned clustered values (DataArray) include latitude and longitude coordinates, which are actually the centers of each cluster. But this information is not explicitly shown to the user. We can add an attribute to the DataArray that mentions that these values are the cluster centers.
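One way this could look, as a sketch only: the attribute name `cluster_center_note`, the wording of the note, and the dummy coordinates below are assumptions for illustration, not what s2spy actually uses.

```python
import numpy as np
import xarray as xr

# Hypothetical clustered output with latitude/longitude attached per cluster.
# All numbers here are dummy values for illustration.
clustered = xr.DataArray(
    np.zeros((4, 2)),
    dims=("anchor_year", "cluster_labels"),
    coords={
        "anchor_year": [2016, 2017, 2018, 2019],
        "cluster_labels": [1, 2],
        "latitude": ("cluster_labels", [35.0, 50.0]),
        "longitude": ("cluster_labels", [-120.0, 10.0]),
    },
)

# Record explicitly what the latitude/longitude coordinates mean.
# The attribute key is a made-up name, chosen only for this example.
clustered.attrs["cluster_center_note"] = (
    "latitude and longitude are the centers of each cluster"
)
```

Because `attrs` is shown when the DataArray is printed, the user would see the note alongside the coordinates.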