Climate-Data-Science / Climate-Similarity-Metrics

Which similarity metrics are the most helpful to understand climate

Detrend and preprocess data before, not after deriving index #10

Closed (pawelbielski closed 4 years ago)

pawelbielski commented 4 years ago

It looks like the Nature paper preprocessed all the data before any further analysis:

As a preprocessing step we have removed 1056 grid points (out of total 10512) with missing values or gaps, hence in total 9456 grid points are considered. To avoid artefacts due to autocorrelation and seasonality, we removed the seasonal cycle and normalized the data. Specifically, we calculate for every month (i.e., separately for all Januaries, Februaries, etc.) the long-term mean and standard deviation. Each data point is then normalized by subtracting the mean and dividing by the standard deviation of the corresponding month at that grid cell. This normalization significantly reduces temporal autocorrelation in the time series.
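A minimal sketch of the normalization described in that quote, in case it helps the discussion. It assumes monthly data with time as the first axis and the first sample falling in January; the function name `normalize_by_calendar_month` is hypothetical and not part of this repository or of the paper's code.

```python
import numpy as np

def normalize_by_calendar_month(data, months_per_year=12):
    """Remove the seasonal cycle and normalize, month by month.

    Assumptions (not taken from the paper's code): `data` has shape
    (time, ...) with one sample per month, starting in January.
    """
    data = np.asarray(data, dtype=float)
    normalized = np.empty_like(data)
    for month in range(months_per_year):
        # All Januaries, all Februaries, etc., for every grid cell at once.
        same_month = data[month::months_per_year]
        mean = same_month.mean(axis=0)
        std = same_month.std(axis=0)
        # Grid cells with zero variance in a month would divide by zero here;
        # a real pipeline would have to mask or drop them first.
        normalized[month::months_per_year] = (same_month - mean) / std
    return normalized

# Usage on synthetic data: 10 years of monthly values on a 4 x 5 grid.
rng = np.random.default_rng(0)
synthetic = rng.normal(loc=280.0, scale=5.0, size=(120, 4, 5))
anomalies = normalize_by_calendar_month(synthetic)
```

Applying this to the raw fields first, and deriving any index from the normalized anomalies afterwards, would match the order of operations this issue asks for.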

pawelbielski commented 4 years ago

@pierretoussing

The following code in deseasonalize_map() can easily be vectorized, because every grid point can be deseasonalized separately, i.e. all the steps can be performed as vectorized operations. Currently the function deseasonalize_time_series() is called 512 × 256 × 37 ≈ 5 million times, which drastically reduces the code's usability.

for level in range(len_level):
    for lat in range(len_latitude):
        for lon in range(len_longitude):
            time_series = map_array[:, level, lat, lon]
            deseasonalized_series = deseasonalize_time_series(time_series, period_length)
            deseasonalized_map[:, level, lat, lon] = deseasonalized_series
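
A rough sketch of what a vectorized version could look like. It assumes that deseasonalize_time_series() subtracts, for each phase of the period (e.g. each calendar month), the long-term mean of that phase; if the actual function does something different, the body would need to change accordingly. The name `deseasonalize_map_vectorized` is hypothetical.

```python
import numpy as np

def deseasonalize_map_vectorized(map_array, period_length):
    """Deseasonalize every grid point of a (time, level, lat, lon) array at once.

    Assumption: deseasonalizing a series means subtracting the long-term
    mean of each phase within the period. This may differ from what
    deseasonalize_time_series() actually does.
    """
    map_array = np.asarray(map_array, dtype=float)
    deseasonalized_map = np.empty_like(map_array)
    # Loop over the phases of the period (12 iterations for monthly data)
    # instead of over every (level, lat, lon) combination.
    for phase in range(period_length):
        same_phase = map_array[phase::period_length]
        # The mean over the time axis is computed for all grid points together.
        deseasonalized_map[phase::period_length] = same_phase - same_phase.mean(axis=0)
    return deseasonalized_map

# Usage on a small synthetic map: 48 time steps, 3 levels, 8 x 16 grid.
rng = np.random.default_rng(0)
small_map = rng.normal(size=(48, 3, 8, 16))
result = deseasonalize_map_vectorized(small_map, period_length=12)
```

The number of Python-level iterations then depends only on period_length, not on the grid size, and NumPy handles all grid points per iteration.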

pawelbielski commented 4 years ago

Well done!