Detrend and preprocess data before, not after deriving index

pawelbielski commented 4 years ago

It looks like the Nature paper preprocessed all the data before any further analysis:

> As a preprocessing step we have removed 1056 grid points (out of total 10512) with missing values or gaps, hence in total 9456 grid points are considered. To avoid artefacts due to autocorrelation and seasonality, we removed the seasonal cycle and normalized the data. Specifically, we calculate for every month (i.e., separately for all Januaries, Februaries, etc.) the long-term mean and standard deviation. Each data point is then normalized by subtracting the mean and dividing by the standard deviation of the corresponding month at that grid cell. This normalization significantly reduces temporal autocorrelation in the time series.
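
The preprocessing described in the quote could be sketched roughly as follows. This is only an illustration, not the paper's code: the function name `standardize_by_month`, the array layout `(time, latitude, longitude)`, the assumption that the record starts in January and covers whole years, and the use of the population standard deviation (`ddof=0`) are all assumptions made for this example.

```python
import numpy as np


def standardize_by_month(data, period_length=12):
    """Normalize each calendar month separately at every grid cell.

    Assumes `data` has shape (time, ...) and that the time axis covers
    a whole number of periods (e.g. complete years of monthly data).
    """
    n_time = data.shape[0]
    if n_time % period_length != 0:
        raise ValueError("time axis must cover a whole number of periods")

    # Fold the time axis into (years, months, ...), so that axis 0
    # collects all Januaries, all Februaries, and so on.
    folded = data.reshape(n_time // period_length, period_length, *data.shape[1:])

    # Long-term mean and standard deviation per month and grid cell.
    mean = folded.mean(axis=0, keepdims=True)
    std = folded.std(axis=0, keepdims=True)

    return ((folded - mean) / std).reshape(data.shape)


# Synthetic example: 4 years of monthly data on a 3 x 4 grid.
rng = np.random.default_rng(0)
months = np.arange(48)
seasonal_cycle = 10.0 * np.sin(2 * np.pi * months / 12)
data = seasonal_cycle[:, None, None] + rng.normal(size=(48, 3, 4))
data[5, 0, 0] = np.nan  # one grid point with a gap

# Drop grid points that contain any missing value, as in the quote.
valid = ~np.isnan(data).any(axis=0)  # boolean mask of shape (3, 4)
series = data[:, valid]              # shape (48, number of valid points)

normalized = standardize_by_month(series)
```

After this step every month at every remaining grid point has zero mean and unit standard deviation, so the seasonal cycle is removed before any index is derived.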

[x] Investigate whether preprocessing the data pointwise before deriving the index makes sense, or whether the current implementation is correct
[x] If needed, adjust the function deseasonalize_time_series() to work with all the dimensions (latitude, longitude, altitude)
[x] If needed, adapt the presentation for climate scientists (#9) and the notebook from #2. Comment on the respective issues (e.g. by striking out irrelevant information with ~~strikethrough~~ and adding a comment at the bottom of the issue)

pawelbielski commented 4 years ago

@pierretoussing

The following code in deseasonalize_map() can easily be vectorized, because every point can be deseasonalized separately, i.e. all the steps can be performed with vectorized operations. Currently the function deseasonalize_time_series() is called 512 × 256 × 37 ≈ 4.8 million times, which drastically reduces the code's usability.

```python
for level in range(len_level):
    for lat in range(len_latitude):
        for lon in range(len_longitude):
            time_series = map_array[:, level, lat, lon]
            deseasonalized_series = deseasonalize_time_series(time_series, period_length)
            deseasonalized_map[:, level, lat, lon] = deseasonalized_series
```

[x] To make sure that your code is correct, create a vectorized solution and compare it with the non-vectorized one. Once you are sure that both return the same results, replace the non-vectorized function with the vectorized one and push it.
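
One way to do this comparison is sketched below. Note that the body of `deseasonalize_time_series()` here is a hypothetical stand-in (per-phase mean removal and scaling by the standard deviation), since the real implementation is not shown in this issue. The names `deseasonalize_map_loop` and `deseasonalize_map_vectorized` are also made up for the example. The array layout `(time, level, latitude, longitude)` follows the loop above.

```python
import numpy as np


def deseasonalize_time_series(series, period_length):
    # Hypothetical stand-in for the repository function: for each phase
    # of the period, subtract the mean and divide by the std.
    out = np.empty_like(series, dtype=float)
    for phase in range(period_length):
        values = series[phase::period_length]
        out[phase::period_length] = (values - values.mean()) / values.std()
    return out


def deseasonalize_map_loop(map_array, period_length):
    # Non-vectorized reference: one function call per grid point.
    _, len_level, len_latitude, len_longitude = map_array.shape
    deseasonalized_map = np.empty(map_array.shape, dtype=float)
    for level in range(len_level):
        for lat in range(len_latitude):
            for lon in range(len_longitude):
                time_series = map_array[:, level, lat, lon]
                deseasonalized_map[:, level, lat, lon] = deseasonalize_time_series(
                    time_series, period_length
                )
    return deseasonalized_map


def deseasonalize_map_vectorized(map_array, period_length):
    # Vectorized version: only loop over the phases of the period and
    # handle all levels, latitudes and longitudes at once along axis 0.
    deseasonalized_map = np.empty(map_array.shape, dtype=float)
    for phase in range(period_length):
        values = map_array[phase::period_length]
        mean = values.mean(axis=0)
        std = values.std(axis=0)
        deseasonalized_map[phase::period_length] = (values - mean) / std
    return deseasonalized_map


# Compare both versions on a small random array before switching over.
rng = np.random.default_rng(42)
small_map = rng.normal(size=(36, 2, 3, 4))  # 3 years of monthly data
loop_result = deseasonalize_map_loop(small_map, period_length=12)
vectorized_result = deseasonalize_map_vectorized(small_map, period_length=12)
results_match = np.allclose(loop_result, vectorized_result)
```

Using `np.allclose` rather than exact equality allows for tiny floating-point differences, because the two versions may sum the values in a different order. The vectorized version needs only `period_length` iterations regardless of the grid size.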

pawelbielski commented 4 years ago

Well done!

Climate-Data-Science / Climate-Similarity-Metrics

Detrend and preprocess data before, not after deriving index #10