Replacing utils._kurtn with scipy.stats.kurtosis

ME-ICA / mapca

A Python implementation of the moving average principal components analysis methods from GIFT

GNU General Public License v2.0

6 stars 8 forks source link

Replacing utils._kurtn with scipy.stats.kurtosis #17

Closed notZaki closed 3 years ago

notZaki commented 3 years ago

It looks like utils._kurtn(arr) can be replaced with scipy.stats.kurtosis(arr, axis = 0, fisher = True). Any cons with making this replacement?

Few other notes:

The test for _kurtn converts an input with shape (2,3,4) into (3,1). I was expecting the output shape to be (3,4). Is this intentional? https://github.com/ME-ICA/mapca/blob/0d01a2b3d87a12e8f6778fbd549f8e83c05b58c1/mapca/tests/test_utils.py#L93-L99
The docstring for _kurtn says that the kurtosis is calculated along the second dimension, but the code is calculating kurtosis along every dimension except the second dimension. https://github.com/ME-ICA/mapca/blob/0d01a2b3d87a12e8f6778fbd549f8e83c05b58c1/mapca/utils.py#L317-L329

eurunuela commented 3 years ago

As long as the results are identical, I'm happy with the change.

The test for _kurtn converts an input with shape (2,3,4) into (3,1). I was expecting the output shape to be (3,4). Is this intentional?

Given that we're calculating kurtosis for the second dimension, it makes sense that we get an array of size "second dimension".

The docstring for_kurtn says that the kurtosis is calculated along the second dimension, but the code is calculating kurtosis along every dimension except the second dimension.

See how we're iterating on data.shape[1] to calculate kurtosis on the second dimension; i.e. we're calculating a kurtosis value per point in the second dimension.

notZaki commented 3 years ago

See how we're iterating on data.shape[1] to calculate kurtosis on the second dimension; i.e. we're calculating a kurtosis value per point in the second dimension.

The calculation itself is on data[:, i] which is a vector along the first dimension. But I can see how your definition also makes sense. I'm not sure if there's any official definition of what it means to say "f(x) along the nth dimension".

eurunuela commented 3 years ago

We may need to update the docstring. The code was translated from MATLAB, so "second dimension" was really a 2. I guess that the fact that Python starts indexing with 0 may make saying "second dimension" confusing. How about this?

The kurtosis of each vector in x along the second dimension; i.e. shape[1].

notZaki commented 3 years ago

I'm ok with the change, but updating the docstring won't be necessary if the function gets replaced by scipy.stats.kurtosis.

eurunuela commented 3 years ago

Of course. I'll check if we can replicate the results with the scipy function.

Thank you for pointing this out @notZaki !