ghammad / pyActigraphy

Python-based open source package for actigraphy data analysis
https://ghammad.github.io/pyActigraphy
GNU General Public License v3.0

Mismatch between documentation and code for the computation of non-parametric variables IS and IV #148

Open achey2016 opened 8 months ago

achey2016 commented 8 months ago

The documentation for IS and IV uses formulas with the uncorrected variance, but the code uses pandas.Series.var without specifying ddof=0, so by default it applies the bias correction (ddof=1).

For long recordings the results should be almost the same but for shorter recordings it could lead to slight differences with other tools.
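To make the difference concrete, here is a minimal sketch of the two conventions on a toy series. The values are made up for illustration and are not actigraphy data.

```python
import numpy as np
import pandas as pd

# Toy series, chosen only so the arithmetic is easy to follow.
x = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])
n = len(x)

# pandas default: ddof=1, i.e. the sum of squared deviations divided by n - 1.
var_default = x.var()

# Uncorrected variance, as written in the documentation: divide by n.
var_uncorrected = x.var(ddof=0)

# NumPy's default on a plain array is ddof=0, so it matches the uncorrected value.
var_numpy = np.var(x.to_numpy())

print(var_default, var_uncorrected, var_numpy)
```

The two pandas results differ by the factor n / (n - 1), which is why the gap shrinks as the number of samples grows.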

In the documentation for IS:

This variable is defined in [1]:

    IS = \frac{d^{24h}}{d^{1h}}

with:

    d^{1h} = \sum_{i}^{n}\frac{\left(x_{i}-\bar{x}\right)^{2}}{n}

where $x_{i}$ is the number of active (counts higher than a predefined threshold) minutes during the $i^{th}$ period, $\bar{x}$ is the mean of all data and $n$ is the number of periods covered by the actigraphy data and with:


    d^{24h} = \sum_{i}^{p} \frac{
              \left( \bar{x}_{h,i} - \bar{x} \right)^{2}
              }{p}

What the current implementation does:

    IS = \frac{d^{24h}}{d^{1h}}

with:

    d^{1h} = \sum_{i}^{n}\frac{\left(x_{i}-\bar{x}\right)^{2}}{n-1} = \mathrm{data.var()}

and:


    d^{24h} = \sum_{i}^{p} \frac{
              \left( \bar{x}_{h,i} - \bar{x} \right)^{2}
              }{p-1} = \mathrm{data.groupby([
        data.index.hour,
        data.index.minute,
        data.index.second]
    ).mean().var()}
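
A sketch comparing the two conventions on synthetic data follows. The helper `interdaily_stability` and the generated series are hypothetical and only mirror the expressions quoted above; this is not the pyActigraphy API.

```python
import numpy as np
import pandas as pd


def interdaily_stability(data: pd.Series, ddof: int) -> float:
    """Hypothetical helper: IS = d_24h / d_1h with a selectable ddof."""
    # Average 24h profile: mean of all samples sharing the same time of day.
    profile = data.groupby(
        [data.index.hour, data.index.minute, data.index.second]
    ).mean()
    d_24h = profile.var(ddof=ddof)  # variance of the average daily profile
    d_1h = data.var(ddof=ddof)      # variance of the whole series
    return d_24h / d_1h


# Synthetic recording (assumption): 3 complete days sampled hourly,
# a daily sinusoid plus noise.
rng = np.random.default_rng(0)
index = pd.date_range("2024-01-01", periods=3 * 24, freq=pd.Timedelta(hours=1))
hours = index.hour.to_numpy()
values = 50 + 40 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 10, len(index))
data = pd.Series(values, index=index)

is_doc = interdaily_stability(data, ddof=0)   # formula in the documentation
is_code = interdaily_stability(data, ddof=1)  # pandas default used in the code

n, p = len(data), 24
expected_ratio = p * (n - 1) / (n * (p - 1))

print(is_doc, is_code, is_code / is_doc, expected_ratio)
```

Under these assumptions the two values differ exactly by the factor p(n-1) / (n(p-1)). The n-dependent part tends to 1 as the recording gets longer, while the p-dependent part p / (p-1) stays fixed (24/23 for hourly bins), so passing ddof=0 to both var calls would be one way to match the documented formula.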