0todd0000 / spm1d

One-Dimensional Statistical Parametric Mapping in Python
GNU General Public License v3.0

SPM in Multiple Sclerosis #219

Closed FabiolaMestanza closed 1 year ago

FabiolaMestanza commented 2 years ago

Dear Prof. Pataky,

We are a research group at the University of Milan, and we are applying Statistical Parametric Mapping (SPM) to gait kinematic data from people with multiple sclerosis (MS). Our aim is to quantify how much each patient's kinematics differ from those of healthy controls. SPM seems a good candidate, but we have some methodological questions that we would like to discuss with you.

Question 1: Using a gait analysis system, we have assessed the hip, knee and ankle kinematics of people with MS. We have at least four trials for each patient, and reference kinematic data from healthy subjects matched for gait speed. We would now like to compare each person with MS against the reference kinematic values, both to assess the difference from the physiological pattern and to provide a summary score (for each person with MS) that discriminates between more and less impaired patients. We are using the spm1d package, and we are comparing each patient's angular waveforms of hip, knee, and ankle flexion with the mean waveforms of healthy subjects. Which kind of approach would you suggest for this analysis? Is there a test in the spm1d package that is more suitable for us?

Question 2: To discriminate between more and less impaired people with MS, it would be useful to summarize the results of the SPM test with a zero-dimensional variable, so that each patient has a single summary value of gait performance. Would this be methodologically possible? Intuitively, if a supra-threshold cluster is present, lower p-values would indicate greater differences between a given patient and controls. However, spm1d does not calculate p-values for non-significant comparisons, although we see that this feature is on the to-do list. Alternatively, could the sum of SPM{T2} scores over the one-dimensional continuum be used as a synthetic measure of dissimilarity between a given patient and healthy reference values? Or would you suggest other zero-dimensional variables that we could use for our purpose?

Thank you very much for your understanding and your support in advance.

Best regards, Davide Cattaneo, Fabiola Mestanza, Francesco Luciano

We have attached the data and the analysis script for a typical subject.

HC_KNEE.txt MIGA_ANKLE_R.csv MIGA_HIP_R.csv MIGA_KNEE_R.csv HC_ANKLE.txt HC_HIP.txt

```python
################################
# HOUSEKEEPING
################################

# Clear workspace (IPython magic; only valid in IPython/Jupyter)
%reset -f

####################################
# IMPORT LIBRARIES
####################################

# Import os
import os

# Import numpy and pandas
import numpy as np
import pandas as pd

# Import matplotlib for data visualization
import matplotlib.pyplot as plt

# Import spm1d to perform SPM
import spm1d

#####################################################
# IMPORT AND VIEW DATA FROM PT GROUP
#####################################################

# File name of the Pt Hip waveforms
fnamePH = 'MIGA_HIP_R.csv'

# Load patients hip data as numpy array
PH = np.loadtxt(fnamePH, delimiter=";")

# Plot imported waveforms
plt.plot(PH.T)

# File name of the Pt Knee waveforms
fnamePK = 'MIGA_KNEE_R.csv'

# Load patients knee data as numpy array
PK = np.loadtxt(fnamePK, delimiter=";")

# Plot imported waveforms
plt.plot(PK.T)

# File name of the Pt Ankle waveforms
fnamePA = 'MIGA_ANKLE_R.csv'

# Load patients ankle data as numpy array
PA = np.loadtxt(fnamePA, delimiter=";")

# Plot imported waveforms
plt.plot(PA.T)

#####################################################
# IMPORT AND VIEW DATA FROM CTRL GROUP
#####################################################

# File name of the Ctrl Hip waveforms
fnameCH = 'HC_HIP.txt'

# Load control hip data as numpy array
CH = np.loadtxt(fnameCH, delimiter=";")

# Plot imported waveforms
plt.plot(CH.T)

# File name of the Ctrl Knee waveforms
fnameCK = 'HC_KNEE.txt'

# Load control knee data as numpy array
CK = np.loadtxt(fnameCK, delimiter=";")

# Plot imported waveforms
plt.plot(CK.T)

# File name of the Ctrl Ankle waveforms
fnameCA = 'HC_ANKLE.txt'

# Load control ankle data as numpy array
CA = np.loadtxt(fnameCA, delimiter=";")

# Plot imported waveforms
plt.plot(CA.T)

#################################################
# PREPARE ARRAYS FOR HOTELLING'S TEST
#################################################

# J = number of responses
# Q = number of nodes to which the 1D responses have been resampled
# I = number of vector components
# See: https://spm1d.org/doc/Stats1D/multivariate.html

arr0 = np.dstack((CH, CK, CA))
arr1 = np.dstack((PH, PK, PA))

# Check that dimensions of arr0 and arr1 are correct (J Q I)
print(arr0.shape)
print(arr1.shape)

#################################################
# PERFORM HOTELLING'S TEST
#################################################

T2 = spm1d.stats.hotellings(arr1, arr0)
T2i = T2.inference(0.05)
print(T2i)

T2i.plot()

z_Test = T2i.z
print(z_Test)

print(T2i.clusters)

#################################################
# SUM OF T2
#################################################

print(np.shape(z_Test))

print("Sum:")
print(np.sum(z_Test))
```

0todd0000 commented 2 years ago

Your two questions are both very important, complex questions that are difficult to answer. I have provided preliminary responses below. Since these questions go well beyond specific spm1d procedures, I'd suggest seeking additional support from a statistician and/or from general statistics forums.



Question 1: Which kind of approach would you suggest us to run this analysis? Is there a test more suitable for us in the spm1D package?

This is a difficult question to answer because the appropriate approach for single patient vs. healthy population comparisons depends largely on the precise purpose, the experimental design, and the nature of the healthy population data. The most straightforward case is a large sample of healthy individuals (N>100 or N>1000) in which case you can use the healthy mean as the datum for a one-sample test. A more complex case is a small, random sample of healthy individuals (N<100), in which case a two-sample test may be more appropriate BUT where sample size and power considerations can become complex.

This question is highly complex and also quite general: it pertains to all types of data, including simple scalar data. I'd therefore recommend posting it to Cross Validated (the Stack Exchange statistics forum), phrased in terms of simple scalar data such as body mass.
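As a rough illustration of the large-sample case described above, where the healthy mean serves as the datum for a one-sample test, the pointwise one-sample Hotelling's T2 statistic can be sketched in plain NumPy. This is only a sketch under stated assumptions: the function name `hotellings_t2_onesample` is hypothetical, the data are synthetic stand-ins for real waveforms, and no inference (critical threshold or p-values) is computed, so it does not replace spm1d's own procedures. It also assumes more responses than vector components (J > I), so that the covariance matrix at each node is invertible.

```python
import numpy as np


def hotellings_t2_onesample(Y, mu):
    """Pointwise one-sample Hotelling's T2 (hypothetical helper, not spm1d code).

    Y  : (J, Q, I) array with J responses, Q nodes, I vector components
    mu : (Q, I) array, the datum (for example, healthy mean waveforms)
    Returns a (Q,) array of T2 values; no inference is performed.
    """
    J, Q, I = Y.shape
    t2 = np.empty(Q)
    for q in range(Q):
        y = Y[:, q, :]                           # (J, I) responses at node q
        d = y.mean(axis=0) - mu[q]               # mean deviation from the datum
        W = np.cov(y, rowvar=False, ddof=1)      # (I, I) sample covariance
        t2[q] = J * d @ np.linalg.solve(W, d)    # J * d' W^-1 d
    return t2


# Synthetic stand-in data (assumption: 8 trials, 101 nodes, 3 joint angles)
rng = np.random.default_rng(0)
J, Q, I = 8, 101, 3
healthy_mean = np.zeros((Q, I))                  # datum from a large healthy sample
patient = rng.normal(loc=0.5, scale=1.0, size=(J, Q, I))

t2 = hotellings_t2_onesample(patient, healthy_mean)
print(t2.shape)
```

With real data, `healthy_mean` would be the control array averaged over its first (response) axis. For the small-sample case, a two-sample test uses both arrays directly; spm1d appears to provide `spm1d.stats.hotellings2` for that purpose, but check the documentation before relying on it.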



Question 2: ...it would be useful to summarize the results of the SPM test with a zero-dimensional variable; so that each patient will have a single summary value of gait performance. Would this be methodologically possible?

It is possible, but I suggest against using a test statistic (t value or T2 value) because these values are sample-size dependent: they generally increase as sample size increases, even when the true difference is constant. One option is to use the p-value associated with the maximum test statistic value, which is demonstrated here. However, since those p-values are generally not valid when large (p>0.5), it might be a better idea to use a more common difference metric like the RMSE between two sample means. There are several other difference / similarity metric options, such as mutual information. There are also several software packages that directly support difference / similarity metric calculations, including scikit-image and similarity_measures.
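The contrast between a test statistic and the RMSE between two sample means can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of spm1d: the helper names `rmse_between_means` and `mean_abs_t` are hypothetical, the waveforms are synthetic (a sine-shaped curve with a constant 5-unit offset and Gaussian noise), and a scalar pooled-variance two-sample t value per node stands in for the T2 statistic.

```python
import numpy as np


def rmse_between_means(A, B):
    """RMSE between the mean waveforms of two (J, Q) samples."""
    diff = A.mean(axis=0) - B.mean(axis=0)
    return float(np.sqrt(np.mean(diff ** 2)))


def mean_abs_t(A, B):
    """Mean absolute two-sample t value (pooled variance) across nodes."""
    JA, JB = A.shape[0], B.shape[0]
    diff = A.mean(axis=0) - B.mean(axis=0)
    pooled_var = ((JA - 1) * A.var(axis=0, ddof=1)
                  + (JB - 1) * B.var(axis=0, ddof=1)) / (JA + JB - 2)
    t = diff / np.sqrt(pooled_var * (1.0 / JA + 1.0 / JB))
    return float(np.mean(np.abs(t)))


# Synthetic waveforms: the true difference is a constant offset of 5 units
Q = 101
phase = np.linspace(0.0, 1.0, Q)
healthy_curve = 60.0 * np.sin(np.pi * phase)
patient_curve = healthy_curve - 5.0

rng = np.random.default_rng(1)
results = []
for J in (5, 50, 500):                           # increasing sample size
    healthy = healthy_curve + rng.normal(0.0, 2.0, size=(J, Q))
    patient = patient_curve + rng.normal(0.0, 2.0, size=(J, Q))
    results.append((J,
                    rmse_between_means(patient, healthy),
                    mean_abs_t(patient, healthy)))

for J, rmse, t_val in results:
    print(f"J={J:4d}  RMSE={rmse:5.2f}  mean|t|={t_val:6.2f}")
```

In this setup the RMSE stays close to the true offset for every sample size, while the t values grow roughly with the square root of J, which is why a test statistic is a poor severity score when the number of trials differs between patients.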

bernard-liew commented 2 years ago

Hi all,

Just saw this and was intrigued. I think techniques like the Gait Deviation Index (https://pubmed.ncbi.nlm.nih.gov/18565753/), Dynamic Motor Control Index (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4683117/), and other similar techniques are potentially useful in this instance.

Regards, Bernard

0todd0000 commented 1 year ago

Thank you for the suggestions! They are not directly relevant to current spm1d procedures, so I'm closing this issue. If you meant these as feature requests, please post them to #45. Thanks!