bbradt / dfncluster

dFNC typically employs K-means clustering. How can methods like GMMs, DBSCAN, etc. improve fMRI results?

Create t-test comparison across clustered features. #56

Open EricMartin827 opened 4 years ago

EricMartin827 commented 4 years ago

To evaluate how well a clustering algorithm produces classifiable features, we need a way to test for statistically significant differences in cluster assignments between classes. Write a t_test function that measures the difference in cluster assignment, feature by feature, between healthy control patients and patients with schizophrenia.

If there are X healthy patients and Y patients with schizophrenia, each with D features (the cluster assigned to each time window/interval), then this function will produce a 1-D array of D p-values, one per feature, comparing the mean cluster assignment between the two sets of patients.
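
A minimal sketch of what such a function could look like, assuming the cluster assignments are already available as two NumPy arrays of shape (X, D) and (Y, D). The argument names, the `equal_var` parameter, and the default of Welch's unequal-variance test are assumptions for illustration, not something the issue specifies; the only library call used is `scipy.stats.ttest_ind`.

```python
import numpy as np
from scipy import stats


def t_test(healthy, schizophrenia, equal_var=False):
    """Per-feature two-sample t-test on cluster assignments.

    healthy:       array of shape (X, D), one row per healthy control.
    schizophrenia: array of shape (Y, D), one row per patient.
    Returns a 1-D array of D two-sided p-values, one per feature.

    equal_var=False (Welch's t-test) is an assumed default, not a
    requirement taken from the issue.
    """
    healthy = np.asarray(healthy, dtype=float)
    schizophrenia = np.asarray(schizophrenia, dtype=float)
    if healthy.ndim != 2 or schizophrenia.ndim != 2:
        raise ValueError("both inputs must be 2-D (subjects x features)")
    if healthy.shape[1] != schizophrenia.shape[1]:
        raise ValueError("both groups must have the same number of features")
    # axis=0 runs one independent test per column, i.e. per feature.
    _, p_values = stats.ttest_ind(
        healthy, schizophrenia, axis=0, equal_var=equal_var
    )
    return p_values


# Hypothetical data: 30 controls, 25 patients, 12 windows, 5 clusters.
rng = np.random.default_rng(0)
controls = rng.integers(0, 5, size=(30, 12))
patients = rng.integers(0, 5, size=(25, 12))
print(t_test(controls, patients).shape)
```

A feature that is constant in both groups has zero variance, so its p-value comes back as NaN; callers may want to mask those entries before thresholding.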

bbradt commented 4 years ago

looks good!

It would be cool if you could generalize the t-test so that we can also apply it to the cluster centers between classes. For clustering, I get out a set of K cluster centers in COMPONENT x COMPONENT space. If I take the instances within one cluster, split them by class, and run a two-tailed t-test between these class-specific instances, I should get back a COMPONENT x COMPONENT significance matrix that shows us differences within the clusters themselves.
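
One way this generalization might be sketched, assuming the windowed instances are stored as an (N, COMPONENT, COMPONENT) array with one cluster label and one class label per instance. The function name `cluster_t_tests`, its arguments, and the rule for skipping clusters with fewer than two instances per class are all hypothetical choices for illustration, not part of the existing code base.

```python
import numpy as np
from scipy import stats


def cluster_t_tests(windows, cluster_labels, class_labels, equal_var=False):
    """Two-sided t-test between classes, computed separately per cluster.

    windows:        (N, C, C) array of instances in component space.
    cluster_labels: (N,) cluster assignment of each instance.
    class_labels:   (N,) class of each instance (exactly two classes).
    Returns a dict mapping cluster label -> (C, C) matrix of p-values.
    """
    windows = np.asarray(windows, dtype=float)
    cluster_labels = np.asarray(cluster_labels)
    class_labels = np.asarray(class_labels)
    classes = np.unique(class_labels)
    if classes.size != 2:
        raise ValueError("expected exactly two classes")

    results = {}
    for k in np.unique(cluster_labels):
        in_cluster = cluster_labels == k
        group_a = windows[in_cluster & (class_labels == classes[0])]
        group_b = windows[in_cluster & (class_labels == classes[1])]
        # Assumed rule: skip clusters where a class has < 2 instances,
        # since the variance is undefined there.
        if len(group_a) < 2 or len(group_b) < 2:
            continue
        # axis=0 tests every (component, component) entry independently;
        # ttest_ind is two-sided by default.
        _, p_matrix = stats.ttest_ind(
            group_a, group_b, axis=0, equal_var=equal_var
        )
        results[k.item()] = p_matrix
    return results


# Hypothetical data: 40 instances, 3 components, 2 clusters, 2 classes.
rng = np.random.default_rng(0)
demo_windows = rng.normal(size=(40, 3, 3))
demo_clusters = np.arange(40) % 2
demo_classes = (np.arange(40) // 2) % 2
demo = cluster_t_tests(demo_windows, demo_clusters, demo_classes)
print({k: v.shape for k, v in demo.items()})
```

Because the tests run over axis 0, the same call also covers the 1-D feature case: passing (N, D) data yields one length-D p-value vector per cluster.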

This isn't necessarily useful for informing supervised learning, but it's something we do to evaluate differences between the populations, so it's worth doing if it's not too difficult.