anbai106 / mlni

Machine Learning in NeuroImaging (MLNI) is a Python package that performs various tasks using neuroimaging data.
https://anbai106.github.io/mlni/
MIT License

Balanced data #4

Closed: sourdougie closed this issue 3 years ago

sourdougie commented 3 years ago

Hi Junhao, one more question. I just want to make sure I've understood the code correctly. By default, HYDRA (for clustering) assumes the data is balanced. In my data, I have more controls than patients, so should I set this argument to False?

Thanks, Jonah

anbai106 commented 3 years ago

Hi Jonah,

You should leave the argument class_weight_balanced at its default value, True. With that setting, the SVM from sklearn compensates for imbalanced data by weighting the classes.
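To illustrate what class weighting does, here is a minimal sketch of the formula that sklearn documents for its class_weight="balanced" option: n_samples / (n_classes * class_count). It does not call mlni or show how mlni passes the setting on internally. The helper name balanced_class_weights, the cohort sizes, and the label coding (-1 for controls, 1 for patients) are assumptions made for this example only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count).

    This mirrors the formula sklearn documents for class_weight="balanced".
    The helper itself is hypothetical and is not part of mlni or sklearn.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced cohort: 150 controls (-1) and 50 patients (1).
labels = [-1] * 150 + [1] * 50
weights = balanced_class_weights(labels)

# Total weight carried by each class after reweighting.
weighted_totals = {cls: weights[cls] * labels.count(cls) for cls in weights}

print(weights)
print(weighted_totals)
```

The smaller class gets the larger weight, so both classes contribute the same total weight to the SVM loss even though the controls outnumber the patients three to one.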

Also, regarding the influence of sample size and data imbalance, we have preliminary semi-simulated results here: preprint. We hope they can serve as a guideline for users like you in real applications, and give you a general sense of whether your results are reasonable.

sourdougie commented 3 years ago

Great, thanks for clarifying!