compneurobilbao / ageml

AgeML is a Python package for Age Modeling with Machine Learning made easy.
Apache License 2.0

Measurement of distributions and correlations #33

Open JGarciaCondado opened 6 months ago

JGarciaCondado commented 6 months ago

The software package measures many correlations and attempts to determine whether samples come from similar distributions. However, there are four clear improvements that could be made.

  1. When checking whether the age distributions differ, we use a simple t-test. This assumes Gaussianity, which does not always hold. This was especially clear with the UKBB data, where the age distributions of males and females were flagged as different despite being visually similar. A non-parametric test would be more appropriate. When comparing several groups in clinical_groups, an ANOVA test would also be appropriate.

  2. When checking the correlation between features and age in model_age, we assume the relationship is linear. However, it is well known that many phenotypes follow a U shape across the lifespan and are linear only toward its end. We should also check for this, as people may look at the whole lifespan and not only the later stages.

  3. We show bar graphs of the Pearson correlation between factors and deltas. It would also be good to plot the factors against the deltas to give the user a visual check. The code for features vs. age could be reused, as the concept is basically the same.

  4. When checking whether the models predict age, we report results for a very simple dummy regressor. A more appropriate method would be to randomly permute age to build a null distribution, and use it to test whether the null hypothesis can be rejected. This is something done by Ye Tian et al., Nature Med 2023.
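A rough sketch of how points 1, 2 and 4 might look with NumPy and SciPy. None of these function names exist in ageml; they are hypothetical helpers for discussion. For several groups the sketch uses Kruskal-Wallis as the rank-based counterpart of ANOVA (scipy.stats.f_oneway would be the parametric option). The permutation test is simplified: it shuffles ages against fixed predictions instead of refitting the model on each permutation, so it is not necessarily the procedure used in the cited paper.

```python
import numpy as np
from scipy import stats


def compare_age_distributions(*groups):
    """Non-parametric p-value for a difference in age between groups.

    Two groups: Mann-Whitney U. Three or more: Kruskal-Wallis.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    if len(groups) == 2:
        _, p = stats.mannwhitneyu(groups[0], groups[1], alternative="two-sided")
    else:
        _, p = stats.kruskal(*groups)
    return p


def quadratic_term_pvalue(age, feature):
    """p-value of the age^2 coefficient in feature ~ 1 + age + age^2 (OLS).

    A small p-value suggests curvature (e.g. a U shape) that a purely
    linear correlation would miss.
    """
    age = np.asarray(age, dtype=float)
    feature = np.asarray(feature, dtype=float)
    centered = age - age.mean()  # centering reduces collinearity
    X = np.column_stack([np.ones_like(centered), centered, centered**2])
    beta, _, _, _ = np.linalg.lstsq(X, feature, rcond=None)
    resid = feature - X @ beta
    dof = len(feature) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stat = beta[2] / np.sqrt(cov[2, 2])
    return 2 * stats.t.sf(abs(t_stat), dof)


def permutation_pvalue(age, predicted_age, n_permutations=1000, seed=0):
    """Permutation p-value for the mean absolute error of the predictions.

    Assumption: predictions are held fixed and only ages are shuffled.
    """
    rng = np.random.default_rng(seed)
    age = np.asarray(age, dtype=float)
    predicted_age = np.asarray(predicted_age, dtype=float)
    observed = np.mean(np.abs(age - predicted_age))
    null = np.array([
        np.mean(np.abs(rng.permutation(age) - predicted_age))
        for _ in range(n_permutations)
    ])
    # Add-one correction so the p-value is never exactly zero.
    return (np.sum(null <= observed) + 1) / (n_permutations + 1)
```

With this add-one correction the smallest attainable p-value is 1 / (n_permutations + 1), so the number of permutations sets the resolution of the test.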

JGarciaCondado commented 3 months ago

We should also report effect sizes, not only p-values.
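As one possible starting point, Cohen's d with a pooled standard deviation could be reported alongside two-group comparisons. The function name below is hypothetical and not part of ageml, and other effect sizes (for example a rank-based one to accompany a non-parametric test) may fit better.

```python
import numpy as np


def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)
```

For example, cohens_d([2, 4, 6], [1, 3, 5]) gives 0.5: the means differ by 1 and the pooled standard deviation is 2.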