compneurobilbao / ageml

AgeML is a Python package for Age Modeling with Machine Learning made easy.
Apache License 2.0

Visualization Style with appearance of Outliers #60

Open JGarciaCondado opened 4 days ago

JGarciaCondado commented 4 days ago

Many datasets contain subjects with outlier values. These usually cause the predicted age to be far too high or far too low. The graphs displayed for age modelling, in turn, take their axis ranges from the min and max of all the ages. Hence we sometimes end up with graphs like those attached:

[Attached images: chronological_vs_pred_age_all_all-1, chronological_vs_pred_age_all_all, features_vs_age_controls_all]

This is both good and bad:

Good because it lets us see that there are outliers in the data. Bad because we can't see the non-outliers, which are what interest us.

Solutions: Ideally we would want to discard outliers. How can we do this? We should at least report that some values are very far from the average (maybe more than 3 SD?) and give the subject IDs so that users can remove them. Alternatively, we could set the axis ranges based on the original (chronological) ages. However, this does not work for visualizing the relationships between features and age.
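A minimal sketch of both ideas, using only the standard library. The function names `flag_outliers` and `limits_from_chronological`, the subject IDs, and the example values are all hypothetical and not part of ageml; the 3 SD threshold is just the tentative value suggested above.

```python
from statistics import mean, stdev


def flag_outliers(values_by_id, n_sd=3.0):
    """Return IDs whose value lies more than n_sd standard deviations from the mean.

    Needs at least two values, since the sample standard deviation is used.
    """
    values = list(values_by_id.values())
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return []  # all values identical, nothing to flag
    return [sid for sid, v in values_by_id.items() if abs(v - mu) > n_sd * sd]


def limits_from_chronological(ages, pad=0.05):
    """Axis limits taken from chronological ages only, with a small margin."""
    lo, hi = min(ages), max(ages)
    margin = pad * (hi - lo)
    return lo - margin, hi + margin


# Hypothetical predicted ages: 20 plausible subjects plus one extreme prediction.
predicted = {f"sub-{i:02d}": 60.0 + i for i in range(20)}
predicted["sub-99"] = 250.0
chronological = [60.0 + i for i in range(20)] + [70.0]

# Report the IDs so that users can decide whether to remove them.
flagged = flag_outliers(predicted)
print("Subjects beyond 3 SD:", flagged)

# Limits ignore the predictions, so extreme predictions fall outside the plot.
x_limits = limits_from_chronological(chronological)
hidden = [sid for sid, v in predicted.items() if not x_limits[0] <= v <= x_limits[1]]
print("Axis limits:", x_limits, "| points outside the plot:", hidden)
```

One caveat with this approach: the mean and SD are themselves inflated by the outliers, so with several extreme subjects a median-based threshold might flag them more reliably. Listing the points that fall outside the fixed limits keeps the outliers visible to the user even when they are no longer drawn.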

JGarciaCondado commented 4 days ago

Sometimes, if only one or two features contain an outlier, the features vs age graphs will look fine, but the outlier will still be clearly visible in the chronological vs predicted age graph.

[Attached images: age_bias_correction_all_all, features_vs_age_controls_all]