Closed albert-ying closed 7 months ago
This feels more like something that should be added to the quality report function: https://github.com/bio-learn/biolearn/pull/55
Potentially, but note that the missingness is different for different clocks as each clock uses a different set of CpG sites.
Yes, that's a good point. Perhaps we need some kind of metadata output from model runs, as you suggest. I wouldn't want to pollute the clock output with it.
I would prioritize this as this is a very important metric for evaluating whether the clock output is reliable. Let me know if you need any help!
The updated version of https://github.com/bio-learn/biolearn/pull/55 should allow you to get this information.
We should add a `missing_perc` column to the `clock.predict()` output, showing what percentage of CpG sites is missing in each sample in the raw data, before imputation. We may also want to print a warning message when the missingness is above 20%.
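For illustration, here is a rough sketch of how the per-clock missingness could be computed with pandas. It is not biolearn's implementation: `missing_perc_for_clock`, `clock_sites`, and `warn_above` are hypothetical names, and the sketch assumes the methylation matrix has CpG sites as rows and samples as columns. Sites the clock needs but that are absent from the data entirely are counted as missing, which is why the missingness differs from clock to clock.

```python
import warnings

import numpy as np
import pandas as pd


def missing_perc_for_clock(dnam, clock_sites, warn_above=20.0):
    """Percentage of a clock's CpG sites missing in each sample (hypothetical helper).

    dnam: DataFrame with CpG sites as rows and samples as columns (assumed layout).
    clock_sites: the CpG site IDs used by one particular clock.
    warn_above: threshold in percent above which a warning is emitted.
    """
    sites = list(clock_sites)
    # reindex adds all-NaN rows for sites the data lacks entirely,
    # so they count as missing for every sample.
    subset = dnam.reindex(sites)
    missing_perc = subset.isna().sum(axis=0) * 100.0 / len(sites)
    missing_perc.name = "missing_perc"

    flagged = missing_perc[missing_perc > warn_above]
    if not flagged.empty:
        warnings.warn(
            f"{len(flagged)} sample(s) exceed {warn_above}% missing CpG sites: "
            f"{', '.join(map(str, flagged.index))}"
        )
    return missing_perc


# Toy raw data, before any imputation.
dnam = pd.DataFrame(
    {
        "sample_a": [0.1, 0.5, 0.3, 0.9, 0.7],
        "sample_b": [0.2, np.nan, np.nan, 0.8, 0.6],
    },
    index=["cg01", "cg02", "cg03", "cg04", "cg05"],
)
clock_sites = ["cg01", "cg02", "cg03", "cg04", "cg05"]

# sample_a has no missing sites; sample_b is missing 2 of 5 and triggers the warning.
print(missing_perc_for_clock(dnam, clock_sites))
```

Returning the result as a separate Series, rather than appending it to the predictions, keeps the clock output clean; it could equally be attached as a column or placed in a metadata object.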