bio-learn / biolearn

Machine learning tools for biomarker analysis

Add missing percentage column #58

Closed albert-ying closed 7 months ago

albert-ying commented 9 months ago

We should add a missing_perc column to the clock.predict() output showing what percentage of the clock's CpG sites are missing from each sample in the raw data, before imputation. We may also need to print a warning message when the missingness is above 20%.
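A minimal sketch of how such a column might be computed, not biolearn's actual implementation. It assumes the methylation data is a pandas DataFrame with CpG sites as rows and samples as columns; `missing_percentage`, `clock_sites`, and `warn_threshold` are hypothetical names chosen for illustration. Sites the clock needs that are absent from the data entirely are counted as missing.

```python
import warnings

import numpy as np
import pandas as pd


def missing_percentage(dnam, clock_sites, warn_threshold=20.0):
    """Percent of a clock's CpG sites missing per sample, before imputation.

    Hypothetical helper: `dnam` is assumed to have CpG sites as rows and
    samples as columns; `clock_sites` lists the sites one clock uses.
    """
    if len(clock_sites) == 0:
        raise ValueError("clock_sites must not be empty")

    # reindex adds all-NaN rows for clock sites absent from the data,
    # so those sites count as missing for every sample.
    subset = dnam.reindex(clock_sites)
    missing_perc = subset.isna().sum(axis=0) * 100 / len(clock_sites)

    # Warn once, listing every sample above the threshold.
    flagged = missing_perc[missing_perc > warn_threshold]
    if not flagged.empty:
        details = ", ".join(f"{s} ({p:.1f}%)" for s, p in flagged.items())
        warnings.warn(
            f"Missingness above {warn_threshold}% for samples: {details}"
        )
    return missing_perc.rename("missing_perc")


# Toy data: sample_a lacks a value for cg03; cg06 is absent for everyone.
dnam = pd.DataFrame(
    {
        "sample_a": [0.1, 0.5, np.nan, 0.9, 0.7],
        "sample_b": [0.2, 0.4, 0.6, 0.8, 0.3],
    },
    index=["cg01", "cg02", "cg03", "cg04", "cg05"],
)
clock_sites = ["cg01", "cg02", "cg03", "cg04", "cg05", "cg06"]

print(missing_percentage(dnam, clock_sites))
```

In this toy data sample_a is missing 2 of 6 sites and triggers the warning, while sample_b is missing 1 of 6 and does not.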

sarudak commented 9 months ago

This feels more like something that should be added to the quality report function https://github.com/bio-learn/biolearn/pull/55

albert-ying commented 9 months ago

Potentially, but note that the missingness is different for different clocks as each clock uses a different set of CpG sites.
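To illustrate the point, here is a small sketch with two made-up clocks whose site lists only partly overlap. The clock names, site IDs, and the `missingness_by_clock` helper are all hypothetical, and the data layout (CpG sites as rows, samples as columns) is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical clocks and site lists, for illustration only.
clock_sites = {
    "clock_x": ["cg01", "cg02", "cg03", "cg04"],
    "clock_y": ["cg03", "cg04", "cg05", "cg06"],
}

# Toy data: sample_a lacks cg03 and cg05; cg06 was never measured.
dnam = pd.DataFrame(
    {
        "sample_a": [0.1, 0.5, np.nan, 0.9, np.nan],
        "sample_b": [0.2, 0.4, 0.6, 0.8, 0.3],
    },
    index=["cg01", "cg02", "cg03", "cg04", "cg05"],
)


def missingness_by_clock(dnam, clock_sites):
    """Return a samples x clocks table of percent missing CpG sites."""
    columns = {}
    for name, sites in clock_sites.items():
        # Sites absent from the data become all-NaN rows after reindex.
        subset = dnam.reindex(sites)
        columns[name] = subset.isna().sum(axis=0) * 100 / len(sites)
    return pd.DataFrame(columns)


table = missingness_by_clock(dnam, clock_sites)
print(table)
```

The same sample can look acceptable for one clock and unreliable for another, which is why a single dataset-level number would not capture this.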

sarudak commented 9 months ago

Yes, that's a good point. Perhaps we need some kind of metadata output from model runs, as you suggest. I wouldn't want to pollute the clock output with it.

albert-ying commented 9 months ago

I would prioritize this, as it is a very important metric for evaluating whether the clock output is reliable. Let me know if you need any help!

sarudak commented 8 months ago

The updated version of https://github.com/bio-learn/biolearn/pull/55 should allow you to get this information.