MorganLevineLab / PC-Clocks

Code for the calculation and implementation of the PC Based epigenetic clocks

How do I filter the CGmap files to get ~5.5million sites as described in the paper? #12

Open Mansi-Purohit opened 10 months ago

Mansi-Purohit commented 10 months ago

Hi,

I am trying to create the input data for the TrainPCClocks.R script using the processed data uploaded to GEO (GSE161141), but I am having trouble filtering the sites as described in the rat PCA clock paper. The closest I have gotten is ~4.4 million sites. I got there by filtering for coverage >= 10, restricting col1 to chromosomes 1-20, X, and Y, and counting the sites present in 80% of samples, using col1 and col3 together as the unique identifier of each site's location.
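The filtering described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the method used in the paper: it assumes tab-separated CGmap rows with the chromosome in col1, the position in col3, and total coverage in col8, and it reads "80% across samples" as "passes the filters in at least 80% of samples". The function names `passing_sites` and `sites_in_min_fraction` are hypothetical.

```python
from collections import Counter

# Assumption: chromosomes 1-20, X, and Y, with or without a "chr" prefix.
KEEP_CHROMS = {str(i) for i in range(1, 21)} | {"X", "Y"}


def passing_sites(lines, min_cov=10, cov_col=7):
    """Return the set of (chrom, pos) sites in one sample that pass the filters.

    Assumes tab-separated rows with chromosome in col1, position in col3,
    and total coverage in col8 (0-based index 7). Adjust cov_col if the
    files are laid out differently.
    """
    sites = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= cov_col:
            continue  # skip short or malformed rows
        chrom = fields[0][3:] if fields[0].startswith("chr") else fields[0]
        if chrom not in KEEP_CHROMS:
            continue
        try:
            pos = int(fields[2])
            cov = int(fields[cov_col])
        except ValueError:
            continue  # skip rows with non-numeric position or coverage
        if cov >= min_cov:
            sites.add((chrom, pos))
    return sites


def sites_in_min_fraction(samples, min_fraction=0.8):
    """Keep sites that pass the filters in at least min_fraction of samples.

    `samples` is an iterable of per-sample line iterables (for example,
    open file handles).
    """
    counts = Counter()
    n_samples = 0
    for lines in samples:
        n_samples += 1
        counts.update(passing_sites(lines))  # each site counted once per sample
    if n_samples == 0:
        return set()
    return {site for site, c in counts.items() if c / n_samples >= min_fraction}
```

If the files are gzip-compressed, each sample can be opened with `gzip.open(path, "rt")` and the handles passed in as `samples`. The length of the returned set is the site count to compare against the ~5.5 million figure.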

Is there any more information you can provide on how the CGmap files were filtered, or should be filtered, to get the final 5.5 million sites?

Thanks. Mansi