lvclark closed this issue 1 year ago
Hi, the Python code that reads the somalier files is here: https://github.com/brentp/somalier/blob/master/scripts/ancestry-predict.py. Remember that the files contain only minimal information, not true genotypes. Note that the function discards y-sites, but you can see the format from it. Happy to answer any questions.
Wonderful, thank you for the quick reply!
Somalier extracts genotypes from BAMs so much faster than anything I've attempted to write. I would love to be able to use those genotypes in other analyses. (In my particular case, I need principal components to use as population-structure covariates in an association analysis, where I am running the analysis on just a few genes and don't want to have to genotype the whole genome.) Could the format of `somalier extract` output be documented a little more thoroughly, so that someone like me could read those bytes into Python or R and convert them to numeric genotypes?
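For anyone with the same goal, here is a rough sketch of the downstream step only, not of the file format itself. It assumes you have already obtained per-site reference and alternate read counts for each sample (for example by adapting the reader in the linked script) and shows one way to turn them into 0/1/2 dosages and then principal components. The function names, the depth cutoff, and the allele-balance thresholds are illustrative assumptions of mine, not values taken from somalier.

```python
import numpy as np

def counts_to_genotypes(ref, alt, min_depth=7, het_lo=0.2, het_hi=0.8):
    """Turn ref/alt read counts into alt-allele dosages (0, 1, 2).

    Sites with depth below min_depth are set to NaN.
    The thresholds are illustrative assumptions, not somalier's own.
    """
    ref = np.asarray(ref, dtype=float)
    alt = np.asarray(alt, dtype=float)
    depth = ref + alt
    with np.errstate(divide="ignore", invalid="ignore"):
        allele_balance = alt / depth
    gt = np.full(depth.shape, np.nan)
    called = depth >= min_depth
    gt[called & (allele_balance < het_lo)] = 0
    gt[called & (allele_balance >= het_lo) & (allele_balance <= het_hi)] = 1
    gt[called & (allele_balance > het_hi)] = 2
    return gt

def principal_components(gt, n_components=2):
    """PCA on a samples x sites dosage matrix via SVD.

    Sites missing in every sample are dropped; remaining missing
    values are replaced by the per-site mean before centering.
    """
    gt = np.asarray(gt, dtype=float)
    gt = gt[:, ~np.all(np.isnan(gt), axis=0)]
    site_means = np.nanmean(gt, axis=0)
    filled = np.where(np.isnan(gt), site_means, gt)
    centered = filled - site_means
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy example: 4 samples x 3 sites of made-up dosages.
toy = np.array([[0, 1, 2], [2, 1, 0], [1, 1, 1], [0, 0, 2]], dtype=float)
pcs = principal_components(toy, n_components=2)
```

The resulting `pcs` array has one row per sample and can be used as covariates. Mean imputation is the simplest choice here; with many missing sites a more careful treatment would be worth considering.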