CAST-genomics / haptools

Ancestry and haplotype aware simulation of genotypes and phenotypes for complex trait analysis
https://haptools.readthedocs.io
MIT License
18 stars 4 forks source link

snplist support in the `ld` command #245

Open aryarm opened 3 months ago

aryarm commented 3 months ago

It would be nice if the ld command could compute LD between SNPs. For now, I've just been using the haptools API:

import numpy as np
from haptools.data import GenotypesVCF
from haptools.ld import pearson_corr_ld

causal_SNP = "rs429358"

gts = GenotypesVCF.load("tests/data/apoe.vcf.gz")
get_snp_gts = lambda snp: gts.subset(variants=(snp,)).data[:, 0, :2].sum(axis=1)
target_gts = get_snp_gts(causal_SNP)
other_snps = set(gts.variants["id"])
other_snps.remove(causal_SNP)
ld = {vr: np.abs(pearson_corr_ld(target_gts, get_snp_gts(vr))) for vr in other_snps}

# get the top ten SNPs that have the highest LD with the causal SNP
sorted(ld, key=ld.get, reverse=True)[:10]