MomentsLD / moments

MIT License

constrained compute_ld_stats #193

Open sgravel opened 1 month ago

For Alouette's project, we are interested in computing all pairwise stats in tens of thousands of short windows along the genome, while excluding pairs of SNPs less than 1000 bp apart. Option 1 would be to compute all pairwise distances in Python and filter pairs after the fact, but this is too slow.
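A minimal sketch of the filtering step in option 1, assuming SNP positions for one window are available as an array (the function name is hypothetical and not part of moments.ld). The filter itself is cheap. The cost is that every one of the n(n-1)/2 pairs is still enumerated, and its stats computed, before most short-range pairs are thrown away.

```python
import numpy as np

def filter_pairs_after_the_fact(positions, min_dist):
    """Hypothetical helper: enumerate ALL pairs i < j, then keep only
    those whose SNPs are at least `min_dist` bp apart."""
    positions = np.asarray(positions)
    # Every pair (i, j) with i < j, i.e. n(n-1)/2 pairs.
    i_idx, j_idx = np.triu_indices(len(positions), k=1)
    # Boolean mask applied after the fact.
    keep = np.abs(positions[j_idx] - positions[i_idx]) >= min_dist
    return i_idx[keep], j_idx[keep]
```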

Option 2 would be to write a Cython function that does the double loop over SNP pairs. This could be included in count_genotypes_sparse.pyx, and we could modify compute_pairwise_stats to include a positional constraint. This would require passing two additional arguments: the SNP positions and a distance threshold.
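A pure-Python sketch of the constrained double loop in option 2, assuming positions within a window are sorted in ascending order. The name `constrained_pairs` is hypothetical; it is not an existing function in count_genotypes_sparse.pyx, and a Cython version would use typed memoryviews and compute the pairwise stats inside the inner loop rather than yielding indices.

```python
def constrained_pairs(positions, min_dist):
    """Hypothetical sketch: yield (i, j) with i < j and
    positions[j] - positions[i] >= min_dist.

    Assumes `positions` is sorted in ascending order.
    """
    n = len(positions)
    j_start = 0
    for i in range(n):
        # A SNP is never paired with itself or with an earlier SNP.
        if j_start < i + 1:
            j_start = i + 1
        # Skip partners closer than min_dist. Because positions are sorted,
        # the first admissible partner never moves backwards as i grows.
        while j_start < n and positions[j_start] - positions[i] < min_dist:
            j_start += 1
        # Every remaining j satisfies the distance constraint.
        for j in range(j_start, n):
            yield i, j
```

Usage would look like `for i, j in constrained_pairs(pos, 1000): ...`, with the stats for SNPs `i` and `j` accumulated in the loop body. Since `j_start` only ever advances, the skipping costs O(n) per window in total, so the overall work scales with the number of retained pairs rather than with all n(n-1)/2 pairs.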

Option 3 would be to call compute_ld_statistics iteratively with thousands of BED files, which seems inelegant.

Any other ideas?

At the moment I would advocate for option 2, and if everyone agrees, Alouette can code it up. Do we want this as part of moments.ld?