For Alouette's project, we are interested in computing all pairwise stats in tens of thousands of short windows along the genome, but excluding pairs of SNPs that are less than 1000 bp apart.
Option 1 would be to compute all pairwise distances in Python and filter pairs after the fact, but this is too slow.
Option 2 would be to write a Cython function that does the double loop. This could be included in count_genotypes_sparse.pyx, and we could modify compute_pairwise_stats to include a positional constraint. This would require passing two additional arguments: the SNP positions and a distance threshold.
Option 3 would be to iteratively call compute_ld_statistics with thousands of bed files, which seems inelegant.
Any other ideas?
At the moment I would advocate for Option 2, and if all agree, Alouette can code it up. Do we want this as part of moments.ld?
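
To make the positional constraint in Option 2 concrete, here is a rough pure-Python sketch of the pair selection only. It is not existing moments.ld code: the function name pairs_at_least and its arguments are hypothetical, it assumes positions are sorted within a window, and it assumes pairs exactly at the threshold are kept. A Cython version would compute the stats inside the inner loop rather than building a list of pairs.

```python
import numpy as np


def pairs_at_least(positions, min_dist):
    """Return index pairs (i, j), i < j, with positions[j] - positions[i] >= min_dist.

    Hypothetical helper for illustration; assumes positions are sorted
    in increasing order.
    """
    positions = np.asarray(positions)
    # For each SNP i, find the first index whose position is
    # at least positions[i] + min_dist.
    starts = np.searchsorted(positions, positions + min_dist, side="left")
    pairs = []
    for i, j0 in enumerate(starts):
        # max(...) guards against j <= i when min_dist is zero or negative.
        for j in range(max(j0, i + 1), len(positions)):
            pairs.append((i, j))
    return pairs


# Example: four SNPs, threshold of 1000 bp.
example_pairs = pairs_at_least([100, 600, 1100, 2500], 1000)
```

Because the positions are sorted, the inner loop never visits the excluded close pairs, so the cost scales with the number of pairs that are kept rather than with all pairs.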