Open kullrich opened 2 years ago
Might be a starting point, since only biallelic sites are used and converted into a so-called multi-allelic representation (ref-hom (0), het (1), alt-hom (2), I guess I need to create a new ihh01_scan_unphased
function in allel/opt/stats.pyx
, a new ssl2ihh_unphased
function in allel/opt/stats.pyx
and create a new ihs_unphased
function in allel/stats/selection.py
import allel
from allel.util import asarray_ndim
from allel.util import check_integer_dtype
from allel.util import check_dim0_aligned
from allel.compat import memoryview_safe
def genotype_to_multilocus(g, pos):
"""Converts a Genotype array into a Multilocus array,
only retaining biallelic sites and corresponding positions.
Parameters
----------
g : array_like, int, shape (n_variants, n_samples, ploidy)
Genotype array.
pos : array_like, int, shape (n_variants,)
Variant positions (physical distance).
Returns
-------
ms : array_like, int, shape (n_variants, n_loci)
Multilocus array.
mspos : array_like, int, shape (n_variants,)
Biallelic positions (physical distance).
"""
# convert genotype array into multi-locus array
ac = g.count_alleles()
bipos = [x for x,y in enumerate(ac.allelism()) if y in [1,2]]
gs = g.subset(bipos)
# get multi-locus array
ms = gs.is_het().astype(int)
ms[gs.is_hom_alt()] = 2
ms = allel.HaplotypeArray(ms)
mspos = [y for x,y in enumerate(pos) if x in bipos]
# check inputs
ms = asarray_ndim(ms, 2)
check_integer_dtype(ms)
mspos = asarray_ndim(mspos, 1)
check_dim0_aligned(ms, mspos)
ms = memoryview_safe(ms)
mspos = memoryview_safe(mspos)
return [ms, mspos]
g = allel.GenotypeArray([[[0, 0], [0, 0], [0, 0]],
[[0, 0], [0, 1], [0, 1]],
[[0, 0], [1, 1], [0, 1]],
[[0, 1], [1, 1], [1, 1]],
[[1, 1], [1, 1], [-1, -1]],
[[0, 0], [1, 2], [0, 0]],
[[0, 1], [1, 2], [1, 2]],
[[0, 1], [-1, -1], [0, 0]],
[[-1, -1], [-1, -1], [-1, -1]]])
pos = [2, 4, 7, 14, 15, 18, 19, 25, 27]
ms, mspos = genotype_to_multilocus(g, pos)
Dear scikit-allel team,
would it be possible to add an unphased version of the IHS and XPEHH statistic to be directly used with a GenotypeArray?
The developer of selscan has just published an unphased version of both methods:
https://www.biorxiv.org/content/10.1101/2021.10.22.465497v1.full.pdf
Best regards and thank you in anticipation
Kristian