Closed mufernando closed 5 years ago
Hi Murillo, would you like to join them on the intersection of the sites? That is possible. Also, will they have the same alleles at each site, or might the alleles be different (that complicates things).
On Mon, 15 Jul 2019, 21:05 Murillo Fernando Rodrigues, < notifications@github.com> wrote:
Hi,
I was wondering if you have an effective way of joining two genotype arrays that do not have necessarily the same positions genotyped.
Thank you!
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cggh/scikit-allel/issues/278?email_source=notifications&email_token=AAFLYQRW4TOUOEUG6ACVWNDP7TKBBA5CNFSM4ID2KLZKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4G7J3NFA, or mute the thread https://github.com/notifications/unsubscribe-auth/AAFLYQT7MPJKCXT35Q6XDMLP7TKBBANCNFSM4ID2KLZA .
Hi Alistair,
Not necessarily the same alleles. And I was thinking more like a full outer join (i.e., keeping all sites but changing the ones not present in a few of the individuals as missing).
Thanks!
Sorry, Alistair, they have the same alleles.
Hi Murillo, thanks, yes you could join two genotype arrays like this. The workflow would look something like this...
# obtain variant positions for first callset somehow
pos1 = allel.SortedIndex(callset1[chrom]['variants']['POS'])
# obtain variant positions for second callset somehow
pos2 = allel.SortedIndex(callset2[chrom]['variants']['POS'])
# find the intersection of the positions
loc1, loc2 = pos1.locate_intersection(pos2)
# obtain genotypes for first callset somehow
gt1 = allel.GenotypeArray(callset1[chrom]['variants']['GT'])
# select intersection sites
gt1_isec = gt1.compress(loc1, axis=0)
# obtain genotypes for second callset somehow
gt2 = allel.GenotypeArray(callset1[chrom]['variants']['GT'])
# select intersection sites
gt2_isec = gt2.compress(loc2, axis=0)
# sanity checks
assert gt1_isec.shape[0] == gt2_isec.shape[0]
# concatenate along samples dimension
gt_concat = gt1_isec.concatenate([gt2_isec], axis=1)
Hth.
Hi,
I was wondering if you have an effective way of joining two genotype arrays that do not have necessarily the same positions genotyped.
Thank you!