CoreArray / PySeqArray

PySeqArray: data manipulation of whole-genome sequencing variants with SeqArray files in Python (pre-release version)
GNU General Public License v3.0
13 stars 0 forks source link

FilterSet2 intersect not working? #1

Open seeholza opened 6 years ago

seeholza commented 6 years ago

I am trying to filter variants by chromosome and position.

Here's my code:

fn = ps.seqExample('1KG_autosomes_phase3_shapeit2_mvncall_integrated_v5_20130502_lzma.seq.gds')
f = ps.SeqArrayFile()
f.open(fn)

f.FilterReset()

# filter for chromosome
chrs_all = f.GetData('chromosome')
chrs_sel = chrs_all == '22'
f.FilterSet2(variant=chrs_sel)
# returns:  # of selected variants: 1,103,547

pos_on_chrom = f.GetData('position')
# there are 3 variants with positions < 16050319
f.FilterSet2(variant=pos_on_chrom<16050319, intersect=True)
# returns:  # of selected variants: 1,103,547

Given that there 3 variants with position < 16050319 shouldnt the last call to FilterSet2 return 3 variants?

Update Of course some simple boolean slicing does the job, maybe I misunderstood the intention of intersect=True?

sel = np.zeros_like(chrs_all, dtype=bool)
np.putmask(sel, chrs_sel, pos_on_chrom < 16050319)  # there are 3 variants with positions < 16050319
f.FilterSet2(variant=sel)
# returns: # of selected variants: 3
zhengxwen commented 6 years ago

Thanks for using PySeqArray. I will look into this issue.