cggh / scikit-allel

A Python package for exploring and analysing genetic variation data
MIT License

reading data chromosome key error #387

Open snowformatics opened 1 year ago

snowformatics commented 1 year ago

Hi,

I am trying to read my genomic data and to follow the tutorial:

callset = zarr.open('test.zarr', mode='r')
variants = callset['chr7B']['variants']

but I am getting this error:

    raise KeyError(item)
KeyError: 'chr7B'

When I print out a list of all chromosomes, chr7B is included:

list(callset['variants/CHROM/'])

Any ideas what I am doing wrong?

Thanks

patrick-koenig commented 1 year ago

Hi Stefanie,

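It is not possible to access the variants that way, i.e. with callset['chr7B']['variants']. The callset has no top-level 'chr7B' group; the chromosome names are stored as values in variants/CHROM.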

If you want the data or metadata of the variants on chromosome chr7B as a NumPy array, you need to do it like this:

import allel

# Index the variants by chromosome and position
pos_index = allel.ChromPosIndex(callset['variants/CHROM'][:], callset['variants/POS'][:])
# Location (a slice) of all variants on chr7B
chrom_range = pos_index.locate_key('chr7B')
variants_reference_alleles = callset['variants/REF'][chrom_range]
variants_alternate_alleles = callset['variants/ALT'][chrom_range]
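
To make the idea behind the lookup above more concrete, here is a minimal pure-Python sketch. It is not scikit-allel's implementation: chrom_slice is a hypothetical helper, the lists are toy data standing in for callset['variants/CHROM'][:] and callset['variants/REF'][:], and it assumes all variants of one chromosome are stored next to each other.

```python
def chrom_slice(chroms, key):
    """Return the slice covering the contiguous run of `key` in `chroms`.

    Hypothetical helper for illustration only; assumes all entries for
    one chromosome are adjacent.
    """
    hits = [i for i, c in enumerate(chroms) if c == key]
    if not hits:
        raise KeyError(key)
    start, stop = hits[0], hits[-1] + 1
    if stop - start != len(hits):
        raise ValueError(f"entries for {key!r} are not contiguous")
    return slice(start, stop)


# Toy data (assumption): one chromosome name and one REF allele per variant
chrom = ["chr7A", "chr7A", "chr7B", "chr7B", "chr7B", "chr7D"]
ref = ["A", "C", "G", "T", "A", "C"]

chrom_range = chrom_slice(chrom, "chr7B")  # slice(2, 5)
ref_chr7B = ref[chrom_range]               # ['G', 'T', 'A']
```

The point is that the chromosome name selects a range of variant indices, and that same range can then be applied to every per-variant array.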

To access part of the calls without loading the whole array into memory, you can use the get_orthogonal_selection() method of the underlying Zarr library, which loads only the requested slice of the Zarr array into a NumPy array:

calls_chr7B = callset['calldata/GT'].get_orthogonal_selection((chrom_range, slice(None), slice(None)))

Hope that solves your problem.

Patrick

snowformatics commented 1 year ago

Thanks a lot Patrick! Looks like the tutorial I found was outdated, I will give it a try 😊