cggh / scikit-allel

A Python package for exploring and analysing genetic variation data
MIT License

Exporting genotype to a file #348

Open vappiah opened 3 years ago

vappiah commented 3 years ago

Is there a way to save the genotype data structure to a file for further processing in allel?

I would like to generate a Manhattan plot, and the tool requires the data to be in the genotype format. Please find attached a sample of the expected output.

[Attached image: geno]

hardingnj commented 3 years ago

Hi @vappiah. It looks like you need an uncompressed VCF?

Can you describe your workflow in a bit more detail, and hopefully we can help? i.e.:

PS This issue tracker is for bugs; user queries should be directed to: scikit-allel@googlegroups.com

vappiah commented 3 years ago

Hi @hardingnj. My starting format is a VCF file generated using the GATK pipeline. Using allel, I first convert it to h5 format, then open the h5 file and create the GenotypeChunkedArray. What I want is to save the GenotypeChunkedArray to another file and use that for downstream analysis, but I have yet to find a function in allel that can do the export. Below is the code:

    import h5py
    import allel

    # h5 is the path to the HDF5 file converted from the VCF
    callset = h5py.File(h5, mode='r')
    calldata = callset['calldata']
    genotype = allel.GenotypeChunkedArray(callset['calldata/GT'])

hardingnj commented 3 years ago

Hi,

There is no function in scikit-allel to generate .geno files or other text files. I would suggest using something like vcftools (https://vcftools.github.io/man_latest.html) to convert between text-based formats.

Alternatively, it would be fairly straightforward to loop through the rows and write the columns required by the tool to a file.
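A minimal sketch of that loop, under some assumptions: the exact layout the plotting tool expects is not shown in this thread, so the output format here (one tab-separated line per variant, each field the count of non-reference alleles for a sample, `9` for a missing call) is hypothetical, as is the helper name `write_genotype_rows`. The function only needs an iterable of rows, where each row is a sequence of per-sample allele pairs. That is the shape you get when iterating over a variants x samples x ploidy genotype array, but the demo uses plain lists so it runs without any data files.

```python
import io

MISSING = -1  # scikit-allel encodes a missing allele call as -1


def write_genotype_rows(rows, out, missing_code="9", sep="\t"):
    """Write one line per variant to the file-like object `out`.

    Each field is the number of non-reference alleles in one sample's
    call, or `missing_code` if any allele in the call is missing.
    The layout is an assumption; adapt it to what your tool expects.
    Returns the number of lines written.
    """
    n_written = 0
    for row in rows:
        fields = []
        for call in row:
            alleles = [int(a) for a in call]
            if any(a == MISSING for a in alleles):
                fields.append(missing_code)
            else:
                # Any allele index above 0 counts as non-reference.
                fields.append(str(sum(1 for a in alleles if a > 0)))
        out.write(sep.join(fields) + "\n")
        n_written += 1
    return n_written


# Two variants, three diploid samples; the second variant has one missing call.
demo_rows = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 1), (-1, -1), (0, 0)],
]
buffer = io.StringIO()
count = write_genotype_rows(demo_rows, buffer)
text = buffer.getvalue()
```

For a large callset you would likely pass slices of the genotype array block by block (for example `genotype[start:stop]`) and write to a file opened with `open(path, "w")` instead of an in-memory buffer, so that the whole array never has to be loaded at once.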