VUmcCGP / wisecondor

WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR): Detect fetal trisomies and smaller CNVs in a maternal plasma sample using whole-genome data.
Other
44 stars 65 forks

Pickle to NPZ? #33

Closed biocyberman closed 7 years ago

biocyberman commented 7 years ago

I want to try the newer WISECONDOR that uses NPZ files. One problem is that we no longer have the original BAM files. Is there a way to convert a .pickle file to .npz?

rstraver commented 7 years ago

I've had this question before, and I guess it makes sense in some situations. I wrote a little script that should do the conversion for you: just copy and paste the following into a .py file and run it on any pickle you provide. I haven't thoroughly checked its correctness, as I currently do not have much data to test it on.
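Since the script is untested, one quick sanity check is to run its core step on synthetic data. The sketch below is only an illustration: `convert_legacy_sample` and `round_trip` are hypothetical helper names, and the fake sample assumes the legacy pickle is a dict mapping chromosome names to per-bin read counts, as the script's loop implies. It writes a tiny pickle, converts it the same way the script does, saves it with `np.savez_compressed`, and checks that the counts come back unchanged.

```python
import os
import pickle
import tempfile

import numpy as np


def convert_legacy_sample(legacy):
    """Mirror the script's core step: per-chromosome bin counts -> int32 arrays."""
    return {chrom: np.array(counts, dtype=np.int32) for chrom, counts in legacy.items()}


def round_trip(legacy):
    """Pickle a synthetic legacy sample, convert it, save it as npz and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        pickle_path = os.path.join(tmp, 'sample.pickle')
        npz_path = os.path.join(tmp, 'sample.npz')

        # Write a fake legacy file: chromosome name -> list of read counts per bin
        with open(pickle_path, 'wb') as handle:
            pickle.dump(legacy, handle)

        with open(pickle_path, 'rb') as handle:
            loaded = pickle.load(handle)

        np.savez_compressed(npz_path, sample=convert_legacy_sample(loaded))

        # savez stores the dict as a 0-d object array; .item() unwraps it again
        with np.load(npz_path, allow_pickle=True) as data:
            return data['sample'].item()


if __name__ == '__main__':
    fake = {'1': [10, 0, 7], '21': [3, 4], 'X': [0, 1]}
    restored = round_trip(fake)
    for chrom, counts in fake.items():
        assert restored[chrom].tolist() == counts, chrom
    print('round trip OK for', sorted(restored))
```

This only confirms that the counts survive the file round trip; it says nothing about systematic differences between the old and new BAM-processing code.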

A word of warning: I can imagine that, due to differences in implementation, a consistent error can creep into either conversion step, making a file converted from BAM to pickle to npz behave differently from an npz created directly from a BAM. Neither implementation would suffer from this difference on its own; it only becomes a problem with a conversion like this. If you find such a systematic error, please let me know. To test, take a BAM file you have, convert it with both the new and the old implementation, convert the resulting pickle to npz as well, and check whether the two npz files give different results when testing for CNVs (or send the npz files to me and I can check them internally).
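The bin-level part of that comparison can be sketched in Python. This assumes both npz files store a dict of chromosome to per-bin counts under a `sample` key, as the conversion script writes it, and that chromosome keys are strings; whether an npz produced directly from a BAM uses identical chromosome keys is an assumption worth checking first. `load_sample` and `compare_samples` are hypothetical names. Comparing raw counts is a coarser check than comparing the CNV calls themselves.

```python
import numpy as np


def load_sample(path):
    """Return the per-chromosome bin counts stored under 'sample' in an npz file."""
    with np.load(path, allow_pickle=True) as data:
        return data['sample'].item()


def compare_samples(direct, converted):
    """Report, per chromosome, (bins that differ, largest absolute difference).

    Chromosomes missing from either sample, or with a different number of bins,
    are reported as strings because they point to a binning mismatch rather than
    a counting difference.
    """
    report = {}
    for chrom in sorted(set(direct) | set(converted)):
        if chrom not in direct or chrom not in converted:
            report[chrom] = 'missing in one sample'
            continue
        # Widen to int64 so the subtraction cannot overflow int32 counts
        a = np.asarray(direct[chrom], dtype=np.int64)
        b = np.asarray(converted[chrom], dtype=np.int64)
        if a.shape != b.shape:
            report[chrom] = 'bin count mismatch: %d vs %d' % (a.size, b.size)
            continue
        diff = np.abs(a - b)
        report[chrom] = (int(np.count_nonzero(diff)), int(diff.max()) if diff.size else 0)
    return report
```

Usage, with placeholder file names: `report = compare_samples(load_sample('direct.npz'), load_sample('converted.npz'))`. A chromosome reported as `(0, 0)` has identical counts in both files.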

Also, this conversion fills many stats and runtime fields that it cannot obtain with -1 (or None). So I strongly advise against using it unless it is for reference creation and you have absolutely no way of retrieving the original BAM files.
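Those placeholders also make a converted file recognisable afterwards, which helps keep such files out of anything other than reference creation. A minimal sketch, assuming the `quality` dict layout the script writes (every counter set to -1); `quality_is_placeholder` is a hypothetical helper, not part of wisecondor.

```python
PLACEHOLDER = -1  # value the conversion script writes for every quality counter


def quality_is_placeholder(quality):
    """True if the quality dict is non-empty and every counter is the placeholder,
    which suggests the npz came from a pickle conversion rather than a BAM file."""
    return bool(quality) and all(value == PLACEHOLDER for value in quality.values())
```

For a file on disk, pass `np.load(path, allow_pickle=True)['quality'].item()` to the function.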

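```python
import sys
import pickle
import argparse
import numpy as np


def getRuntime():
    # The legacy pickle holds no runtime information, so every field is a placeholder
    runtime = dict()
    runtime['version'] = 'None'
    runtime['datetime'] = 'None'
    runtime['hostname'] = 'None'
    runtime['username'] = 'None'
    return runtime


parser = argparse.ArgumentParser(description='Convert a legacy pickle file to the newer npz format',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

parser.add_argument('infile', type=str,
                    help='old format pickle file (input)')

parser.add_argument('outfile', type=str,
                    help='new format npz file (output)')

parser.add_argument('-binsize', type=int, default=1000000,
                    help='binsize used for pickle creation')

args = parser.parse_args()

# Load the legacy sample: a dict mapping each chromosome to its per-bin read counts
with open(args.infile, 'rb') as handle:
    if sys.version_info[0] >= 3:
        # Pickles written under Python 2 may need latin1 decoding in Python 3
        sample = pickle.load(handle, encoding='latin1')
    else:
        sample = pickle.load(handle)

# Store each chromosome's counts as an int32 array
chromosomes = dict()
for chrom in sample:
    chromosomes[chrom] = np.array(sample[chrom], dtype=np.int32)

# Read statistics cannot be recovered from the pickle, so they are set to -1
qual_info = {'mapped': -1,
             'unmapped': -1,
             'no_coordinate': -1,
             'filter_rmdup': -1,
             'filter_mapq': -1,
             'pre_retro': -1,
             'post_retro': -1,
             'pair_fail': -1}

# np.savez_compressed appends '.npz' to outfile if it lacks that extension
np.savez_compressed(args.outfile,
                    arguments=vars(args),
                    runtime=getRuntime(),
                    sample=chromosomes,
                    quality=qual_info)
```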
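To check what the script actually wrote, the output can be read back with NumPy. Because `np.savez_compressed` stores each dict as a 0-d object array, recent NumPy versions require `allow_pickle=True` to load it, and `.item()` recovers the dict. `read_converted` and `summarize_sample` are hypothetical helper names used for illustration.

```python
import numpy as np


def read_converted(path):
    """Load every entry of the npz (arguments, runtime, sample, quality) as dicts."""
    with np.load(path, allow_pickle=True) as data:
        return {key: data[key].item() for key in data.files}


def summarize_sample(sample):
    """Map each chromosome to (number of bins, total reads)."""
    return {chrom: (len(counts), int(np.sum(counts))) for chrom, counts in sample.items()}
```

Usage, with a placeholder file name: `contents = read_converted('converted.npz')`, then `summarize_sample(contents['sample'])`. The number of bins per chromosome should roughly match the chromosome length divided by the `binsize` recorded in `contents['arguments']`.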
biocyberman commented 7 years ago

I will try it and give feedback.