VUmcCGP / wisecondor

WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR): Detect fetal trisomies and smaller CNV's in a maternal plasma sample using whole-genome data.

Exporting GC normalized to text output #7

Closed guillermomarco closed 10 years ago

guillermomarco commented 10 years ago

Hello Roy,

I'm interested in studying the result of the sample GC normalization. I would like to write the results of this step to a plain text file so I can load it into R and make some plots. At the moment, the GC normalization step produces a pickle object that I cannot read properly.

Since you know the wisecondor objects, do you have any idea what the easiest way to implement that would be?

Thanks !

rstraver commented 10 years ago

Try putting this in a new Python script file. It takes 2 arguments: the pickle (input) and the target output (csv). I believe that should work for R:

import csv
import pickle
import argparse

parser = argparse.ArgumentParser(description='Translate a pickle to a csv file',
    formatter_class=argparse.ArgumentDefaultsHelpFormatter)

parser.add_argument('pickle', type=str,
                    help='input: pickle file')
parser.add_argument('csv', type=str,
                    help='output: csv to dump table into')
args = parser.parse_args()

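# Load the GC-corrected sample: a dict mapping chromosome name to a list of bin values.
with open(args.pickle, 'rb') as pickleFile:
    sample = pickle.load(pickleFile)

# One row per chromosome: the name first, then its bin values.
keys = [str(x) for x in range(1, 23)]
keys.extend(['X', 'Y'])

# 'wb' is the Python 2 idiom for csv output; on Python 3 use
# open(args.csv, 'w', newline='') instead.
with open(args.csv, 'wb') as csvFile:
    outWriter = csv.writer(csvFile)
    for key in keys:
        row = [key]
        row.extend(sample[key])
        outWriter.writerow(row)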
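
Because chromosomes have different numbers of bins, the rows written by the script above have different lengths, which can be awkward to load as a table. A possible alternative, sketched here for Python 3, is a "long" layout with one row per bin. The function name `pickle_to_long_csv` and the column names are made up for this illustration, and the sketch assumes the pickle holds a plain dict keyed by chromosome strings:

```python
import csv
import pickle


def pickle_to_long_csv(pickle_path, csv_path):
    """Write one row per bin: chromosome, 0-based bin index, value."""
    with open(pickle_path, 'rb') as handle:
        # encoding='latin1' lets Python 3 read pickles written by Python 2.
        sample = pickle.load(handle, encoding='latin1')

    chromosomes = [str(x) for x in range(1, 23)] + ['X', 'Y']
    with open(csv_path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['chrom', 'bin', 'value'])
        for chrom in chromosomes:
            # Chromosomes missing from the dict are skipped, not an error.
            for index, value in enumerate(sample.get(chrom, [])):
                writer.writerow([chrom, index, value])
```

Every row then has the same three columns, so the file should load as an ordinary table with a header line.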

guillermomarco commented 10 years ago

Nice ! I'm gonna try it out asap.

I have one more question. While debugging gcc.py I saw that you store all the GC-corrected values in the corrected object, one list per chromosome. Can I assume each list is already coordinate sorted, since you don't store the coordinate corresponding to each bin? By this I mean that the first normalized value corresponds to the first window, from 0 to binsize, and so on.

Thanks :)

rstraver commented 10 years ago

That is correct, the object is indeed a dictionary containing a coordinate sorted array per chromosome.
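
Since the values are coordinate sorted and the first one covers 0 to binsize, the window for each value can be recovered from its position in the list. A minimal sketch follows; `bin_coordinates` is a hypothetical helper, and the bin size passed in is an assumption that has to match whatever bin size was used when the sample was prepared:

```python
def bin_coordinates(values, bin_size):
    """Pair each value with the half-open window [start, end) it covers."""
    return [(index * bin_size, (index + 1) * bin_size, value)
            for index, value in enumerate(values)]


# Hypothetical example: three bins at an assumed bin size of 1,000,000 bp.
for start, end, value in bin_coordinates([0.98, 1.02, 1.01], 1000000):
    print(start, end, value)
```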

guillermomarco commented 10 years ago

Thank you Roy.

guillermomarco commented 10 years ago

Yesterday I used the code you provided and it works like a charm. I've been working with the values in R. Now I need to convert that csv file back into a pickle object so wisecondor will recognize it. Should the csv and pickle modules be enough for this?

rstraver commented 10 years ago

Yeah, if you fiddle with the code a bit you should be able to use csv.reader(...) to load your data, then create a dict and fill it with your data (one bin array per chromosome, with the chromosome as the dict key, e.g. myDict['3']=[0,0,0.9923,...]), and pickle.dump(...) that dict to a file. WISECONDOR should be able to load it afterward.
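
A rough Python 3 sketch of that round trip is below. It assumes the csv keeps the layout of the export script (no header row, chromosome name in the first column, bin values after it); `csv_to_pickle` is a name invented for this example. Whether WISECONDOR accepts plain lists of floats in place of the original value types has not been checked here.

```python
import csv
import pickle


def csv_to_pickle(csv_path, pickle_path):
    """Rebuild the chromosome -> list-of-bin-values dict and pickle it."""
    sample = {}
    with open(csv_path, newline='') as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # ignore blank lines
            chrom, values = row[0], row[1:]
            # Strip empty trailing cells, e.g. padding on ragged rows.
            while values and values[-1] == '':
                values.pop()
            sample[chrom] = [float(v) for v in values]
    with open(pickle_path, 'wb') as handle:
        # Protocol 2 is the highest pickle protocol Python 2 can read.
        pickle.dump(sample, handle, protocol=2)
    return sample
```

Cells that are not numeric (for example an NA written by R) would raise a ValueError in this sketch, so they need to be handled before conversion.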

I assume you would do this anyway but just to be sure: You'll have to make sure your data is still normalized and you have to re-create the reference set if you altered the file preparations.

Good luck!