Illumina / BeadArrayFiles

Python library to parse file formats related to Illumina bead arrays
46 stars 34 forks source link

gtc to plink ped format - 0|G instead of A|G #30

Open 1teaspoon opened 2 years ago

1teaspoon commented 2 years ago

Hi folks,

I am having an issue when using IlluminaBeadArray libaray to convert gtc file to ped file format. A lot of snps in the converted plink file have 0|G instead they should be A|G. Below is the part of my script related to this conversion.

import sys import os from IlluminaBeadArrayFiles import GenotypeCalls, BeadPoolManifest, code2genotype

def outputPlink(gtc_file, manifest_file, sample_name, plink_out_dir, genoThresh = 0.15): manifest = BeadPoolManifest(manifest_file) gtc = GenotypeCalls(gtc_file) GenoScores = gtc.get_genotype_scores() top_strand_genotypes = gtc.get_base_calls() outBase = plink_out_dir + '/' + sample_name allGenotypes = [] with open(outBase + '.ped', 'w') as pedOut, open(outBase +'.map','w') as mapOut: for (name, chrom, map_info, source_strand_genotype, genoScore) in zip(manifest.names, manifest.chroms, manifest.map_infos, top_strand_genotypes, GenoScores): mapOut.write(' '.join([chrom, name, '0', str(map_info)]) + '\n') if source_strand_genotype == '--': geno = ['0', '0'] else: geno = [source_strand_genotype[0], source_strand_genotype[1]] allGenotypes += geno pedOut.write(' '.join([sample_name, sample_name, '0', '0', '0', '-9'] + allGenotypes) + '\n')

jjzieve commented 2 years ago

@1teaspoon Can you post the entire script? Or better, open a branch? And which product are you running? Just looking at the code, I can't really tell what might be going wrong. I'd assume top_strand_genotypes would be populated with with actual string base calls based on get_base_calls (https://github.com/Illumina/BeadArrayFiles/blob/dc4eb370fa97582db3857680cbf3071cde9a6ec5/module/GenotypeCalls.py#L378) so not sure where a "0" came up. But hard to say without reproducing what you're running and stepping through the code.