freeseek / gtc2vcf

Tools to convert Illumina IDAT/BPM/EGT/GTC and Affymetrix CEL/CHP files to VCF
MIT License

No output VCF files #36

Closed. Ballote closed this issue 3 years ago

Ballote commented 3 years ago

Hello

I see that two steps are needed to convert from .CEL to .VCF. The first step generates xxxxx.AxiomGT1.chp files (where xxxxx is the name of the original file). Is this correct?

Now I'm having a problem with the second step. When I run that part of the program I get no errors, but I also can't find the VCF files. This is the code I'm running:

    bcftools +affy2vcf \
      --no-version -Ou \
      --csv "GenomeWideSNP_6.na35.annot.csv" \
      --fasta-ref "human_g1k_v37.fasta" \
      --chps /home/adrianib/Proyecto/cc-chp \
      --snp /home/adrianib/Proyecto/AxiomGT1.snp-posteriors.txt \
      --extra result.tsv | \
      bcftools sort -Ou -T ./bcftools-sort.XXXXXX | \
      bcftools norm --no-version -Ob -o result.bcf -c x -f "human_g1k_v37.fasta" && \
      bcftools index -f result.bcf

I see there is no option to specify the output folder, as there is in the first step. Could this be the reason I don't have output VCF files?

In summary, I have this:

Original file: xxxxx.CEL
1st step (CEL to CHP): xxxxx.AxiomGT1.chp
2nd step (CHP to VCF): ?

And my question is: should I have an xxxxx.VCF file at the end of the second step?

Thanks for your help.

Adrian

freeseek commented 3 years ago

Yes, the idea is that you first use the Affymetrix Power Tools to convert the CEL files to CHP files, and then +affy2vcf to convert the CHP files to VCF files. The code you showed seems right (make sure the \ escapes are used if you split the command over multiple lines). It is difficult for me to say why you did not get a VCF file in the end unless you share the command output logs.
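As a small illustration of the line-continuation point, here is a minimal sketch. The `count_args` helper and the file names are hypothetical and only stand in for a real command. A backslash continues the command only when it is the very last character on the line, so a trailing space after it would end the command early.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# One logical command written over three lines.
# Each backslash must be the last character on its line.
n=$(count_args --csv annot.csv \
  --fasta-ref ref.fasta \
  --chps chp_dir)

# All six words reach the same command.
echo "$n"
```

If one of the backslashes were missing, the following line would be run as a separate command and the options on it would never reach the tool.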

Ballote commented 3 years ago

Hello

These are the command output logs. They say that a VCF file is being written, but I can't find it in the folder where the program runs or in the cc-chp folder.

    Writing to ./bcftools-sort.XXXXXX3rAYcu
    affy2vcf 2021-05-14 https://github.com/freeseek/gtc2vcf
    Reading CSV file GenomeWideSNP_6.na35.annot.csv
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD11386a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD11751a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD13766a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD13771a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD14450a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD18748a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD18769a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD4836a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD4972a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD5934a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD5948a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD6409a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD7214a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD7240a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD8828a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD9577a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD9582a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD9589a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD9754a.AxiomGT1.chp
    Reading AGCC file /home/adrianib/Proyecto/cc-chp/PD9845a.AxiomGT1.chp
    Reading SNP posteriors file /home/adrianib/Proyecto/AxiomGT1.snp-posteriors.txt
    Writing VCF file
    Lines total/missing-reference/missing-snp-posteriors/skipped: 909622/0/246/2279
    Merging 2 temporary files
    Cleaning
    Done
    Lines total/split/realigned/skipped: 907343/0/0/0

freeseek commented 3 years ago

It looks fine. Are you sure you don't have a result.bcf file in the end?

Ballote commented 3 years ago

Yes, I have 3 output files: result.bcf, result.bcf.csi and result.tsv.

But shouldn't there be a .VCF file for every .CHP file? Or are the VCF files inside result.bcf? I don't understand.

freeseek commented 3 years ago

The pipeline generates one VCF file (in this case a binary VCF file, i.e. a BCF, since you used the -Ob option in the final bcftools norm step). The +affy2vcf tool converts a set of CHP files into one single VCF, so the genotypes from all your CHP files are inside result.bcf. If you wanted one VCF file per CHP file, you would have to run the command once for each conversion and input one CHP file at a time.
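For anyone who wants to look inside the combined file, or to end up with one file per sample without rerunning the conversion, a rough sketch follows. It assumes the output is named result.bcf as in this thread; the `run` wrapper, the `DRY_RUN` switch, the result.vcf name and the per_sample_vcf directory are illustrative choices, not part of gtc2vcf. By default the script only prints the bcftools commands. Set `DRY_RUN=0` to execute them. The last command relies on the `+split` plugin shipped with bcftools.

```shell
#!/bin/sh
# Sketch only: paths and names below are assumptions for illustration.
DRY_RUN=${DRY_RUN:-1}               # 1 = print commands, 0 = run them
BCF=${BCF:-result.bcf}              # combined output from the pipeline
OUT_DIR=${OUT_DIR:-per_sample_vcf}  # hypothetical output directory

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# List the sample names stored in the combined file.
run bcftools query -l "$BCF"

# Write an uncompressed, human-readable VCF copy of the same data.
run bcftools view --no-version -Ov -o result.vcf "$BCF"

# Write one compressed VCF per sample into OUT_DIR.
run bcftools +split "$BCF" -Oz -o "$OUT_DIR"
```

Splitting the combined file afterwards is usually less work than running +affy2vcf once per CHP file, though whether the per-sample output matches what a single-file conversion would produce has not been checked here.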

Ballote commented 3 years ago

Ok, Thank you so much!

freeseek commented 3 years ago

You are welcome. :-)