freeseek / gtc2vcf

Tools to convert Illumina IDAT/BPM/EGT/GTC and Affymetrix CEL/CHP files to VCF
MIT License

Possible to extract SNP table metrics? #24

Closed oyhel closed 3 years ago

oyhel commented 4 years ago

Thank you for developing this tool! Our one remaining Windows dependency is running GenomeStudio, and getting rid of it would be a huge help.

I am wondering if it would be possible to extract SNP table metrics using this tool. For instance, we often need to extract, e.g., log R ratio and B allele frequency values when using PennCNV (http://penncnv.openbioinformatics.org/en/latest/user-guide/input/), among other minor interactions with GenomeStudio. Would it be possible to extract these starting from IDAT files without ever having to interact with GenomeStudio?

Thanks again for your work!!

freeseek commented 4 years ago

Yes, the tool extracts all sorts of intensity metrics from GTC files, including BAF and LRR (if both a BPM manifest file and an EGT cluster file are provided, these are actually recomputed rather than extracted from the GTC files). To then extract these values from the VCF in table format, you can use bcftools query.
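As a minimal sketch of that table-extraction step: assuming the VCF written by gtc2vcf carries per-sample `LRR` and `BAF` FORMAT fields, a command along the lines of `bcftools query -s SAMPLE -f '%ID\t%CHROM\t%POS[\t%LRR\t%BAF]\n' input.vcf.gz` should print one tab-separated row per marker. The Python below reshapes such rows into the Name / Chr / Position / Log R Ratio / B Allele Freq layout described in the PennCNV input documentation linked above. The function name `query_to_penncnv`, the sample name, and the example values are hypothetical and only for illustration; none of this is part of gtc2vcf.

```python
def query_to_penncnv(lines, sample):
    """Reshape single-sample `bcftools query` rows into PennCNV-style rows.

    Each input line is assumed to hold five tab-separated fields:
    marker ID, chromosome, position, LRR, BAF.
    """
    header = [
        "Name",
        "Chr",
        "Position",
        f"{sample}.Log R Ratio",
        f"{sample}.B Allele Freq",
    ]
    rows = [header]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        name, chrom, pos, lrr, baf = fields
        # bcftools prints "." for missing values; drop markers without intensities
        if lrr == "." or baf == ".":
            continue
        rows.append([name, chrom, pos, lrr, baf])
    return rows


if __name__ == "__main__":
    # Made-up rows standing in for real bcftools query output
    example = [
        "snp1\t1\t1000\t0.12\t0.48\n",
        "snp2\t1\t2000\t.\t.\n",
    ]
    for row in query_to_penncnv(example, "sample1"):
        print("\t".join(row))
```

In practice the `lines` argument could be the standard output of the bcftools command read line by line, with one output file written per sample.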

You don't need to use GenomeStudio to convert IDAT files to GTC files. You just need a BPM manifest file and an EGT cluster file, and then the conversion from IDAT to GTC can be run using the iaap-cli command from Illumina, as explained on the main page. The only potential issue would arise if you don't already have an EGT cluster file. In that case you will need to generate one, but I don't have any experience in this matter. Illumina provides EGT cluster files for all of its non-custom arrays.
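For illustration, the IDAT to GTC step could be scripted roughly as follows. This is a sketch under assumptions: the `gencall` subcommand and the `--idat-folder` and `--output-gtc` flags are given as I recall them from the gtc2vcf main page and should be checked against the iaap-cli help output for your installed version. The helper name `iaap_gencall_argv` and all file paths are hypothetical.

```python
import shlex


def iaap_gencall_argv(bpm, egt, out_dir, idat_dir, exe="iaap-cli"):
    """Build an argument list for an assumed `iaap-cli gencall` invocation."""
    return [
        exe,
        "gencall",
        bpm,       # BPM manifest file
        egt,       # EGT cluster file
        out_dir,   # folder where the GTC files are written
        "--idat-folder",
        idat_dir,  # folder containing the IDAT files
        "--output-gtc",
    ]


if __name__ == "__main__":
    # Placeholder paths; replace with your own files
    argv = iaap_gencall_argv("array.bpm", "array.egt", "gtcs", "idats")
    # Print the command for inspection rather than running it
    print(shlex.join(argv))
    # To actually run it: import subprocess; subprocess.run(argv, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.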

If you have an EGT cluster file for your array but it is not well calibrated for your cohort, you could use the --adjust-clusters option from gtc2vcf to adjust cluster positions on the fly when converting to VCF. It is a useful option if the EGT cluster file is grossly miscalculated. However, I would advise using this option only if you know what you are doing.

As for PennCNV, the tool I wrote, MoChA, while designed mostly for detecting mosaic chromosomal alterations rather than constitutional deletions and duplications, can also be used as a replacement for PennCNV to some extent.

freeseek commented 4 years ago

I will also add that you can use bcftools +gtc2vcf --output-type t to generate GenomeStudio-like tables straight from GTC files. This might give you what you need more directly than converting to VCF first and then extracting a table with bcftools query.