freeseek / gtc2vcf

Tools to convert Illumina IDAT/BPM/EGT/GTC and Affymetrix CEL/CHP files to VCF
MIT License

Error Encountered while parsing the input #59

Open crbeam opened 8 months ago

crbeam commented 8 months ago

Hello -

Thank you for this tool for converting IDAT files into usable vcf files for analysis in plink. It's terrific!

I recently ran into an issue that I cannot seem to resolve on my own. When I attempt to create the bcf/vcf files from my gtc files, I receive the following error:

faidx_fetch_seqa failed at chrY:58862852 (are you using the correct reference genome?)
Error encountered while parsing the input

I am using the same csv, bpm, and egt files that I used previously without error and my fasta reference file is GRCh38. My bcftools version is 1.16, which I believe should be a sufficient version for use with the gtc2vcf plugin. Before I remove and reinstall bcftools, I wanted to check with you to see whether there was a simpler fix for this problem.

Thank you. Chris

freeseek commented 8 months ago

Chromosome Y in GRCh38 is 57,227,415 base pairs long, and your manifest file has a coordinate at 58,862,852, which means that the coordinates in your manifest file are not for GRCh38. I don't know what array type and what version of the manifest file you are using, so I cannot guess why this is happening. The correct course of action is either to find out whether Illumina provides a GRCh38 manifest file for your array, or simply to use gtc2vcf's framework to realign the manifest file and then supply the realigned coordinates through gtc2vcf's --sam-flank option when converting to VCF. You should always use the realignment approach unless you have first verified that your manifest file contains exclusively GRCh38 coordinates.
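For readers who want to run the same sanity check themselves, here is a minimal Python sketch (not part of gtc2vcf) that compares marker positions against the contig lengths in a FASTA index (.fai) file. The function names, the marker names, and the second marker's position are hypothetical. It assumes you have already extracted (name, chromosome, position) tuples from your manifest, and that the manifest may write "Y" where the reference writes "chrY".

```python
def read_fai_lengths(fai_text):
    """Parse the text of a .fai index into {contig_name: length}.

    A .fai file is tab-separated; the first column is the contig name
    and the second column is its length in bases.
    """
    lengths = {}
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def find_out_of_range(markers, lengths):
    """Return markers whose position lies beyond the end of their contig.

    markers: iterable of (name, chrom, pos) tuples (hypothetical layout).
    lengths: dict as returned by read_fai_lengths().
    """
    problems = []
    for name, chrom, pos in markers:
        # Assumption: the manifest may omit the "chr" prefix used by the FASTA.
        contig = chrom if chrom in lengths else "chr" + chrom
        if contig not in lengths:
            continue  # contig not in the reference; skipped by this sketch
        if pos > lengths[contig]:
            problems.append((name, contig, pos, lengths[contig]))
    return problems


if __name__ == "__main__":
    # chrY length as quoted above for GRCh38; the remaining .fai columns
    # are placeholders.
    fai_text = "chrY\t57227415\t0\t60\t61\n"
    # Marker names are made up; 58862852 is the position from the error.
    markers = [
        ("example_marker_1", "Y", 58862852),
        ("example_marker_2", "Y", 1000000),
    ]
    for name, contig, pos, length in find_out_of_range(markers, read_fai_lengths(fai_text)):
        print(f"{name}: {contig}:{pos} is beyond the contig end ({length})")
```

Note that this check only catches the obvious failures. A manifest on a different genome build can have every coordinate within range and still be wrong, which is why realigning the manifest remains the safer route.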

crbeam commented 8 months ago

Dear Giulio,

Thank you. This is very helpful information. Much appreciated.