frankvogt / vcf2gwas

Python API for comprehensive GWAS analysis using GEMMA
GNU General Public License v3.0

Analysis fails with chromosome number > 5 #5

Closed matteofigliuzzi closed 3 years ago

matteofigliuzzi commented 3 years ago

The package works fine if I submit a VCF with records on a chromosome between chr1 and chr5. When submitting a VCF with records on a chromosome numbered greater than 5 (e.g. chr6), I get the following error:


processing /data/research/projects/CGT_PGS/preprocessing/2_merge_vcfs/merged_vcfs/merged_samples_chr6.vcf.gz
Lines total/split/realigned/skipped: 29221/0/0/0
Start time: Wed, 21 Jul 2021 20:30:29

Parsing arguments..
Genotype file: tmp.norm.nodup.vcf.gz
Phenotype file(s): table_female_pheno.csv
Covariate file: table_female_pheno.csv
Arguments parsed successfully

Preparing files

Checking table_female_pheno.csv..

Filtering SNPs..
Indexing VCF file..
VCF file successfully indexed (Duration: 5.3 seconds)
SNPs successfully filtered (Duration: 38.4 seconds)

File preparations completed

Starting analysis..
Error: Invalid chromosome code 'chr6' on line 529 of .vcf file.
(This is disallowed by your --chr-set/--autosome-num parameters. Check if the problem is with your data, or your command line.)

frankvogt commented 3 years ago

How many chromosomes does that VCF file contain? Only chr6 or chr1 - chr6?

matteofigliuzzi commented 3 years ago

It contains only one chromosome. Looking at the source code, I guess the issue comes from the set_chrom() function, which counts the number of distinct chromosomes found in the VCF file; that number is then passed as the --chr-set argument to plink.
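To make the suspected failure mode concrete, here is a minimal sketch of the difference between counting distinct chromosomes and taking the highest chromosome number. It is not the actual vcf2gwas source: the helper names distinct_chromosomes, chr_set_by_count, chr_set_by_max and numeric_code_allowed are hypothetical, and the "chr<number>" label format is assumed from the log above. The check in numeric_code_allowed also assumes that plink, given --chr-set N, reserves the codes N+1 to N+4 for X, Y, XY and MT; if that assumption holds, it would explain why a single-chromosome file passes up to chr5 but fails from chr6 onwards.

```python
import re


def distinct_chromosomes(vcf_lines):
    """Return the distinct CHROM values in order of first appearance.

    Header lines (starting with '#') and blank lines are skipped.
    """
    chroms = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in chroms:
            chroms.append(chrom)
    return chroms


def chr_set_by_count(chroms):
    # Suspected behaviour described above: the number of distinct
    # chromosomes is used as the --chr-set value.
    return len(chroms)


def chr_set_by_max(chroms):
    # Hypothetical alternative: use the highest numeric chromosome label.
    # Falls back to the count if no label looks like "6" or "chr6".
    numbers = []
    for chrom in chroms:
        match = re.fullmatch(r"(?:chr)?(\d+)", chrom)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers) if numbers else len(chroms)


def numeric_code_allowed(code, chr_set):
    # Assumption: with --chr-set N, codes 1..N are autosomes and
    # N+1..N+4 are taken by X, Y, XY and MT, so 1..N+4 are accepted.
    return 1 <= code <= chr_set + 4


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT",
        "chr6\t100\t.\tA\tG",
        "chr6\t200\t.\tC\tT",
    ]
    chroms = distinct_chromosomes(example)
    print("chromosomes:", chroms)
    print("--chr-set by count:", chr_set_by_count(chroms))
    print("--chr-set by max:", chr_set_by_max(chroms))
```

Under these assumptions, a file containing only chr6 yields a count of 1, which rejects code 6, whereas taking the maximum yields 6, which accepts it.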

frankvogt commented 3 years ago

Yes, so it seems. I uploaded a new version where I changed that part, so if you could update via "conda update vcf2gwas -c conda-forge -c bioconda -c fvogt257" and try again that would be great.

matteofigliuzzi commented 3 years ago

I still get errors when submitting a VCF with a single chromosome whose number is greater than 5 (e.g. chr21). If the chromosome number in the VCF is smaller than 6 (e.g. chr5), or if the VCF file contains all the chromosomes, the program runs correctly.

frankvogt commented 3 years ago

Be sure to update the program! When I run the latest version with a VCF file containing SNP information of just one chromosome (called "chr6", "chr21", etc.), it runs just fine. If the problem persists after updating, maybe send me the VCF file with which it is not working.

matteofigliuzzi commented 3 years ago

With version 0.6.4, the single-chromosome analysis runs correctly. The issue is solved, thanks!