frankvogt / vcf2gwas

Python API for comprehensive GWAS analysis using GEMMA
GNU General Public License v3.0

Working with genomes with more than 23 chromosomes #20

Closed tottlefields closed 1 year ago

tottlefields commented 1 year ago

Firstly, great work, thank you!

I work with dog data and as such have 38 chromosomes in my VCF file. Your manual states "vcf2gwas recognizes chromosomes in the following formats (here the first chromosome): Chr1, chr1, 1.", which it does. However, when using chr37 (in this case) I get an error as an extra PLINK flag is required... I have tried with chr37 and 37 as I can see a numerical check happening in the make_bed function.

Converting to PLINK BED..
Error: Invalid chromosome code '37' on line 2267 of .vcf file.
(This is disallowed for humans.  Check if the problem is with your data, or if
you forgot to define a different chromosome set with e.g. --chr-set.)

How can I run this analysis on genomes with more than 23 chromosomes please? Thanks :)

frankvogt commented 1 year ago

Hi,

Is it possible that some chromosomes are missing from your VCF file? The issue might be that vcf2gwas checks the total number of chromosomes and sets the PLINK flags accordingly, while PLINK may still have a problem if a chromosome is labeled "37" even though the total number of chromosomes in your VCF file is less than 37.
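
A minimal Python sketch of the mismatch described above. This is not vcf2gwas's actual code: the helper names (`numeric_chromosome`, `chr_set_by_count`, `chr_set_by_max`, `plink_chr_set_args`) are hypothetical and only illustrate why counting chromosomes gives a different answer from taking the highest label. The only PLINK option assumed is `--chr-set`, which the error message itself mentions.

```python
import re


def numeric_chromosome(label):
    """Return the number in labels such as 'Chr1', 'chr1' or '1', else None."""
    match = re.fullmatch(r"(?:chr)?(\d+)", label.strip(), flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def chr_set_by_count(labels):
    """Count the distinct numeric chromosomes present in the file."""
    return len({n for n in map(numeric_chromosome, labels) if n is not None})


def chr_set_by_max(labels):
    """Use the highest numeric chromosome label instead of the count."""
    numbers = [n for n in map(numeric_chromosome, labels) if n is not None]
    return max(numbers, default=0)


def plink_chr_set_args(labels):
    """Build the --chr-set arguments from the highest label (hypothetical helper)."""
    highest = chr_set_by_max(labels)
    return ["--chr-set", str(highest)] if highest > 0 else []


# A single-chromosome VCF, as in this issue: one chromosome, labeled 37.
single_chromosome_vcf = ["chr37"]
print(chr_set_by_count(single_chromosome_vcf))    # count-based value
print(chr_set_by_max(single_chromosome_vcf))      # label-based value
print(plink_chr_set_args(single_chromosome_vcf))  # arguments passed to PLINK
```

With a file that contains only `chr37`, the count is 1 while the highest label is 37, so a chromosome set sized from the count would not cover the label that appears in the file.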

tottlefields commented 1 year ago

Ahhh, OK. Yes, I was running it on a single-chromosome VCF file, as that is how we store our VCF files (else they are too big to manage, and it’s easier to parallelise across individual chr files). Your program ran absolutely fine on a single chr24, so I assumed it was a human versus dog issue (we hit these fairly regularly, LOL!). I managed to get it to run by labelling the chr as 3.7 rather than 37, so I have a workaround, but it would be fab if it could work for chromosomes larger than 24 the same way that it works with fewer than 24.

Thanks, Ellen.

frankvogt commented 1 year ago

Alright perfect! I will implement a fix for this case as soon as possible.

frankvogt commented 1 year ago

Fixed in vcf2gwas v0.8.9.