Hello,
I am trying to run this command
plink2 --vcf alaudinus.vcf.gz --make-bed --out alaudinus --allow-extra-chr
in order to make ped and map files out of my vcf file so that I can run ROH analyses.
However, I keep getting this error:
Error: Line 37274623 of --vcf file has fewer tokens than expected.
While I know this error could result from corruption of my file, when I look inside the file the formatting seems fine. Others have suggested that plink may not be able to handle the notation that GATK uses for overlapping deletions, which is a * symbol. Do you know if this is the case or not? I know that there are a lot of lines in my vcf file that contain this notation.
Additionally, I have tried looking at this specific line of my vcf file using this command
zcat alaudinus.vcf.gz | grep -v "^#" | awk '{print NF}' | sort | uniq -c
and I get this output:
1 30
37274248 73
which I think means that one line in my file has 30 fields and the other 37274248 lines have 73 fields. However, when I just look at the file, I can count 32 columns.
I am very new to working with vcf files and plink, so any guidance for how to reformat my vcf file (if needed) or how to proceed with plink would be greatly appreciated.
Thank you so much in advance!
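One way to narrow this down is to count fields per line using tabs only. VCF columns are tab-delimited, while awk by default splits on any run of spaces or tabs, so the two counts can disagree if any field contains a space. The following is a minimal sketch, not a definitive diagnosis: find_short_vcf_lines is a hypothetical helper name, and it assumes the line number PLINK reports counts from the top of the file, header lines included.

```shell
# Hypothetical helper: list lines whose tab-separated field count
# differs from the field count of the #CHROM header line.
find_short_vcf_lines() {
    # $1: path to a gzip-compressed VCF
    gzip -dc "$1" | awk -F'\t' '
        /^##/     { next }                  # skip meta-information lines
        /^#CHROM/ { expected = NF; next }   # the header fixes the column count
        NF != expected { print NR ": " NF " fields, expected " expected }
    '
}
```

Running find_short_vcf_lines alaudinus.vcf.gz should print the file line number and field count of every line that does not match the header. If the only hit is at or near line 37274623, that would point to a single truncated record rather than to the * notation, which appears on many lines.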