Open hadasbi opened 3 years ago
I think you need exactly 10 columns in the vcf file and the last column should be the genotype information (e.g. 1/0, 1/1)? This is the vcf format when I use for the STAR waspoutputMode
1 52727 . C G . PASS AC=1;AN=2 GT 1/0 1 53206 . G C . PASS AC=1;AN=2 GT 1/0
Thanks, I added these two columns (an entire column of "GT" and an entire column of "1/0") and the error did not appear. Can you please explain how does the WASP algorithm use this information? for example, if I put 1/0, but there's actually more than one possible alternative allele, would it report on both alleles (if found in the sequencing data) or would it ignore the second one?
Hi @hadasbi
the algorithm is designed for personal genomes, so it will only work with biallelic variants. The GT column (10th) should contain only two alleles.
Cheers Alex
Hi,
I'm trying to use the --waspOutputMode option on a dataset. I'm using a COSMIC vcf file, filtered for SNPs only. I added the prefix "chr" to the first column to make it match the chromosome names in my genome (Before I did that, I got the same error).
This is how I run star:
STAR --runMode alignReads --genomeDir ../../.. --readFilesCommand zcat --readFilesIn ../fastq_files/R1.fastq.gz ../fastq_files/R2.fastq.gz --outFileNamePrefix init --outSAMtype BAM Unsorted SortedByCoordinate --outSAMattributes NH HI AS nM NM MD jM jI XS MC ch vA vG vW --varVCFfile CosmicCodingMuts_SNPs.vcf --waspOutputMode SAMtag
I get the error:
EXITING because of FATAL INPUT FILE ERROR: could not find any SNPs in VCF file: CosmicCodingMuts_SNPs.vcf SOLUTION: check formatting of the VCF file; unzip VCF file or use process substitution.
This is how the vcf looks like: chr1 65797 COSV58737189 T C . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN23957695;SNP;CDS=c.9+224T>C;AA=p.?;HGVSC=ENST00000641515.2:c.9+224T>C;HGVSG=1:g.65797T>C;CNT=1 chr1 66041 COSV58737025 A G . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN24406961;CDS=c.9+468A>G;AA=p.?;HGVSC=ENST00000641515.2:c.9+468A>G;HGVSG=1:g.66041A>G;CNT=1 chr1 66131 COSV58737120 C G . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN17493387;CDS=c.9+558C>G;AA=p.?;HGVSC=ENST00000641515.2:c.9+558C>G;HGVSG=1:g.66131C>G;CNT=1 chr1 66312 COSV58736797 A T . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN7277204;CDS=c.9+739A>T;AA=p.?;HGVSC=ENST00000641515.2:c.9+739A>T;HGVSG=1:g.66312A>T;CNT=1 chr1 66371 COSV58737143 A T . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN17943257;CDS=c.9+798A>T;AA=p.?;HGVSC=ENST00000641515.2:c.9+798A>T;HGVSG=1:g.66371A>T;CNT=1 chr1 66373 COSV58736978 A T . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN4902613;CDS=c.9+800A>T;AA=p.?;HGVSC=ENST00000641515.2:c.9+800A>T;HGVSG=1:g.66373A>T;CNT=2 chr1 66605 COSV58736995 T A . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN28548666;CDS=c.9+1032T>A;AA=p.?;HGVSC=ENST00000641515.2:c.9+1032T>A;HGVSG=1:g.66605T>A;CNT=1 chr1 66795 COSV58737166 T A . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN28504759;CDS=c.9+1222T>A;AA=p.?;HGVSC=ENST00000641515.2:c.9+1222T>A;HGVSG=1:g.66795T>A;CNT=1 chr1 66814 COSV58737064 A G . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN22517283;SNP;CDS=c.9+1241A>G;AA=p.?;HGVSC=ENST00000641515.2:c.9+1241A>G;HGVSG=1:g.66814A>G;CNT=2 chr1 66864 COSV58736928 C T . . GENE=OR4F5_ENST00000641515;STRAND=+;LEGACY_ID=COSN21650154;CDS=c.9+1291C>T;AA=p.?;HGVSC=ENST00000641515.2:c.9+1291C>T;HGVSG=1:g.66864C>T;CNT=1
Any idea what's wrong? Thanks, Hadas