alexdobin / STAR

RNA-seq aligner
MIT License

STAR parameters --waspOutputMode and --varVCFfile #772

ronaldosfj closed 5 years ago

ronaldosfj commented 5 years ago

Hi Alex,

Thank you very much for the STAR-WASP implementation!

I am trying to perform an allele-specific expression analysis using RNA-Seq data. Therefore, in order to control allelic biases in my data, I am using the wasp parameter.

Here are my questions: 1) Is it possible to use --waspOutputMode without --varVCFfile? I am asking because every time I run STAR with the --waspOutputMode parameter, it shows an error message and suggests including the --varVCFfile parameter.

2) Can I use a VCF file from a database (such as dbSNP) with the --varVCFfile parameter? In other words, if --varVCFfile is always required together with --waspOutputMode, can that VCF come from a database?

In fact, I already tried to use dbSNP's VCF, but STAR did not recognize it. Then I used the VCF file from the sample I am trying to align, and it worked fine! If I understand correctly, do I need to first perform an alignment and variant calling without this parameter, and then perform the alignment again?

Best regards, Ronaldo

alexdobin commented 5 years ago

Hi Ronaldo,

You need to use a VCF for the WASP option, to tell STAR where the variants are. The VCF file needs to have a 10th column with the genotype recorded as 0/1, 1/0, or 1/1 (or with | instead of /). Personal VCF files are supposed to have this column. If you are using a general VCF file from a database, you can add the 0/1 genotype for all SNPs.

Cheers Alex
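
For readers who want to check a VCF before passing it to --varVCFfile, here is a minimal Python sketch of the test described in the reply above. The function name `has_usable_genotype` is hypothetical, and the sketch assumes the genotype is the first colon-separated entry of the 10th column (as when FORMAT starts with GT). It only mirrors the genotypes listed in the reply; it is not a description of STAR's own parser.

```python
import re

# Genotypes listed in the reply above: 0/1, 1/0, 1/1, written with "/" or "|".
GENOTYPE = re.compile(r"^(0[/|]1|1[/|]0|1[/|]1)$")


def has_usable_genotype(vcf_line: str) -> bool:
    """Return True if a VCF data line has a 10th column starting with a listed genotype."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        # Database VCFs (e.g. sites-only files) often stop at the 8th column (INFO).
        return False
    # Assumption: GT comes first in the sample column, e.g. "0/1:35:99".
    genotype = fields[9].split(":", 1)[0]
    return bool(GENOTYPE.match(genotype))
```

A sites-only line with eight columns returns False here, which would be consistent with a database VCF not being recognized.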

ronaldosfj commented 5 years ago

Hi, Thanks for answering me!

How could I add the genotype field to my general VCF file? Do you have any suggestion?

Cheers, Ronaldo

alexdobin commented 5 years ago

Hi Ronaldo,

You would need to add (or replace) the 10th column in the VCF file with 0/1 for every SNP.

Cheers Alex
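
One possible way to do this is sketched below in Python. It is an illustration only: `add_het_genotype` and the sample name `SAMPLE` are hypothetical, and the sketch assumes a tab-separated VCF whose first 8 columns (CHROM through INFO) are kept as-is. Because a 10th column needs a 9th FORMAT column before it, the sketch sets FORMAT to GT; that detail is an assumption, not something stated in the thread.

```python
def add_het_genotype(lines):
    """Yield VCF lines with FORMAT set to GT and one sample column set to 0/1."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            # Keep the 8 fixed columns, then add FORMAT and a placeholder sample name.
            cols = line.split("\t")[:8]
            yield "\t".join(cols + ["FORMAT", "SAMPLE"])
        elif line:
            # Data line: drop any existing FORMAT/sample columns and write GT = 0/1.
            cols = line.split("\t")[:8]
            yield "\t".join(cols + ["GT", "0/1"])


# Example use (file names are placeholders):
# with open("dbsnp.vcf") as src, open("dbsnp.het.vcf", "w") as dst:
#     for out_line in add_het_genotype(src):
#         dst.write(out_line + "\n")
```

The sketch does not add a `##FORMAT=<ID=GT,...>` header line or filter out non-SNP records; whether either is needed for a given file is worth checking separately.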

ronaldosfj commented 5 years ago

Great! Thanks again!

marlmatos commented 1 year ago

Hi @alexdobin, I came across this question and it's very relevant to my current situation. I would like to use STAR with --waspOutputMode to map 400ish samples, but I do not have genotypes for all 400, probably just for ~380. Looking at the personal VCFs of the genotyped samples, it looks like the variant calling was done for all samples at once, and all VCFs have the same variants regardless of their GT state (0/0, 0/1, 1/1). So my question is the following: if I go forward with a generalized VCF where the 10th column is 0/1 and map all 400 samples with it, would it make a huge difference?

Will I lose reads for the sites that are homozygous in some samples if all variants in the --varVCFfile are 0/1?

Thanks in advance, Marliette

alexdobin commented 1 year ago

Hi @matosmr

It should work fine, as GT=0/0 (reference genotype) should not affect the mapping. But I would recommend removing GT=0/0 from each of the personal VCFs.
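
A minimal Python sketch of that recommendation, dropping GT=0/0 records from a personal VCF, could look like the following. The function name `drop_hom_ref` is hypothetical, and the sketch assumes a single-sample, tab-separated VCF in which the genotype is the first colon-separated entry of the 10th column.

```python
def drop_hom_ref(lines):
    """Yield VCF lines, skipping data records whose genotype is 0/0 or 0|0."""
    for line in lines:
        if line.startswith("#"):
            # Header lines are kept unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 10:
            # Assumption: GT is the first entry of the sample column, e.g. "0/0:12".
            genotype = fields[9].split(":", 1)[0]
            if genotype in ("0/0", "0|0"):
                continue
        yield line


# Example use (file names are placeholders):
# with open("sample.vcf") as src, open("sample.no_homref.vcf", "w") as dst:
#     dst.writelines(drop_hom_ref(src))
```

Lines are yielded exactly as read, so the output keeps the input's line endings.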

AmandaOFurlan commented 3 months ago

Hi @alexdobin ! Thank you so much for the new tool (STAR + WASP). I tried using the tool and read that I need to add genotype information. However, I don't have this information because I only have RNA-Seq data. So, should I first align with STAR and call variants, and then use STAR again with the SNP file containing genotype information along with WASP?