alexdobin / STAR

RNA-seq aligner
MIT License

2-pass Mapping #904

Open nbbarrientos opened 4 years ago

nbbarrientos commented 4 years ago

Hello,

I'm a bit confused about the --twopassMode flag. Does it mean that when you use this option, STAR does both passes at once? Or is this an option that you have to specify only in the second pass?

Also, at the end of the day, if I have hundreds of samples, is multi-sample 2-pass mapping the appropriate approach, or would per-sample 2-pass mapping also work?

Thanks, Nelson

nbbarrientos commented 4 years ago

Additionally, is --waspOutputMode an option that must be used in each pass, or only in the second pass? Further, from some of the other comments, it seems like the --varVCFfile option must be used together with the WASP option. If this is the case, would STAR be able to handle a VCF file that contains information for all the samples in the study (i.e. a VCF file with more than 10 columns)? My current VCF file has the columns #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLES, where SAMPLES spans columns 10-161.

Thanks, Nelson

alexdobin commented 4 years ago

Hi Nelson,

The --twopassMode Basic option activates "per-sample" 2-pass mapping: STAR will run 2 passes for each sample independently of the others.
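A minimal sketch of what one per-sample run could look like, assuming paired-end gzipped FASTQ files and an already built index. The paths, sample names, and the helper `star_per_sample_2pass_cmd` are hypothetical. The point is that `--twopassMode Basic` goes into a single STAR invocation per sample, and STAR performs both passes within that one run.

```python
import shlex


def star_per_sample_2pass_cmd(genome_dir, fastqs, out_prefix, threads=8):
    """Return the argv for one STAR run that performs both mapping passes.

    With --twopassMode Basic, STAR maps the sample, inserts the junctions
    detected in that 1st pass into the index on the fly, and re-maps all
    reads in the 2nd pass, all within this single command.
    """
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *fastqs,
        "--outFileNamePrefix", out_prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--twopassMode", "Basic",
    ]
    # Gzipped FASTQ needs an on-the-fly decompression command.
    if all(path.endswith(".gz") for path in fastqs):
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


if __name__ == "__main__":
    # Hypothetical sample sheet: sample name -> [R1, R2]
    samples = {
        "sampleA": ["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz"],
        "sampleB": ["sampleB_R1.fastq.gz", "sampleB_R2.fastq.gz"],
    }
    for name, fastqs in samples.items():
        # Only prints the command; subprocess.run(cmd, check=True) would execute it.
        print(shlex.join(star_per_sample_2pass_cmd("star_index", fastqs, f"{name}_")))
```

Because each sample is an independent command, hundreds of samples can be spread over a job scheduler without any coordination between runs.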

In principle, multi-sample 2-pass should be more accurate. In practice, however, with a large number of samples too many novel junctions are detected, which slows down the 2nd pass and increases the number of multimappers. Unless you want to detect lowly expressed novel junctions in multiple samples, I would not recommend multi-sample 2-pass for 100s of samples.
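For comparison, multi-sample 2-pass means collecting the 1st-pass SJ.out.tab files from all samples and supplying the combined junctions to every sample's 2nd pass via `--sjdbFileChrStartEnd`. The sketch below assumes the standard nine-column SJ.out.tab layout and applies hypothetical thresholds (minimum unique reads, minimum number of supporting samples, canonical motifs only) so that the merged set does not grow with every added sample, which is the slowdown described above. The function names and cutoffs are illustrative, not STAR defaults.

```python
from collections import defaultdict


def collect_novel_junctions(sj_tables, min_unique=3, min_samples=2):
    """Merge 1st-pass SJ.out.tab tables and keep recurrent novel junctions.

    Each table is an iterable of SJ.out.tab lines (an open file works).
    Columns: 1 chrom, 2 intron start, 3 intron end, 4 strand (0/1/2),
    5 motif (0 = non-canonical), 6 annotated (0/1), 7 unique reads,
    8 multimapping reads, 9 max spliced overhang.
    """
    samples_supporting = defaultdict(int)
    for table in sj_tables:
        for line in table:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue  # skip blank or malformed lines
            chrom, start, end, strand, motif, annotated, unique = fields[:7]
            if annotated == "1":
                continue  # already known from the GTF used to build the index
            if motif == "0":
                continue  # drop non-canonical junctions (an illustrative choice)
            if int(unique) >= min_unique:
                samples_supporting[(chrom, int(start), int(end), strand)] += 1
    # Keep junctions supported by enough unique reads in enough samples.
    return sorted(j for j, n in samples_supporting.items() if n >= min_samples)


def format_sj_list(junctions):
    """Render junctions as chrom/start/end/strand lines for --sjdbFileChrStartEnd."""
    return "".join(f"{c}\t{s}\t{e}\t{st}\n" for c, s, e, st in junctions)
```

The formatted list keeps the numeric SJ.out.tab strand codes. That should be acceptable because the manual's multi-sample recipe passes SJ.out.tab files to `--sjdbFileChrStartEnd` directly, but it is worth confirming in the Log.out of a test run.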

For WASP options, you would need to supply a different --varVCFfile for each individual, with the 10th column being the individual genotype. You should be able to do it with --twopassMode Basic, but I did not test it thoroughly, so please let me know if you see any issues. The WASP options are only applied at the 2nd pass.
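Since `--varVCFfile` expects the individual's genotype in column 10, a multi-sample VCF like the one described above (samples in columns 10-161) has to be split first. A minimal sketch in plain Python, assuming an uncompressed VCF whose column header line starts with `#CHROM`. The helpers `split_vcf_by_sample`, `write_per_sample_vcfs`, and `wasp_args` are hypothetical. For large files, a dedicated tool such as `bcftools view -s <sample>` is probably the more practical route.

```python
import os


def split_vcf_by_sample(vcf_lines):
    """Split a multi-sample VCF into single-sample VCFs.

    Returns {sample: [lines]} where every data line keeps columns 1-9
    (CHROM..FORMAT) followed by that one sample's column as column 10.
    """
    meta = []      # "##" header lines, copied into every output
    samples = []   # sample names from the #CHROM line (columns 10+)
    per_sample = {}
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if line.startswith("##"):
            meta.append(line)
        elif line.startswith("#CHROM"):
            cols = line.split("\t")
            samples = cols[9:]
            for name in samples:
                per_sample[name] = meta + ["\t".join(cols[:9] + [name])]
        elif line:
            cols = line.split("\t")
            for i, name in enumerate(samples):
                per_sample[name].append("\t".join(cols[:9] + [cols[9 + i]]))
    return per_sample


def write_per_sample_vcfs(vcf_path, out_dir):
    """Write <out_dir>/<sample>.vcf for every sample (holds everything in memory)."""
    with open(vcf_path) as fh:
        per_sample = split_vcf_by_sample(fh)
    for name, lines in per_sample.items():
        with open(os.path.join(out_dir, f"{name}.vcf"), "w") as out:
            out.write("\n".join(lines) + "\n")


def wasp_args(sample_vcf):
    """Extra STAR arguments for WASP tagging with one individual's VCF."""
    return [
        "--waspOutputMode", "SAMtag",
        "--varVCFfile", sample_vcf,
        # vA/vG report the variant allele/coordinate, vW the WASP result.
        "--outSAMattributes", "NH", "HI", "AS", "nM", "vA", "vG", "vW",
    ]
```

These arguments would be appended to each sample's `--twopassMode Basic` command, pointing at that sample's own VCF.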

Cheers Alex

nbbarrientos commented 4 years ago

Hi Alex, thank you very much for your response. I have a few other questions. I decided to go for the per-sample 2-pass mapping and used the "--twopassMode Basic" to activate it during the first pass. I also included annotations in this step.

Now, the first question I have is regarding the genome generation step. I want to check whether I'm understanding the manual correctly: this step does not need to be performed after version 2.4.1a?

My next question is regarding setting up the 2nd pass using per-sample mapping. Based on the manual, it seems like I would need to list the SJ.out.tab files from the 1st pass, correct? Additionally, I am assuming the "--twopassMode Basic" needs to be included again?

Thanks for your help, Nelson

alexdobin commented 4 years ago

Hi Nelson,

sorry for the delayed reply.

The genome indexing step needs to be performed for all versions. If you are using --twopassMode Basic, you do not need to worry about the SJ.out.tab files; they will be included automatically.
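For reference, a sketch of the one-time indexing command assembled in Python. The FASTA/GTF paths and the helper name are hypothetical. `--sjdbOverhang` is set to read length minus 1, the value the STAR manual recommends. As far as we can tell from the manual, what 2.4.1a added is the option to insert annotations and junctions on the fly at the mapping step, not a way to skip indexing. The resulting index is reused by every per-sample `--twopassMode Basic` run, which inserts its own 1st-pass junctions at mapping time.

```python
import shlex


def star_genome_generate_cmd(genome_dir, fasta_files, gtf, read_length, threads=8):
    """Return the argv for STAR's one-time genome indexing step.

    Annotated junctions from the GTF are built into the index;
    --sjdbOverhang is read_length - 1 as the STAR manual suggests.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", *fasta_files,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(read_length - 1),
    ]


if __name__ == "__main__":
    # Hypothetical inputs for 100 bp reads.
    print(shlex.join(star_genome_generate_cmd("star_index", ["genome.fa"], "genes.gtf", 100)))
```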

Cheers Alex

nbbarrientos commented 4 years ago

Hi Alex,

Thank you so much for all of your help. I've started to get some output from the per-sample 2-pass mapping with WASP implementation. From the manual, it seems like the reads tagged with vW:i:1 are the ones that passed the filtering. However, what about other reads that are tagged with vW:i:5 or vW:i:2?

Thanks, Nelson

alexdobin commented 4 years ago

Hi Nelson,

the WASP vW tags are described in the manual, Chapter 11 "WASP filtering of allele specific alignments."
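For a quick overview of how many alignments fall into each category, here is a sketch that tallies vW values from `samtools view` text output. The code-to-meaning table reflects our reading of that manual chapter and should be checked against the manual for the STAR version in use. `vw_tally` is a hypothetical helper. Alignments without any vW tag (as far as we know, those not overlapping a variant) are counted under `None`.

```python
from collections import Counter

# vW values as we read them in the STAR manual's WASP chapter; verify for your version.
VW_CODES = {
    1: "passed WASP filtering",
    2: "multi-mapping read",
    3: "variant base in the read is N (non-ACGT)",
    4: "remapped read did not map",
    5: "remapped read multi-maps",
    6: "remapped read maps to a different locus",
    7: "read overlaps too many variants",
}


def vw_tally(sam_lines):
    """Count alignments per vW value; None means the alignment has no vW tag."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        optional = line.rstrip("\n").split("\t")[11:]  # tags follow the 11 fixed fields
        vw = next((tag for tag in optional if tag.startswith("vW:i:")), None)
        counts[None if vw is None else int(vw[5:])] += 1
    return counts
```

Reading `samtools view Aligned.sortedByCoord.out.bam` output line by line into `vw_tally` and printing each count next to `VW_CODES.get(code)` gives a per-sample summary.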

Cheers Alex

nbbarrientos commented 4 years ago

Hi Alex,

Thank you so much for all of your help!!

Best, Nelson

nbbarrientos commented 4 years ago

Hi Alex,

I have a quick question. For any downstream analysis, what would be the best way to filter the output bam files to use only the reads that have the vW:i:1 tag? In other words, would I need to use samtools to make a .bam file that contains only the reads with the vW:i:1 tag?

Best, Nelson

alexdobin commented 4 years ago

Hi Nelson,

yes, you would need to go through samtools view and then parse the vW:i:1 tag (e.g. with awk). STAR does not have an option to output just the vW:i:1 alignments.
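A minimal Python stand-in for the awk step. The script name `keep_vw1.py` and the function `keep_wasp_passed` are hypothetical. The script passes header lines through and keeps only alignments whose optional fields include exactly `vW:i:1`. This also drops alignments with no vW tag at all, which is what "only vW:i:1" implies but may not suit every downstream analysis.

```python
import sys


def keep_wasp_passed(sam_lines):
    """Yield SAM header lines plus alignments tagged vW:i:1."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is valid SAM
            continue
        optional = line.rstrip("\n").split("\t")[11:]  # tags follow the 11 fixed fields
        if "vW:i:1" in optional:
            yield line


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # Read SAM from stdin only when called with "-" as the sole argument.
    sys.stdout.writelines(keep_wasp_passed(sys.stdin))
```

Usage: `samtools view -h Aligned.sortedByCoord.out.bam | python keep_vw1.py - | samtools view -b -o wasp_passed.bam -`. Recent samtools releases also offer a tag filter (`samtools view -d vW:1`), which may make the script unnecessary. Check the `samtools view` help for your version.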

Cheers Alex

nbbarrientos commented 4 years ago

Thank you Alex, I figured that would be the case but I wanted to double check with you first.

Best, Nelson