Open Roleren opened 3 years ago
Hi Håkon
you would need to increase the --outFilterMultimapNmax
to a large value, to make it mappable.
Also, if you expect more than 50 loci per read, you would also need to increase --winAnchorMultimapNmax
to the expected number of loci - however, it will slow down the mapping if it's too big, so I would not go over 200-500.
If you do not need the contaminant alignments, you can switch off the SAM output with --outSAMtype None
.
If you need to keep just one (or a few) contaminant alignments, you can use --outSAMmultNmax 1
.
Cheers Alex
Hm, so STAR has no equivalent to -k ? Setting --outFilterMultimapNmax would only make it find even more multimappers, I need it to stop after first hit, and directly ignore that read. Since I do not output aligned reads.
Hi Håkon
no, there is no such option for STAR, owing to a difference in the multimapper treatment. For Bowtie and other DNA aligners, multimappers are a nuisance as they cannot be used for variant calling. As soon as they find that a read maps to 2 loci, it can be discarded. For RNA-seq, multimappers represent the expression of paralogous loci, so they are useful, and STAR tries to find all loci that a read maps to.
Also, I am not exactly sure how the -k option in Bowtie2 works. Once you find 2 loci with equal mapping quality, how do you know that a 3rd locus will not give a better alignment? I guess there is some heuristics there enforcing that the first loci are of sufficient quality. Considering all loci is safer, but of course, it carries some computational overhead.
Cheers Alex
Ok, I will do some tests and let you know how much mapping is slowed down, when I increase the allowed multimappers, then I will close the issue.
Hi, thanks for all the great work you are doing.
A quick question I could not find the answer to in the manual or google:
Does there exist a similar way in STAR to achive what bowtie does with the -k argument ?
Explained here:
I am filtering out a large list of contaminants in a single fastq file: Silva Ribosomal fastq (2GB), phix, tRNAs, ncRNAs, and I want to keep only all reads that does not align (unmapped file with contaminant reads removed). Problem is now that when it multimaps (usually variants of the rRNA 5S, there are hundreds of them in Silva), it is kept (in the unmapped file), since if it multimaps too much it goes into the "unmapped" file, and I do not want to multimap at all, I only want to see if it hits once, and if it does it should stop checking for multimappers.
So is this possible to achive in STAR ?