DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

How critical is it to specify strandness? #288

Open Elle-hu opened 3 years ago

Elle-hu commented 3 years ago

Dear all,

I have strand-specific paired-end sequencing data. Being unsure about the orientation of the mates, I ran HISAT2 without specifying the strandness or the orientation of the reads.

My command looked like this:

hisat2 -p 16 --dta -x /hisat2/my_index/mm10.genome_tran -1 /fastq/X_R1.fastq.gz -2 /fastq/X_R2.fastq.gz -S hisat2/map/X.sam

I got pretty good alignment rates, e.g.:

X
37172698 reads; of these:
  37172698 (100.00%) were paired; of these:
    3001620 (8.07%) aligned concordantly 0 times
    28036019 (75.42%) aligned concordantly exactly 1 time
    6135059 (16.50%) aligned concordantly >1 times

3001620 pairs aligned concordantly 0 times; of these:
  122139 (4.07%) aligned discordantly 1 time
----
2879481 pairs aligned 0 times concordantly or discordantly; of these:
  5758962 mates make up the pairs; of these:
    4135571 (71.81%) aligned 0 times
    1208512 (20.98%) aligned exactly 1 time
    414879 (7.20%) aligned >1 times

94.44% overall alignment rate
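
As a sanity check on the summary above, the overall rate can be reproduced from the listed counts. This is a minimal sketch that assumes the rate is computed per mate (two mates per pair), which is consistent with the numbers shown; the variable names are illustrative only.

```python
# Counts copied from the HISAT2 summary above.
total_pairs = 37172698
concordant_once = 28036019
concordant_multi = 6135059
discordant_once = 122139
single_mates_once = 1208512    # mates aligned exactly 1 time on their own
single_mates_multi = 414879    # mates aligned >1 times on their own
mates_unaligned = 4135571      # mates aligned 0 times

# Assumption: the overall rate counts individual mates, two per pair.
total_mates = 2 * total_pairs
aligned_mates = (
    2 * (concordant_once + concordant_multi + discordant_once)
    + single_mates_once
    + single_mates_multi
)

overall_rate = 100.0 * aligned_mates / total_mates
print(f"{overall_rate:.2f}% overall alignment rate")
```

Under that assumption the aligned and unaligned mates add up to the total, and the computed rate rounds to the reported 94.44%.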

I then went on to build an assembly with StringTie:

stringtie /hisat2/map/X.bam -l X -p 16 -G mm10.gtf -o /hisat2/assembly/X.gtf

Everything ran smoothly, and when inspecting the BAM files in IGV I didn't notice any inconsistencies. Only a few transcripts ended up without an assigned strand, reported as "." in the strand column, but I read that this is a common issue that doesn't seem to be related to the strandness option. Am I correct? I now suspect that not specifying the strandness could have introduced errors or led to a loss of information. Could you please clarify how HISAT2 behaves with stranded data when this information is not given? Would you advise repeating the procedure with the correct strand information, or is this not critical?
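
For reference, here is a sketch of how a stranded rerun of the two commands above could be assembled. `--rna-strandness` (HISAT2, `FR` or `RF` for paired-end reads) and `--fr` / `--rf` (StringTie) are the documented options. The value `RF` used below is only a placeholder assumption, since the actual orientation of this library is not known. The helper function names are hypothetical.

```python
import shlex


def hisat2_cmd(strandness=None):
    """Build the HISAT2 command from this post, optionally with strandness."""
    cmd = ["hisat2", "-p", "16", "--dta"]
    if strandness is not None:
        if strandness not in ("FR", "RF"):
            raise ValueError("paired-end strandness must be 'FR' or 'RF'")
        cmd += ["--rna-strandness", strandness]
    cmd += [
        "-x", "/hisat2/my_index/mm10.genome_tran",
        "-1", "/fastq/X_R1.fastq.gz",
        "-2", "/fastq/X_R2.fastq.gz",
        "-S", "hisat2/map/X.sam",
    ]
    return cmd


def stringtie_cmd(strandness=None):
    """Build the StringTie command from this post, optionally with strandness."""
    cmd = ["stringtie", "/hisat2/map/X.bam", "-l", "X", "-p", "16",
           "-G", "mm10.gtf", "-o", "/hisat2/assembly/X.gtf"]
    if strandness is not None:
        # Lookup raises KeyError for anything other than 'FR' or 'RF'.
        cmd.append({"FR": "--fr", "RF": "--rf"}[strandness])
    return cmd


# Placeholder assumption: RF. Swap in FR if that is the real orientation.
print(shlex.join(hisat2_cmd("RF")))
print(shlex.join(stringtie_cmd("RF")))
```

Calling either helper with no argument reproduces the unstranded commands used in this post.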

Thanks!

Best,

Lily