DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

Virtually 0 reads aligned (while on same dataset STAR aligns successfully) #272

Closed: scogi closed this issue 3 years ago

scogi commented 3 years ago

Hi! I am aligning reads from a cDNA PCR product that spans two exons to the corresponding genomic region. The reference is rather small (2.6 kb) and contains the two exons with an intron of approx. 2 kb in between. The data were generated on a MiSeq with 2×250 bp chemistry. The OS is Ubuntu 18.04.

However, when I align the data with HISAT2, virtually no reads are aligned. In contrast, alignment with STAR works and shows the expected patterns (the STAR log is further below). The HISAT2 version and command line:

$ /opt/hisat2/hisat2 --version
/opt/hisat2/hisat2-align-s version 2.1.0
64-bit
Built on login-node03
Wed Jun  7 15:53:42 EDT 2017
Compiler: gcc version 4.8.2 (GCC) 
Options: -O3 -m64 -msse2 -funroll-loops -g3 -DPOPCNT_CAPABILITY
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
$ /opt/hisat2/hisat2 \
-x ref/ref.fasta \
-1 fastq/mut/010_m_R1.fastq.gz \
-2 fastq/mut/010_m_R2.fastq.gz \
-q --reorder --phred33 \
--novel-splicesite-outfile aligned/010/hisat2/010.m.hisat2.junctions \
--summary-file aligned/010/hisat2/010.m.hisat2.summary \
-S aligned/010/hisat2/010.m.hisat2.sam \
-p 20 -t

gives:

Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 00:00:00
3835 reads; of these:
  3835 (100.00%) were paired; of these:
    3830 (99.87%) aligned concordantly 0 times
    5 (0.13%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    3830 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    3830 pairs aligned 0 times concordantly or discordantly; of these:
      7660 mates make up the pairs; of these:
        7657 (99.96%) aligned 0 times
        3 (0.04%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.17% overall alignment rate
Time searching: 00:00:00
Overall time: 00:00:00
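
As a reading aid for the summary above, the sketch below recomputes the overall alignment rate from the individual counts. It assumes the rate is the number of aligned mates divided by the total number of mates, with each concordantly or discordantly aligned pair contributing two mates; this assumption reproduces the reported 0.17%, but the function name and its parameters are illustrative and not part of HISAT2.

```python
def overall_alignment_rate(pairs, concordant_once, concordant_multi,
                           discordant_once, mates_once, mates_multi):
    """Return the percentage of mates that aligned (assumed formula)."""
    total_mates = 2 * pairs
    # Each aligned pair contributes both of its mates.
    paired_mates = 2 * (concordant_once + concordant_multi + discordant_once)
    # Mates from unaligned pairs that aligned on their own.
    single_mates = mates_once + mates_multi
    return 100.0 * (paired_mates + single_mates) / total_mates


# Counts copied from the HISAT2 summary in this post.
rate = overall_alignment_rate(
    pairs=3835,
    concordant_once=5,
    concordant_multi=0,
    discordant_once=0,
    mates_once=3,
    mates_multi=0,
)
print(f"{rate:.2f}% overall alignment rate")
```

In other words, only 13 of the 7670 mates were placed on the reference.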

For comparison, the STAR report shows that most of the reads are successfully mapped:

                                 Started job on |   Nov 12 16:00:04
                             Started mapping on |   Nov 12 16:00:09
                                    Finished on |   Nov 12 16:00:13
       Mapping speed, Million of reads per hour |   3.45

                          Number of input reads |   3835
                      Average input read length |   501
                                    UNIQUE READS:
                   Uniquely mapped reads number |   3442
                        Uniquely mapped reads % |   89.75%
                          Average mapped length |   500.09
                       Number of splices: Total |   6853
            Number of splices: Annotated (sjdb) |   6835
                       Number of splices: GT/AG |   6846
                       Number of splices: GC/AG |   0
                       Number of splices: AT/AC |   0
               Number of splices: Non-canonical |   7
                      Mismatch rate per base, % |   0.30%
                         Deletion rate per base |   0.00%
                        Deletion average length |   2.20
                        Insertion rate per base |   0.00%
                       Insertion average length |   2.00
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   3
             % of reads mapped to multiple loci |   0.08%
        Number of reads mapped to too many loci |   0
             % of reads mapped to too many loci |   0.00%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |   0
       % of reads unmapped: too many mismatches |   0.00%
            Number of reads unmapped: too short |   9
                 % of reads unmapped: too short |   0.23%
                Number of reads unmapped: other |   381
                     % of reads unmapped: other |   9.93%
                                  CHIMERIC READS:
                       Number of chimeric reads |   0
                            % of chimeric reads |   0.00%

Any ideas where the issue could be? I am new to HISAT2, so please bear with me if there is an obvious error in the command line. Thank you,

Best wishes, Stefan

scogi commented 3 years ago

Hi! I have solved the issue in the meantime (it was caused by an error in the FASTA file used as reference).
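
The thread does not say what the error in the FASTA file was, so the following is only a generic sanity check that one might run on a small reference before building an index. It is a minimal sketch: the function name `check_fasta` and the set of checks (missing or duplicate headers, empty records, characters outside the IUPAC nucleotide codes) are assumptions for illustration, not a HISAT2 feature.

```python
import io

# IUPAC nucleotide codes; anything else in a sequence line is reported.
VALID_BASES = set("ACGTUNRYKMSWBDHV")


def check_fasta(handle):
    """Return a list of human-readable problems found in a FASTA stream."""
    problems = []
    seen_names = set()
    name = None      # name of the current record, None before the first header
    length = 0       # number of bases collected for the current record
    for lineno, raw in enumerate(handle, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None and length == 0:
                problems.append(f"record '{name}' has no sequence")
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if not name:
                problems.append(f"line {lineno}: empty header")
            elif name in seen_names:
                problems.append(f"line {lineno}: duplicate name '{name}'")
            seen_names.add(name)
            length = 0
        else:
            if name is None:
                problems.append(f"line {lineno}: sequence before first header")
                name = ""
            bad = set(line.upper()) - VALID_BASES
            if bad:
                problems.append(
                    f"line {lineno}: unexpected characters {sorted(bad)}"
                )
            length += len(line)
    if name is None:
        problems.append("no records found")
    elif length == 0:
        problems.append(f"record '{name}' has no sequence")
    return problems


# Small in-memory examples: one clean record, one with a stray character.
good = check_fasta(io.StringIO(">ref\nACGTNacgt\nGGCC\n"))
bad = check_fasta(io.StringIO(">ref\nACGT\nAC!T\n"))
print(good)
print(bad)
```

To check a real file, the same function can be given an open file handle, for example one opened on ref/ref.fasta from the command in the first post. An empty list means none of these particular problems were found, which does not rule out other kinds of errors, such as the wrong sequence or strand.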

BW Stefan