Open yasirs opened 5 years ago
I have a similar problem, but with unpaired data. The read is 31 bp long: the first base is a mismatch, and the following 30 bases align perfectly to chromosome 18. HISAT2 cannot align this read, no matter how low I set the mismatch/soft-clip penalties and the minimum required score. If I manually delete the first base of the read, HISAT2 aligns it. This happens with both the GRCh38 genome_snp index from the website and a custom-built index. This is the read in question:
@rdname
CCAAAGCTCAAATCTTTTTAGACATCAGAGA
+
AAAEEEEEEEEEEEEEEEEEEEEEEEEEEEE
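To repeat the with/without-first-base comparison, the trimmed read can be generated rather than edited by hand. This is a minimal sketch: the file names `read.fq` and `read_trimmed.fq` are made up for illustration, and it assumes plain four-line FASTQ records like the one above.

```shell
# Write the read from this comment to a FASTQ file (file names are hypothetical).
printf '%s\n' '@rdname' 'CCAAAGCTCAAATCTTTTTAGACATCAGAGA' '+' 'AAAEEEEEEEEEEEEEEEEEEEEEEEEEEEE' > read.fq

# Drop the first base and its quality value: lines 2 and 4 of each 4-line record.
awk 'NR % 4 == 2 || NR % 4 == 0 { print substr($0, 2); next } { print }' read.fq > read_trimmed.fq

cat read_trimmed.fq
```

Both files can then be passed to hisat2 with `-U` to compare the two outcomes under identical settings.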
Update: After some more digging, it appears that HISAT2 has problems aligning reads to the reverse reference strand. I have counted, for all reads that HISAT2 was able to align, whether they were clipped from the 5'-end, 3'-end, or both, separately for reads aligning to the forward and reverse reference strand. This is what I get:
aligning to the forward strand: Dict("5prime"=>1993288,"unclipped"=>1961463,"3prime"=>102811,"both"=>103559)
aligning to the reverse strand: Dict("5prime"=>81070,"unclipped"=>1916073,"3prime"=>78094,"both"=>465)
I would consider this a critical issue in HISAT2.
The analysis was done with the following Julia code:
open("12_pk9.sam", "r") do io
    clipped_fw = Dict("5prime" => 0, "3prime" => 0, "both" => 0, "unclipped" => 0)
    clipped_rev = Dict("5prime" => 0, "3prime" => 0, "both" => 0, "unclipped" => 0)
    for line in eachline(io)
        # Skip SAM header lines.
        startswith(line, "@") && continue
        fields = split(line, '\t')
        # The strand is taken from the XS:A tag; records without it are skipped.
        idx = findfirst(x -> startswith(x, "XS:A:"), fields[12:end])
        idx === nothing && continue
        fstrand = idx + 11
        # Soft clip at the start / end of the CIGAR string (reference orientation).
        fw = occursin(r"^\d+S", fields[6])
        rev = occursin(r"\d+S$", fields[6])
        if fields[fstrand][end] == '+'
            d = clipped_fw
        else
            d = clipped_rev
            # On the reverse strand the CIGAR ends are swapped relative to the read.
            fw, rev = rev, fw
        end
        if fw && !rev
            d["5prime"] += 1
        elseif !fw && rev
            d["3prime"] += 1
        elseif fw && rev
            d["both"] += 1
        else
            d["unclipped"] += 1
        end
    end
    println(clipped_fw)
    println(clipped_rev)
end
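A possible cross-check, as a minimal sketch: according to the HISAT2 manual, `XS:A` reports the transcript strand of spliced alignments, while the strand the read itself aligned to is bit 0x10 of the FLAG column. The awk tally below groups soft clips by FLAG instead. The file `demo.sam` and its three records are made up for illustration and would be replaced by the real SAM file.

```shell
# Tiny made-up SAM file (hypothetical records) to exercise the tally.
printf 'r1\t0\tchr18\t100\t60\t1S30M\t*\t0\t0\t*\t*\n'  > demo.sam
printf 'r2\t16\tchr18\t200\t60\t1S30M\t*\t0\t0\t*\t*\n' >> demo.sam
printf 'r3\t16\tchr18\t300\t60\t31M\t*\t0\t0\t*\t*\n'   >> demo.sam

awk -F'\t' '
  /^@/ { next }                      # skip header lines
  int($2 / 4) % 2 == 1 { next }      # skip unaligned records (FLAG bit 0x4)
  {
    strand = (int($2 / 16) % 2 == 1) ? "reverse" : "forward"   # FLAG bit 0x10
    left  = ($6 ~ /^[0-9]+S/)        # soft clip at the start of the CIGAR
    right = ($6 ~ /[0-9]+S$/)        # soft clip at the end of the CIGAR
    # CIGAR follows the reference; on the reverse strand its start is the read 3-prime end.
    if (strand == "reverse") { tmp = left; left = right; right = tmp }
    if (left && right) kind = "both"
    else if (left)     kind = "5prime"
    else if (right)    kind = "3prime"
    else               kind = "unclipped"
    count[strand " " kind]++
  }
  END { for (k in count) print k, count[k] }
' demo.sam | LC_ALL=C sort > clip_counts.txt

cat clip_counts.txt
```

If the asymmetry between strands persists when grouping by FLAG, it is a property of the alignments themselves and not of how the strand was read from the tags.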
I'll add another example. I downloaded GENCODE's transcript file for the current human release, "gencode.v32.pc_transcripts.fa", and aligned to the current release genome, "GRCh38.primary_assembly.genome.fa", indexed by:
./hisat2-2.1.0/hisat2-build GRCh38.primary_assembly.genome.fa GRCh38
... using the HISAT2 linux binary. When I align all transcripts:
./hisat2-2.1.0/hisat2 -x GRCh38 -f -U gencode.v32.pc_transcripts.fa -S pc-tx.sam
... the resulting SAM file contains an unaligned transcript:
ENST00000618181.4|ENSG00000187634.12|OTTHUMG00000040719.11|-|SAMD11-213|SAMD11|2179|UTR5:1-80|CDS:81-1751|UTR3:1752-2179| 4 * 0 0 * * 0 0 GCAGATCCCTGCGGCGTTCGCGAGGGTGGGACGGGAAGCGGGCTGGGAAGTCGGGCCGAGGGAAAAGTCTGAAGACGCTTATGTCCAAGGGGATCCTGCAGGTGCATCCTCCGATCTGCGACTGCCCGGGCTGCCGAATATCCTCCCCGGTGAACCGGGGGCGGCTGGCAGACAAGAGGACAGTCGCCCTGCCTGCCGCCCGGAACCTGAAGAAGGAGCGAACTCCCAGCTTCTCTGCCAGCGATGGTGACAGCGACGGGAGTGGCCCCACCTGTGGGCGGCGGCCAGGCTTGAAGCAGGAGGATGGTCCGCACATCCGTATCATGAAGAGAAGCCAGGACGGCAACCTTCCCACCCTCATATCCAGCGTCCACCGCAGCCGCCACCTCGTTATGCCCGAGCATCAGAGCCGCTGTGAATTCCAGAGAGGCAGCCTGGAGATTGGCCTGCGACCCGCCGGTGACCTGTTGGGCAAGAGGCTGGGCCGCTCCCCCCGTATCAGCAGCGACTGCTTTTCAGAGAAGAGGGCACGAAGCGAATCGCCTCAAGCAGAGGCGCTGCTGCTGCCGCGGGAGCTGGGGCCCAGCATGGCCCCGGAGGACCATTACCGCCGGCTTGTGTCAGCACTGAGCGAGGCCAGCACCTTTGAGGACCCTCAGCGCCTCTACCACCTGGGCCTCCCCAGCCACGGCTACGGCTTCCTGCCCCCCGCGCAGGCGGAGATGTTCGCCTGGCAGCAGGAGCTCCTGCGGAAGCAGAACCTGGCCCGGCTGGAGCTGCCCGCCGACCTCCTGCGGCAGAAGGAGCTGGAGAGCGCGCGCCCACAGCTGCTGGCGCCCGAGACCGCCCTGCGCCCCAACGACGGCGCCGAGGAGCTGCAGCGGCGCGGGGCCCTGCTGGTGCTGAACCACGGCGCGGCGCCACTGCTGGCCCTGCCCCCCCAGGGGCCCCCGGGCTCCGGACCCCCCACCCCGTCCCGGGACTCTGCCCGGCGAGCCCCCCGGAAGGGGGGTCCCGGCCCTGCCTCAGCGCGGCCCAGCGAGTCCAAGGAGATGACGGGGGCTAGGCTCTGGGCACAAGATGGCTCGGAAGACGAGCCCCCCAAAGACTCGGACGGAGAGGACCCCGAGACGGCAGCTGTTGGGTGCAGGGGGCCCACTCCGGGCCAAGCTCCAGCTGGAGGGGCCGGCGCCGAGGGGAAGGGGCTTTTCCCAGGGTCCACACTGCCCCTGGGCTTCCCTTATGCCGTCAGCCCCTACTTCCACACAGGCGCGGTAGGGGGACTCTCCATGGATGGGGAGGAGGCCCCAGCCCCTGAGGACGTCACCAAGTGGACCGTGGATGACGTCTGCAGCTTCGTGGGGGGCCTGTCTGGCTGTGGAGAGTACACTCGGGTCTTCAGGGAGCAGGGGATCGACGGGGAGACCCTGCCACTGCTGACGGAGGAGCACCTGCTGACCAACATGGGGCTGAAGCTGGGGCCCGCCCTCAAGATCCGGGCCCAGGTGGCCAGGCGCCTGGGCCGAGTTTTCTACGTGGCCAGCTTCCCCGTGGCTCTGCCACTGCAGCCACCAACCCTGCGGGCCCCGGAGCGAGAACTCGGCACAGGAGAGCAGCCCTTGTCCCCCACGACGGCCACGTCCCCCTATGGAGGGGGCCACGCCCTTGCCGGTCAAACTTCACCCAAGCAGGAGAATGGGACCTTGGCTCTACTTCCAGGGGCCCCCGACCCTTCCCAGCCTCTGTGTTGAGGTTGCCGGGGGTAGGGGTGGGGCCACACAAATCTCCAGGAGCCACCACTCAACACAATGGCCCTGCCTCCCACCGCTTTATTTCTTTCGGTTTCGGATGCAAAACAAAAA
ATTTTAAAAGAAAATGTGACTTCAAAGGAAAGGAACAAATTTTCAAAGACTTGGGGGAGTGAAGGCAGAGCCTGGTGCAGATGGACGAGGTCTGCAGACGGAGGGCAGAGGTGGTGGAAGGGGCCAGGGGCCTGCAGGCCTCCCCCTGGAACTGGGACTGGTCTCGGTCTGCTGACGTCAGGGTCAGCTCCCCCGCGGAGCTGACTTCAGCAGCCCACAGCTGTGGGGCTTCAGCAGCCACACCAGCCCAGCCCAGCCCAGCTCTCGATACGTTTGGTCTTTCATGCTGAAAAATAAATAATAAAGCCTGTCCCGTG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
But, if I pull that transcript out into a fasta file by itself, then align it with the same default settings, I get:
ENST00000618181.4|ENSG00000187634.12|OTTHUMG00000040719.11|-|SAMD11-213|SAMD11|2179|UTR5:1-80|CDS:81-1751|UTR3:1752-2179| 0 chr1 925741 60 60M121N92M4141N182M5435N125M3143N90M142N141M2997N79M70N500M194N125M320N111M99N674M * 0 0 GCAGATCCCTGCGGCGTTCGCGAGGGTGGGACGGGAAGCGGGCTGGGAAGTCGGGCCGAGGGAAAAGTCTGAAGACGCTTATGTCCAAGGGGATCCTGCAGGTGCATCCTCCGATCTGCGACTGCCCGGGCTGCCGAATATCCTCCCCGGTGAACCGGGGGCGGCTGGCAGACAAGAGGACAGTCGCCCTGCCTGCCGCCCGGAACCTGAAGAAGGAGCGAACTCCCAGCTTCTCTGCCAGCGATGGTGACAGCGACGGGAGTGGCCCCACCTGTGGGCGGCGGCCAGGCTTGAAGCAGGAGGATGGTCCGCACATCCGTATCATGAAGAGAAGCCAGGACGGCAACCTTCCCACCCTCATATCCAGCGTCCACCGCAGCCGCCACCTCGTTATGCCCGAGCATCAGAGCCGCTGTGAATTCCAGAGAGGCAGCCTGGAGATTGGCCTGCGACCCGCCGGTGACCTGTTGGGCAAGAGGCTGGGCCGCTCCCCCCGTATCAGCAGCGACTGCTTTTCAGAGAAGAGGGCACGAAGCGAATCGCCTCAAGCAGAGGCGCTGCTGCTGCCGCGGGAGCTGGGGCCCAGCATGGCCCCGGAGGACCATTACCGCCGGCTTGTGTCAGCACTGAGCGAGGCCAGCACCTTTGAGGACCCTCAGCGCCTCTACCACCTGGGCCTCCCCAGCCACGGCTACGGCTTCCTGCCCCCCGCGCAGGCGGAGATGTTCGCCTGGCAGCAGGAGCTCCTGCGGAAGCAGAACCTGGCCCGGCTGGAGCTGCCCGCCGACCTCCTGCGGCAGAAGGAGCTGGAGAGCGCGCGCCCACAGCTGCTGGCGCCCGAGACCGCCCTGCGCCCCAACGACGGCGCCGAGGAGCTGCAGCGGCGCGGGGCCCTGCTGGTGCTGAACCACGGCGCGGCGCCACTGCTGGCCCTGCCCCCCCAGGGGCCCCCGGGCTCCGGACCCCCCACCCCGTCCCGGGACTCTGCCCGGCGAGCCCCCCGGAAGGGGGGTCCCGGCCCTGCCTCAGCGCGGCCCAGCGAGTCCAAGGAGATGACGGGGGCTAGGCTCTGGGCACAAGATGGCTCGGAAGACGAGCCCCCCAAAGACTCGGACGGAGAGGACCCCGAGACGGCAGCTGTTGGGTGCAGGGGGCCCACTCCGGGCCAAGCTCCAGCTGGAGGGGCCGGCGCCGAGGGGAAGGGGCTTTTCCCAGGGTCCACACTGCCCCTGGGCTTCCCTTATGCCGTCAGCCCCTACTTCCACACAGGCGCGGTAGGGGGACTCTCCATGGATGGGGAGGAGGCCCCAGCCCCTGAGGACGTCACCAAGTGGACCGTGGATGACGTCTGCAGCTTCGTGGGGGGCCTGTCTGGCTGTGGAGAGTACACTCGGGTCTTCAGGGAGCAGGGGATCGACGGGGAGACCCTGCCACTGCTGACGGAGGAGCACCTGCTGACCAACATGGGGCTGAAGCTGGGGCCCGCCCTCAAGATCCGGGCCCAGGTGGCCAGGCGCCTGGGCCGAGTTTTCTACGTGGCCAGCTTCCCCGTGGCTCTGCCACTGCAGCCACCAACCCTGCGGGCCCCGGAGCGAGAACTCGGCACAGGAGAGCAGCCCTTGTCCCCCACGACGGCCACGTCCCCCTATGGAGGGGGCCACGCCCTTGCCGGTCAAACTTCACCCAAGCAGGAGAATGGGACCTTGGCTCTACTTCCAGGGGCCCCCGACCCTTCCCAGCCTCTGTGTTGAGGTTGCCGGGGGTAGGGGTGG
GGCCACACAAATCTCCAGGAGCCACCACTCAACACAATGGCCCTGCCTCCCACCGCTTTATTTCTTTCGGTTTCGGATGCAAAACAAAAAATTTTAAAAGAAAATGTGACTTCAAAGGAAAGGAACAAATTTTCAAAGACTTGGGGGAGTGAAGGCAGAGCCTGGTGCAGATGGACGAGGTCTGCAGACGGAGGGCAGAGGTGGTGGAAGGGGCCAGGGGCCTGCAGGCCTCCCCCTGGAACTGGGACTGGTCTCGGTCTGCTGACGTCAGGGTCAGCTCCCCCGCGGAGCTGACTTCAGCAGCCCACAGCTGTGGGGCTTCAGCAGCCACACCAGCCCAGCCCAGCCCAGCTCTCGATACGTTTGGTCTTTCATGCTGAAAAATAAATAATAAAGCCTGTCCCGTG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:2179 YT:Z:UU XS:A:+ NH:i:1
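Pulling a single transcript out of the multi-FASTA file, as described above, can be sketched like this. The file `demo_tx.fa` and the ID `tx2` are made-up stand-ins for gencode.v32.pc_transcripts.fa and the ENST identifier, and the sketch assumes GENCODE-style headers where the ID is followed by `|`.

```shell
# Made-up two-record FASTA standing in for the real transcript file.
printf '>tx1|geneA|\nACGT\nACGT\n>tx2|geneB|\nGGCC\n' > demo_tx.fa

# Print only the record whose header starts with the wanted ID (hypothetical "tx2").
awk -v id="tx2" '/^>/ { keep = (index($0, ">" id "|") == 1) } keep' demo_tx.fa > one_tx.fa

cat one_tx.fa
```

The resulting one-record file can be aligned with the same `hisat2 -x GRCh38 -f -U` command as the full set.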
I hope that was clear. Please let me know if I can provide more info. This was just the first unaligned transcript I came across ... there seem to be many more. Am I missing something?
EDIT:
$ samtools flagstat pc-tx.bam
103428 + 0 in total (QC-passed reads + QC-failed reads)
3137 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
82741 + 0 mapped (80.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
I mean, I get that HISAT2 was written for Illumina sequencing, not full transcripts. But then why align 80% of human transcripts to the human genome, and not the other 20%? Too much splicing?
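The 80.00% reported by flagstat is computed over all records, so the 3137 secondary alignments are counted in both the mapped and the total figure. A rough sketch of the arithmetic, using only the numbers from the flagstat output above and assuming every secondary record belongs to a transcript that also has a primary alignment:

```shell
total=103428; secondary=3137; mapped=82741

awk -v t="$total" -v s="$secondary" -v m="$mapped" 'BEGIN {
  # Percentage as flagstat reports it (secondary records included).
  printf "all records:  %.2f%% mapped\n", 100 * m / t
  # Percentage over primary records only, i.e. per input transcript.
  printf "primary only: %.2f%% mapped (%d of %d)\n", 100 * (m - s) / (t - s), m - s, t - s
  # Records that carry no alignment at all.
  printf "unaligned:    %d\n", t - m
}' > mapped_summary.txt

cat mapped_summary.txt
```

Under that assumption the per-transcript rate is slightly lower than the headline number, which makes the unaligned fraction a bit larger than 20%.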
I am aligning raw reads from an Illumina run. I looked at the resulting bam file, filtering for the unaligned reads as
One of the lines is
The sequence (
CAGGGGCTGCAGAACAAATCAAGCACATCCTTGCTAATTTCAAAAACTACCAGTTCTTTATTGGTGAAAACATGAATCCAGATGGCATGGTTGCTCTATTG
) actually aligns to NM_001286272.1 on NCBI BLAST with 100% identity and coverage. So why is it reported as unaligned?
This was the command line for hisat2
I am using the pre-built GRCh38 genome_tran index from the HISAT2 website (ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38_tran.tar.gz).
I don't know if I am misinterpreting the result in any way. Any suggestions will be appreciated.
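Unaligned records can be pulled out of SAM output by testing bit 0x4 of the FLAG column, which is what `samtools view -f 4` does on a BAM file. The following is a minimal sketch and not necessarily the command used here. The file `demo_unaligned.sam` and its two records are made up for illustration.

```shell
# Made-up SAM records: one aligned (FLAG 0), one unaligned (FLAG 4).
printf 'readA\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n' > demo_unaligned.sam
printf 'readB\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCATTGCA\tIIIIIIIIII\n'        >> demo_unaligned.sam

# Keep records whose FLAG has bit 0x4 set and write them as FASTA, e.g. for a BLAST check.
awk -F'\t' '!/^@/ && int($2 / 4) % 2 == 1 { print ">" $1; print $10 }' demo_unaligned.sam > unaligned.fa

cat unaligned.fa
```

Writing the unaligned reads as FASTA makes it easy to check a handful of them against BLAST, as was done for the read above.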