DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

AS, ZS and NH tags: semantics? #49

Open rschulzUK opened 8 years ago

rschulzUK commented 8 years ago

Dear Daehwan,

I came across two cases (see below) where I cannot tell whether I am misinterpreting the tags or whether the tags are genuinely inconsistent. The Hisat2 version and invocation were:

@PG ID:hisat2   PN:hisat2   VN:2.0.3-beta   CL:"/home/rschulz/bin/hisat2-2.0.3-beta/hisat2-align-s --wrapper basic-0 -p 4 --rna-strandness RF -x /home/rschulz/research/data/genomes/GRCm38/hisat2/genome_tran -S ./output/E.sam -1 /tmp/31877.inpipe1 -2 /tmp/31877.inpipe2"

Based on the definition on the Hisat2 web site, "ZS:i:<N> Alignment score for the best-scoring alignment found other than the alignment reported. [...]", I understand ZS to refer to the best-scoring alignment among the other alignments found, which could be greater than AS. However, I am confused by the additional sentence "Note that, when the read is part of a concordantly-aligned pair, this score could be greater than [AS:i]." Why can ZS be greater than AS only when the read is part of a concordantly-aligned pair?

NH is defined as "The number of mapped locations for the read or the pair." Does the use of "locations" instead of "alignments" imply that distinct alignments spanning the same coordinates in the target genome are not counted here? That could explain case 1 below, but not case 2.

Any help with understanding this would be much appreciated.

Case 1: ZS is present, suggesting that there are >1 alignments, but NH=1.

HWI-ST1037:275:C496DACXX:7:1206:15243:63664 163 1   4802273 255 7S10M43550N82M  =   4845943 43777   GTTTGGGTCCCCCCTCCCCTGTCTCGGAAACAAACAAACAAACAAACCGAAACACAGACATACAGTATTTCCAACCTAGGTAATATGAAAAGAAATCAA BBBFFFBFFBBFFFFIIIFFFBBFBFBFFFFFFI7BFF<BBF<BFFIF7B77<<<BBBBB<07B<0<<BB<B<B777'<<0<B<<<<BBFFFFFF<<BF AS:i:-9 ZS:i:-16    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:92 YS:i:0  YT:Z:CP XS:A:+  NH:i:1

Case 2: NH=2, but ZS is not present.

HWI-D00505:41:C437JACXX:5:1315:15701:47013  163 1   4807888 1   95M472N5M   =   4808454 666 CCGACGCACTGTCCGCCAGCCGGTGGATGTGCGGCAACAACATGTCCGCTCCGATGCCCGCCGTTGTGCCGGCCGCCCGGAAGGCCACCGCCGCGGTTAT    BBBFFFFFFFFFFIIIIIIIIIIFFIFFIFFIIIIIFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFBFFF<BFF    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:100    YS:i:0  YT:Z:CP XS:A:+  NH:i:2
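To screen a whole SAM file for records like the two cases above, something along these lines may help. It is a minimal sketch, not part of Hisat2: the function names `parse_tags` and `classify` are made up for illustration, it assumes tab-delimited SAM records with optional fields starting at column 12, and the two example records are abbreviated (short read names, `*` in place of SEQ and QUAL) with only the AS, ZS and NH tags copied from the cases above.

```python
def parse_tags(sam_line):
    """Return (read name, dict of optional tags) for one SAM alignment line.

    Assumes tab-delimited fields; optional TAG:TYPE:VALUE fields start at column 12.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        # Integer-typed tags (AS, ZS, NH, ...) are converted; others stay strings.
        tags[tag] = int(value) if typ == "i" else value
    return fields[0], tags


def classify(tags):
    """Label a record by whether the presence of ZS agrees with NH (hypothetical check)."""
    nh = tags.get("NH")
    has_zs = "ZS" in tags
    if nh is None:
        return "no NH"
    if has_zs and nh == 1:
        return "ZS present but NH=1"
    if not has_zs and nh > 1:
        return "NH>1 but no ZS"
    return "consistent"


# Abbreviated versions of case 1 and case 2 (SEQ/QUAL replaced by '*').
case1 = "\t".join(["read1", "163", "1", "4802273", "255", "7S10M43550N82M",
                   "=", "4845943", "43777", "*", "*",
                   "AS:i:-9", "ZS:i:-16", "NH:i:1"])
case2 = "\t".join(["read2", "163", "1", "4807888", "1", "95M472N5M",
                   "=", "4808454", "666", "*", "*",
                   "AS:i:0", "NH:i:2"])

for line in (case1, case2):
    name, tags = parse_tags(line)
    print(name, classify(tags))
```

Run over a real file (skipping header lines that start with `@`), this would give a count of how often each kind of mismatch occurs, which may help decide whether the two cases are isolated or systematic.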

Best, Reiner

ghost commented 7 years ago

Reiner, was this ever addressed?

rschulzUK commented 7 years ago

Hi Rick,

I do not know. I have not recently used hisat to see if Daehwan addressed this.

Best, Reiner

ghost commented 7 years ago

Reiner,

I did a bit more digging. They appear to have addressed some of these issues in later releases. They did acknowledge that NH gave incorrect values. Now I just have to find out how they calculate their alignment scores :)

Dr Rick Tearle Senior Bioinformatician Davies Research Centre University of Adelaide Roseworthy Campus rick.tearle@adelaide.edu.au +61 432 07 58 07
