brentp / bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome
https://arxiv.org/abs/1401.1129
MIT License

YC and MD tag issue #93

Open Javkhaa opened 3 days ago

Javkhaa commented 3 days ago

We have the following alignment from bwa-meth.

Mapped read R2:

A00949:383:HVVMTDSX7:4:1641:6849:18020  163     chr12   58133435        60      130M    =       58133470        164     
TTTTAAATGTGGATTTGATTATATTAATTTTATGGTAAAATATTGTTATGATTTTTTATTTTTTTTGTGATAAAGTGTTTAGTATGTGAATTTTAGTTTAAATTTTTAATTTTATTTTTTTTTTTTTTTT      
FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFF::FFFFFFFFFFFFF:FFFF:FFFFFFFFFF:FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:F      MC:Z:79M1I50M   
YC:Z:GA 
MD:Z:8G1G0G4G16G0G9G4G16G1G5G1G4G3G1G7G34       
YD:Z:f  RG:Z:pool2_639_S16__val_        NM:i:16 AS:i:82 XS:i:35

Below is a stacked view of the reference and query sequences. The mismatches show that this is a C->T conversion mapping, but according to the YC and MD tags from bwa-meth it is a G->A conversion mapping. The YD tag, however, is f (forward reference genome), which means it should be C->T.

REF=TCTTAAACGTGGATCTGATCACATCAATCTCATGGTAAAATACTGCCATGACTTTTCATTCCCTTTGTGACAAAGTGTTTAGTATGTGAACTCCAGCTTAAATTTCCAACTTCATCTCTCTTCTTCCTTC
QRY=TTTTAAATGTGGATTTGATTATATTAATTTTATGGTAAAATATTGTTATGATTTTTTATTTTTTTTGTGATAAAGTGTTTAGTATGTGAATTTTAGTTTAAATTTTTAATTTTATTTTTTTTTTTTTTTT      
YC:Z:GA 
MD:Z:8G1G0G4G16G0G9G4G16G1G5G1G4G3G1G7G34       
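
The visual comparison above can also be done programmatically. The following is a minimal sketch, not part of bwa-meth: `tally_substitutions` is a hypothetical helper that counts (reference base, query base) pairs at mismatching columns of two equal-length, gap-free strings. The example input is the first 20 columns of the REF and QRY rows above.

```python
from collections import Counter


def tally_substitutions(ref, qry):
    """Count (ref_base, query_base) pairs at mismatching columns.

    Assumes both strings are already aligned column by column,
    have the same length, and contain no gap characters.
    """
    if len(ref) != len(qry):
        raise ValueError("ref and qry must be the same length")
    return Counter(
        (r, q) for r, q in zip(ref.upper(), qry.upper()) if r != q
    )


# First 20 columns of the stacked REF/QRY view for R2.
ref = "TCTTAAACGTGGATCTGATC"
qry = "TTTTAAATGTGGATTTGATT"

counts = tally_substitutions(ref, qry)
print(counts)  # Counter({('C', 'T'): 4})
```

For a record with insertions or deletions (such as the 79M1I50M alignment of R1), the two strings would first have to be padded to equal length according to the CIGAR string.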

R1, on the other hand, has the correct YC and YD tags but the wrong MD tag.

A00949:383:HVVMTDSX7:4:1641:6849:18020  83      chr12   58133470        60      79M1I50M        =       58133435        -164    
TAAAATATTGTTATGATTTTTTATTTTTTTTGTGATAAAGTGTTTAGTATGTGAATTTTAGTTTAAATTTTTAATTTTATTTTTTTTTTTTTTTTTTGTTTTATTTATTTTTAAAAATATAGTTTATGGG      
FFFFFF,FF:FF:FF:FFFFFFFFFFFFFFF:FF,FFF::F:FF::FF:FFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      MC:Z:130M       
YC:Z:CT 
MD:Z:9G4G16G1G5G1G4G3G1G7G35G23G5G0G0G0 
YD:Z:f  RG:Z:pool2_639_S16__val_        NM:i:16 AS:i:83 XS:i:41

And here is the stacked view of the reference and query for R1:

REF=TAAAATACTGCCATGACTTTTCATTCCCTTTGTGACAAAGTGTTTAGTATGTGAACTCCAGCTTAAATTTCCAACTTCA TCTCTCTTCTTCCTTCTGCCTTACCCATCCCCAAAAATACAGTTCATGGG
QRY=TAAAATATTGTTATGATTTTTTATTTTTTTTGTGATAAAGTGTTTAGTATGTGAATTTTAGTTTAAATTTTTAATTTTATTTTTTTTTTTTTTTTTTGTTTTATTTATTTTTAAAAATATAGTTTATGGG
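
When comparing tags between mates, it can help to decode the FLAG column of each record (163 for R2 and 83 for R1 above) to see which read of the pair it is and which strand it was aligned to. The sketch below uses the bit definitions from the SAM specification; `decode_flag` is a hypothetical helper name, not a function from bwa-meth or samtools.

```python
# Bit meanings as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


print(decode_flag(163))  # ['paired', 'proper_pair', 'mate_reverse', 'read2']
print(decode_flag(83))   # ['paired', 'proper_pair', 'reverse', 'read1']
```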

Am I missing something here, or is there a logic issue in how the YC/YD/MD tags are created?

brentp commented 2 days ago

Hi, "MD" is added by bwa mem for the original alignment and not modified by bwa-meth so it won't be correct.
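
To see what the MD string actually records, it can be expanded into reference offsets and reference bases. The sketch below is an illustration only: `md_mismatches` is a hypothetical helper, and the parsing follows the MD grammar in the SAM optional-fields specification (match-run lengths, single mismatched reference bases, and `^`-prefixed deleted bases). Applied to the MD tag of R2 from the report, it yields 16 mismatches, equal to that record's NM:i:16, and every one names reference base G. That would be consistent with MD describing the alignment bwa mem originally produced rather than the read as finally reported, though this reading is an interpretation and is not spelled out in the thread.

```python
import re

# One token is a run of matches, a deletion (^BASES), or one mismatched base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def md_mismatches(md):
    """Return (offset, ref_base) for each mismatch in an MD string.

    Offsets are 0-based and relative to the first aligned reference base.
    """
    pos = 0
    mismatches = []
    for run, deletion, base in MD_TOKEN.findall(md):
        if run:
            pos += int(run)          # run of matching bases
        elif deletion:
            pos += len(deletion)     # deleted reference bases
        else:
            mismatches.append((pos, base))
            pos += 1
    return mismatches


# MD tag of the R2 record in the report.
md_r2 = "8G1G0G4G16G0G9G4G16G1G5G1G4G3G1G7G34"
hits = md_mismatches(md_r2)
print(len(hits))                      # 16
print({base for _, base in hits})     # {'G'}
```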

For your other questions, are you using a directional bs-seq protocol? If not, then you can't use bwa-meth.

YettaWang commented 2 days ago

I have the same question, so I am curious if the YC and YD tags have some other significance.

Javkhaa commented 2 days ago

> Hi, "MD" is added by bwa mem for the original alignment and not modified by bwa-meth so it won't be correct.
>
> For your other questions, are you using a directional bs-seq protocol? If not, then you can't use bwa-meth.

Oh, got it, thanks for the clarification. I will not use the MD tag. Yes, ours should be a directional bs-seq protocol. I was using Nextflow's MethylSeq pipeline.