DaehwanKimLab / tophat

Spliced read mapper for RNA-Seq
http://ccb.jhu.edu/software/tophat
Boost Software License 1.0

Incorrect assignment of flag 0x2 (read mapped in proper pair) #61

Open sam-israel opened 3 years ago

sam-israel commented 3 years ago

In the BAM file produced by a TopHat run (v2.1.1, bowtie2 version 2.3.4.3), some reads are flagged as "read mapped in proper pair" (their FLAG includes bit 0x2), yet their YT tag is YT:Z:UU, which indicates that they were not part of a pair.

Here is an example of two such reads from the mapped file:

A01056:33:HF3NFDSXY:1:2516:13657:30718 435 1 91387362 0 117M 21 8218147 0 CCTGTGGTAACTTTTCTGACACCTCCTGCTTAAAACCCAAAAGGTCAGAAGGATCGTGAGGCCCCGCTTTCACGGTCTGTATTCGTACTGAAAATCAAGATCAAGCGAGCTTTTGCC :FF:F:FFFF:FFFFFFFFFFFFFF:FF,FFF,FFFFFF:FFF:FFFFF:FF:FF:FFFFFFF:FFFFFFFFF:FFFFFFFFF,FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:117 YT:Z:UU NH:i:20 CC:Z:= CP:i:91387362 XS:A:- HI:i:2

A01056:33:HF3NFDSXY:1:2516:13657:30718 371 21 8218147 0 112M 1 91387362 0 GGGCAAAAGCTCGCTTGATCTTGATTTTCAGTACGAATACAGACCGTGAAAGCGGGGCCTCACGATCCTTCTGACCTTTTGGGTTTTAAGCAGGAGGTGTCAGAAAAGTTAC :F:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFF:FFFF::FFFFFFFFFF:FFFFFFFFFFFFFFFFFFF AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:112 YT:Z:UU NH:i:20 CC:Z:GL000220.1 CP:i:161594 XS:A:+ HI:i:2

The read is aligned on chromosome 1 while its mate is aligned on chromosome 21, so the "unpaired" setting (YT:Z:UU) is the correct one and the 0x2 flag is the incorrect one.
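For anyone who wants to check the two FLAG values above by hand, here is a minimal sketch that decodes a SAM FLAG into its named bits and tests a record for the 0x2 / YT:Z:UU combination. The bit meanings come from the SAM specification; the function names `decode_flag` and `proper_pair_but_unpaired` are made up for this illustration, and the record is assumed to be a tab-separated SAM text line (for example as printed by `samtools view`).

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def proper_pair_but_unpaired(sam_line):
    """True if a tab-separated SAM record has bit 0x2 set and also carries YT:Z:UU."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])      # column 2 is FLAG
    tags = fields[11:]         # optional tags start at column 12
    return bool(flag & 0x2) and "YT:Z:UU" in tags


# The two FLAG values from the example records above.
print(decode_flag(435))
# ['paired', 'proper_pair', 'reverse', 'mate_reverse', 'second_in_pair', 'secondary']
print(decode_flag(371))
# ['paired', 'proper_pair', 'reverse', 'mate_reverse', 'first_in_pair', 'secondary']
```

Both values have bit 0x2 set and are also marked as secondary alignments (0x100), which matches the NH:i:20 multi-mapping tag on both records.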

The command was

tophat --mate-inner-dist -139 --mate-std-dev 50 -o align/Sample10 -G /.../Homo_sapiens/Ensembl/GRCh38/Annotation/Genes/genes.gtf -N 10 --read-gap-length 5 --read-edit-dist 15 --segment-length 20 --read-realign-edit-dist 3 --no-coverage-search --library-type fr-firststrand -p 32 /.../Homo_sapiens/Ensembl/GRCh38/Sequence/Bowtie2Index/genome Sample10_R1_clean_pe.fastq.gz Sample10_R2_clean_pe.fastq.gz,processed/Sample10_R1_clean_se.fastq.gz,processed/Sample10_R2_clean_se.fastq.gz

Is there any information about this bug?

Does this seem to be a bowtie/tophat error?

sam-israel commented 3 years ago

In most cases in this file, however, the error is not in the SAM FLAG but in the YT:Z:UU tag.

In this run, TopHat received both paired-end (PE) and single-end (SR) reads; about 98% were PE. Despite that, in a subsample of the file, about 98% of the alignments carry the YT:Z:UU tag.
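A rough way to reproduce the 98% figure is to tally the YT:Z values over a stream of SAM records. The sketch below is only an illustration: `count_yt_values` and `fraction` are hypothetical helper names, the input is assumed to be tab-separated SAM text, and the three demo records are synthetic, not taken from the run above.

```python
from collections import Counter


def count_yt_values(sam_lines):
    """Count YT:Z values over SAM records, skipping header and blank lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        yt = "missing"                 # record has no YT:Z tag at all
        for tag in fields[11:]:        # optional tags start at column 12
            if tag.startswith("YT:Z:"):
                yt = tag[len("YT:Z:"):]
                break
        counts[yt] += 1
    return counts


def fraction(counts, value):
    """Share of records whose YT:Z value equals `value` (0.0 for empty input)."""
    total = sum(counts.values())
    return counts[value] / total if total else 0.0


# Synthetic demo records (not real data).
demo = [
    "@HD\tVN:1.0",
    "r1\t435\t1\t100\t0\t5M\t21\t200\t0\tACGTA\tFFFFF\tYT:Z:UU",
    "r2\t99\t1\t100\t50\t5M\t=\t200\t105\tACGTA\tFFFFF\tYT:Z:CP",
    "r3\t0\t1\t300\t50\t5M\t*\t0\t0\tACGTA\tFFFFF\tYT:Z:UU",
]
counts = count_yt_values(demo)
print(counts["UU"], counts["CP"])
print(fraction(counts, "UU"))
```

On real data the same functions could be fed from `sys.stdin` with the output of `samtools view` piped in, and the UU share compared against the share of records with FLAG bit 0x1 set.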