sam-israel opened this issue 3 years ago
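In the example further down, the YT:Z:UU tag turns out to be correct and the proper-pair bit (0x2) in the FLAG field is the wrong one. In most cases in this file, however, it is the other way round: the FLAG field is right and the YT:Z:UU tag is wrong.
In this run, TopHat received both paired-end (PE) and single-end (SR) reads, and about 98% of the input reads were PE. Despite that, a subsample of the output file shows that about 98% of the reads are mapped with the YT:Z:UU tag.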
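A rough way to reproduce the "about 98% YT:Z:UU" figure on a subsample is to tally YT values over mapped records. The sketch below assumes SAM text as produced by `samtools view` (tab-separated, header lines starting with `@`); the function names `yt_counts` and `format_report` are hypothetical, and whether to count secondary/supplementary alignments is left as an option because the original count does not say.

```python
from collections import Counter
from typing import Iterable, List, Optional


def yt_counts(sam_lines: Iterable[str], primary_only: bool = False) -> Counter:
    """Count YT tag values (CP, DP, UP, UU) over mapped SAM alignment records."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # segment unmapped
        if primary_only and flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        # Optional fields start at column 12; take the first YT:Z: tag if present
        yt: Optional[str] = next(
            (f[5:] for f in fields[11:] if f.startswith("YT:Z:")), None
        )
        counts[yt if yt is not None else "missing"] += 1
    return counts


def format_report(counts: Counter) -> List[str]:
    """One line per YT value: value, count, and share of all counted records."""
    total = sum(counts.values())
    return [f"YT:Z:{value}\t{n}\t{n / total:.1%}" for value, n in counts.most_common()]
```

It can be fed `sys.stdin` while piping, for example, `samtools view -s 0.01 align/Sample10/accepted_hits.bam` into the script; the 1% sampling fraction is only an illustration.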
In the BAM file produced by a TopHat run (TopHat v2.1.1, Bowtie2 v2.3.4.3), some reads are marked as "read mapped in proper pair" (the 0x2 bit is set in their FLAG value), yet their YT tag has the value YT:Z:UU, which indicates that they were not part of a pair.
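The inconsistency described above (0x2 set in FLAG, but YT:Z:UU) can be tested per record. This is a minimal sketch assuming one tab-separated SAM line at a time; `yt_flag_conflict` is a hypothetical helper name, not part of TopHat or Bowtie2.

```python
PROPER_PAIR = 0x2  # SAM FLAG bit: each segment properly aligned according to the aligner


def yt_flag_conflict(sam_line: str) -> bool:
    """True if FLAG claims a proper pair while the YT tag says 'unpaired' (UU)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    # Map "TAG:TYPE" to value, e.g. "YT:Z:UU" -> {"YT:Z": "UU"}, "NH:i:20" -> {"NH:i": "20"}
    tags = {f[:4]: f[5:] for f in fields[11:]}
    return bool(flag & PROPER_PAIR) and tags.get("YT:Z") == "UU"
```

Both records quoted below would return `True` here.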
Here is an example: the two mates of one read pair, taken from the mapped file:
A01056:33:HF3NFDSXY:1:2516:13657:30718 435 1 91387362 0 117M 21 8218147 0 CCTGTGGTAACTTTTCTGACACCTCCTGCTTAAAACCCAAAAGGTCAGAAGGATCGTGAGGCCCCGCTTTCACGGTCTGTATTCGTACTGAAAATCAAGATCAAGCGAGCTTTTGCC :FF:F:FFFF:FFFFFFFFFFFFFF:FF,FFF,FFFFFF:FFF:FFFFF:FF:FF:FFFFFFF:FFFFFFFFF:FFFFFFFFF,FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:117 YT:Z:UU NH:i:20 CC:Z:= CP:i:91387362 XS:A:- HI:i:2
A01056:33:HF3NFDSXY:1:2516:13657:30718 371 21 8218147 0 112M 1 91387362 0 GGGCAAAAGCTCGCTTGATCTTGATTTTCAGTACGAATACAGACCGTGAAAGCGGGGCCTCACGATCCTTCTGACCTTTTGGGTTTTAAGCAGGAGGTGTCAGAAAAGTTAC :F:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFF:FFFF::FFFFFFFFFF:FFFFFFFFFFFFFFFFFFF AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:112 YT:Z:UU NH:i:20 CC:Z:GL000220.1 CP:i:161594 XS:A:+ HI:i:2
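Decoding the two FLAG values (435 and 371) makes explicit which bits are set. The bit names below follow the SAM specification; the table and `decode_flag` are an illustrative sketch, not an existing library API.

```python
from typing import List

# SAM FLAG bits in ascending order
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper pair"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "secondary"),
    (0x200, "QC fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag: int) -> List[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for flag in (435, 371):
    print(flag, decode_flag(flag))
```

Both values have "paired" and "proper pair" set; 435 is the second mate and 371 the first. Both are also marked "secondary" (0x100), consistent with NH:i:20 and HI:i:2.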
Here the read aligns to chromosome 1 while its mate aligns to chromosome 21, so the unpaired setting (YT:Z:UU) is the correct one and the 0x2 bit is the inconsistent value.
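The chromosome argument above can also be checked mechanically by comparing RNAME (column 3) with RNEXT (column 7). A sketch, assuming tab-separated SAM; the helper names are hypothetical, and records whose RNEXT is `*` (mate position unavailable) are deliberately not flagged.

```python
from typing import List


def mate_on_other_reference(fields: List[str]) -> bool:
    """True if RNEXT names a reference different from RNAME."""
    rname, rnext = fields[2], fields[6]
    # "=" means same reference as RNAME; "*" means no mate information
    return rnext not in ("=", "*") and rnext != rname


def proper_pair_across_references(sam_line: str) -> bool:
    """Flag records marked as a proper pair (0x2) whose mate is on another reference."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    return bool(flag & 0x2) and mate_on_other_reference(fields)
```

For the first record above (RNAME `1`, RNEXT `21`, FLAG 435) this returns `True`.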
The command was:
tophat --mate-inner-dist -139 --mate-std-dev 50 -o align/Sample10 -G /.../Homo_sapiens/Ensembl/GRCh38/Annotation/Genes/genes.gtf -N 10 --read-gap-length 5 --read-edit-dist 15 --segment-length 20 --read-realign-edit-dist 3 --no-coverage-search --library-type fr-firststrand -p 32 /.../Homo_sapiens/Ensembl/GRCh38/Sequence/Bowtie2Index/genome Sample10_R1_clean_pe.fastq.gz Sample10_R2_clean_pe.fastq.gz,processed/Sample10_R1_clean_se.fastq.gz,processed/Sample10_R2_clean_se.fastq.gz
Is there any known information about this bug?
Does this look like a Bowtie2 or TopHat error?