DEXseq dexseq_count.py paired ends low counts

lindsay-liang commented 4 years ago

Hi,

I'm running the dexseq_count.py script on paired bulk RNA-seq data, trying to get read counts on an exon level. The first time I ran the script, I forgot to include the -p yes option for paired end sequencing. However when I ran it a second time, including the argument I got extremely low read counts, compared to the first one.

Here are the results from DEXseq count for some of the exons without -p yes

ENSG00000136531.16+ENSG00000236283.5:032                303
ENSG00000136531.16+ENSG00000236283.5:033                158
ENSG00000136531.16+ENSG00000236283.5:034                490
ENSG00000136531.16+ENSG00000236283.5:035                40
ENSG00000136531.16+ENSG00000236283.5:036                262
ENSG00000136531.16+ENSG00000236283.5:037                188
ENSG00000136531.16+ENSG00000236283.5:038                114

Here are the results from DEXseq count with -p yes

ENSG00000136531.16+ENSG00000236283.5:032                2
ENSG00000136531.16+ENSG00000236283.5:033                2
ENSG00000136531.16+ENSG00000236283.5:034                7
ENSG00000136531.16+ENSG00000236283.5:035                3
ENSG00000136531.16+ENSG00000236283.5:036                2
ENSG00000136531.16+ENSG00000236283.5:037                2
ENSG00000136531.16+ENSG00000236283.5:038                0

Here are the read counts from samtools flagstat for each of the exons above:

samtools flagstat ENSG00000236283.5:032.bam

605 + 0 in total (QC-passed reads + QC-failed reads)
1 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
605 + 0 mapped (100.00% : N/A)
604 + 0 paired in sequencing
308 + 0 read1
296 + 0 read2
604 + 0 properly paired (100.00% : N/A)
604 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

samtools flagstat ENSG00000236283.5:033.bam

274 + 0 in total (QC-passed reads + QC-failed reads)
1 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
274 + 0 mapped (100.00% : N/A)
273 + 0 paired in sequencing
158 + 0 read1
115 + 0 read2
273 + 0 properly paired (100.00% : N/A)
273 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

samtools flagstat ENSG00000236283.5:034.bam

972 + 0 in total (QC-passed reads + QC-failed reads)
1 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
972 + 0 mapped (100.00% : N/A)
971 + 0 paired in sequencing
490 + 0 read1
481 + 0 read2
971 + 0 properly paired (100.00% : N/A)
971 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

samtools flagstat ENSG00000236283.5:035.bam

291 + 0 in total (QC-passed reads + QC-failed reads)
1 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
291 + 0 mapped (100.00% : N/A)
290 + 0 paired in sequencing
156 + 0 read1
134 + 0 read2
290 + 0 properly paired (100.00% : N/A)
290 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

samtools flagstat ENSG00000236283.5:036.bam

709 + 0 in total (QC-passed reads + QC-failed reads)
4 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
709 + 0 mapped (100.00% : N/A)
705 + 0 paired in sequencing
360 + 0 read1
345 + 0 read2
705 + 0 properly paired (100.00% : N/A)
705 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

samtools flagstat ENSG00000236283.5:037.bam

420 + 0 in total (QC-passed reads + QC-failed reads)
2 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
420 + 0 mapped (100.00% : N/A)
418 + 0 paired in sequencing
188 + 0 read1
230 + 0 read2
418 + 0 properly paired (100.00% : N/A)
418 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

samtools flagstat ENSG00000236283.5:038.bam

276 + 0 in total (QC-passed reads + QC-failed reads)
3 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
276 + 0 mapped (100.00% : N/A)
273 + 0 paired in sequencing
114 + 0 read1
159 + 0 read2
273 + 0 properly paired (100.00% : N/A)
273 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Here are the bedtools intersect results for the same exons:

ENSG00000236283.5:032               605
ENSG00000236283.5:033               274
ENSG00000236283.5:034               972
ENSG00000236283.5:035               291
ENSG00000236283.5:036               709
ENSG00000236283.5:037               420
ENSG00000236283.5:038               276

It looks like the count for aligned R1 reads from samtools are somewhat close to those reported by DEXseq count without ‘-p yes`, but not quite the same. My instinct is just that the reads not counted by dexseq didn't pass the quality threshold. The counts reported by bedtools are the same as the total number of reads aligned reported by samtools.

Any ideas as why we're seeing such low counts with -p yes?

areyesq89 commented 4 years ago

Hi @lindsay-liang. Thanks for your detailed report! It seems that for some reason the script is not reading detecting the read pairs as "properly paired". Would you mind sharing a few lines of your alignment files to explore why this might be the case? Also, what aligner did you use?

lindsay-liang commented 4 years ago

Hi @areyesq89, sorry for the very late follow up on this; it turns out that my data was reverse sequenced as well (which I didn't know). Thanks so much for your response!

areyesq89 / DEXSeq

DEXseq dexseq_count.py paired ends low counts #6