Open JohnHadish opened 5 months ago
Test to see impact on DESeq2 downstream:
Incorrect = -l arguement (75) -- 4295 DEG genes Correct = -l arguement (150) -- 3944 DEG genes
Overlap between the two groups - 3907 DEG genes
Figure 1: DEG which are extra in the Incorrect group (75 group) and DEG which are not included in the Incorrect group which are in the correct group
Figure 2: DEG Same, except this time plot base mean instead of padj
The Cutoff used was log2 threshold of at least 5 and padj less than 0.05
Description of the bug
The prepDE.py script from Stringtie is used by GEMmaker to estimate raw reads for downstream differential expression. It is used in both the Hisat2 and STAR pipelinies. Currently, GEMmaker does not set an
-l
argument for this script, which is the average length of input reads. The default-l
for theprepDE.py
script is 75. In the example below, I changed the-l
argument to 150 (the proper number for this dataset) and the output is dramatically different (multiplied by 2).This will have a impact on downstream DE, as this can improperally push reads with low counts above the threshold of detection, and will cause differences to be exagerated. This is a critical Bug.
If I run with default (no
-l
argument) (default is 75) (this is how GEMmaker currently runs Hisat2 and Star)TraesCS1A03G0000200,201 TraesCS1A03G0000400,260 TraesCS1A03G0000600,249 TraesCS1A03G0000800,12874 TraesCS1A03G0001000,21 TraesCS1A03G0001200,31 TraesCS1A03G0001300,341 TraesCS1A03G0001400,25 TraesCS1A03G0001500,3093
If I run with 150 as the
-l
argument (notice that everything is doubled)TraesCS1A03G0000200,101 TraesCS1A03G0000400,130 TraesCS1A03G0000600,125 TraesCS1A03G0000800,6437 TraesCS1A03G0001000,11 TraesCS1A03G0001200,16 TraesCS1A03G0001300,171 TraesCS1A03G0001400,13 TraesCS1A03G0001500,1547
Command used and terminal output
No response
Relevant files
No response
System information
No response