Closed cnluzon closed 3 years ago
One detail about DESeq's testing: the input counts need to be an integer larger than 1, better to be above 100.
Since it transforms read counts (or bin counts) by a hyperbolic function, smaller numbers (1, 10) are more sensitive for DESeq than larger ones, like 1000.
Because rsamtools::summary
gives read density in the bins, the counts are normally below 1. So either multiplying the bin width or any large constant number will be a good idea to avoid artefacts.
My idea was to simulate the "real" number of reads using estimated fragment length and bin size but then I thought a constant value independent of length may be better if a BED file is used that is not bins but something else where length may vary. Of course one could argue that in any case if there are large differences in length it would affect results in both ways.
I will use a larger constant for the count matrix to ensure results are more robust. Thanks!
Included a
bwstats.R
with functionality to compute statistics based onDESeq2
package for both bins and BED files.Added functions:
bw_bed_diff_analysis
,bw_bins_diff_analysis
. These operate on two lists of bigWig files, plus labels (labels are mandatory at this point because I believe this makes analysis more robust to mistakes like forgetting what was compared against what). So at this point you need to provide at least"treated"
"untreated"
or some labels to identify the groups of bigWig files.This is a work in progress. I still need to add some automated testing. Even though at this point I am not including actual values testing (relying on
DESeq2
) I am going to include at least tests to check that it runs and that the parameters passed go where they are supposed to go.Since we mostly operate on scaled bigWig files, the
estimateSizeFactors
step that usually is performed withDESeq2
is overridden. But a parameterestimate_size_factors
is provided:will run a normal
DESeq2
function. I provide this for the cases where we are not looking at scaled bigWig files. But the default value of this parameter is set toFALSE
.Returning value of these functions is a results table as the one obtained when you call
DESeq2::results
, which means that for any analysis you still need to set some cutoff thresholds for pvalue and / or fold change.