Open cnluzon opened 3 years ago
When filtering by p-value or log fold change, the coverage values are currently used for both selecting and plotting. I haven't yet implemented a version that also takes input bigWig files.
However if we want a scatter plot where bins are normalized to corresponding input (logfc or not), would it be correct to calculate the significance over those?
My feeling is that the best approach is not to do that, but instead:
1) Select significant bins on the "raw" bins (just the scaled bigWig coverage values).
2) Plot the log(bin / input) values. Since these are replicates, the plotted value will be log(mean(bins) / mean(inputs)), which I think is more robust than aggregating the individual log values.
3) Highlight the bins that were significant in step (1).
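To make the three steps concrete, here is a minimal sketch in plain Python. It is not the package's implementation: the function names, the use of log2, and the `pseudocount` (added so that empty bins do not cause a division by zero) are all assumptions made for illustration. The significance flags are taken as given, since they would come from a test run on the raw coverage values in step (1).

```python
import math


def log2_ratio_of_means(chip_reps, input_reps, pseudocount=1.0):
    """log2(mean(bins) / mean(inputs)) for a single bin across replicates.

    pseudocount is an assumption of this sketch, not part of the proposal.
    """
    chip_mean = sum(chip_reps) / len(chip_reps)
    input_mean = sum(input_reps) / len(input_reps)
    return math.log2((chip_mean + pseudocount) / (input_mean + pseudocount))


def mean_of_log2_ratios(chip_reps, input_reps, pseudocount=1.0):
    """Alternative aggregation: average the per-replicate log ratios."""
    ratios = [
        math.log2((c + pseudocount) / (i + pseudocount))
        for c, i in zip(chip_reps, input_reps)
    ]
    return sum(ratios) / len(ratios)


def prepare_points(chip, inputs, significant):
    """Pair each bin's plotted value (step 2) with its highlight flag (step 3).

    chip and inputs hold one list of replicate values per bin; significant
    holds one boolean per bin, computed beforehand on raw coverage (step 1).
    """
    return [
        (log2_ratio_of_means(c, i), flag)
        for c, i, flag in zip(chip, inputs, significant)
    ]


# Two hypothetical bins with two replicates each.
chip = [[8.0, 12.0], [3.0, 3.0]]
inputs = [[4.0, 4.0], [3.0, 3.0]]
significant = [True, False]
points = prepare_points(chip, inputs, significant)
```

The key point is that the highlight flag never depends on the input-normalized value: selection and plotting use different quantities, joined only by bin index.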
I would need some feedback on this issue @simonelsasser
However if we want a scatter plot where bins are normalized to corresponding input (logfc or not), would it be correct to calculate the significance over those?
I think it depends on whether the normalisation uses external size factors, which represent the input sizes. If the aim is to remove the coverage variance on the input intervals, sample_bin / input_bin can "flatten the curve" on the IGV profile plot, and the differential expression is likely to be the same.
log(mean(bins) / mean(inputs))
can make a nicer plot by removing extreme values from the GC repeats, if they're MINUTE-ChIPs.
After merging #80, a results table with adjusted p-values and log fold change across replicates can be generated.
This is a reminder that a plotting function for such bins, using a p-value and/or logfc threshold, can be implemented (a scatterplot with highlighted dots and mean values).
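As a rough illustration of the thresholding such a plotting function would need, the sketch below flags bins from a results table. The column names `padj` and `log2fc`, the default cutoffs, and the `mode` switch are hypothetical and do not reflect the actual output format of #80. Treating a missing adjusted p-value as not passing is also an assumption.

```python
def select_bins(results, padj_max=0.05, abs_lfc_min=1.0, mode="and"):
    """Return one highlight flag per row of a results table.

    results is a list of dicts with hypothetical keys 'padj' and 'log2fc'.
    mode="and" requires both thresholds to pass; mode="or" requires either.
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")

    flags = []
    for row in results:
        padj = row["padj"]
        # A missing adjusted p-value is treated as failing the p-value filter.
        passes_pval = padj is not None and padj <= padj_max
        passes_lfc = abs(row["log2fc"]) >= abs_lfc_min
        if mode == "and":
            flags.append(passes_pval and passes_lfc)
        else:
            flags.append(passes_pval or passes_lfc)
    return flags


# Hypothetical results rows, one per bin.
results = [
    {"padj": 0.01, "log2fc": 2.0},
    {"padj": 0.2, "log2fc": 3.0},
    {"padj": 0.001, "log2fc": 0.1},
    {"padj": None, "log2fc": -2.0},
]
flags_and = select_bins(results, mode="and")
flags_or = select_bins(results, mode="or")
```

The resulting flags can then be used to colour the highlighted dots, while the plotted coordinates come from the mean values across replicates.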