elsasserlab / labcode

Utils to perform frequent data analyses in the lab.
GNU General Public License v3.0

Plot differential analysis results #82

Open cnluzon opened 3 years ago

cnluzon commented 3 years ago

After merging #80, a results table with adjusted p-values and log fold change across replicates can be generated.

This is a reminder that a plotting function for such bins can now be implemented, using a p-value and/or logFC threshold (a scatterplot with the selected bins highlighted, plotting mean values across replicates).
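As a rough illustration of the thresholding part only, here is a minimal sketch in plain Python. The function name `flag_significant`, its parameters, and the example numbers are hypothetical and not part of labcode; it assumes the results table provides one adjusted p-value and one log fold change per bin.

```python
def flag_significant(padj, logfc, padj_max=None, abs_logfc_min=None):
    """Return one boolean per bin: True if the bin passes every threshold given.

    Hypothetical helper for illustration, not a labcode function.
    A threshold left as None is simply not applied.
    """
    flags = []
    for p, lfc in zip(padj, logfc):
        passes = True
        if padj_max is not None:
            passes = passes and p <= padj_max
        if abs_logfc_min is not None:
            passes = passes and abs(lfc) >= abs_logfc_min
        flags.append(passes)
    return flags


# Made-up values for four bins.
padj = [0.001, 0.20, 0.03, 0.04]
logfc = [2.1, 3.0, 0.2, -1.5]

by_pval = flag_significant(padj, logfc, padj_max=0.05)
by_both = flag_significant(padj, logfc, padj_max=0.05, abs_logfc_min=1.0)
print(by_pval)  # [True, False, True, True]
print(by_both)  # [True, False, False, True]
```

The resulting flags could then be used to choose the highlight colour of each dot in the scatterplot.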

cnluzon commented 3 years ago

When filtering by p-value or logFC, the values currently used for both selecting and plotting are the coverage values. I have not yet implemented a version that also takes input bigWig files.

However if we want a scatter plot where bins are normalized to corresponding input (logfc or not), would it be correct to calculate the significance over those?

My feeling is that the best approach is not to do that, but instead:

1) Select significant bins on "raw" bins (just the scaled bigWig coverage values).
2) Plot the log(bin / input) values. Since these are replicates, the plotted value will be log(mean(bins) / mean(inputs)), which I think is more robust than aggregating the individual log values.
3) Highlight the bins that were significant in step (1).
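The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the names `log2_ratio_of_means` and `prepare_plot_values` are hypothetical, log base 2 is an arbitrary choice, the significance flags are assumed to come from step (1) on raw coverage, and all replicate means are assumed to be nonzero.

```python
import math


def log2_ratio_of_means(bin_reps, input_reps):
    """log2(mean(bins) / mean(inputs)) for one bin across its replicates.

    Assumes both means are > 0; handling of empty bins is left out here.
    """
    mean_bin = sum(bin_reps) / len(bin_reps)
    mean_input = sum(input_reps) / len(input_reps)
    return math.log2(mean_bin / mean_input)


def prepare_plot_values(coverage, inputs, significant):
    """Pair each bin's plotted value (step 2) with its highlight flag (step 3).

    `significant` is computed beforehand on raw coverage (step 1), so input
    normalisation affects only what is plotted, not which bins are selected.
    """
    return [
        (log2_ratio_of_means(cov, inp), sig)
        for cov, inp, sig in zip(coverage, inputs, significant)
    ]


# Made-up example: two bins, two replicates each.
coverage = [[8.0, 8.0], [2.0, 2.0]]
inputs = [[2.0, 2.0], [2.0, 2.0]]
significant = [True, False]  # from step 1, on raw coverage

points = prepare_plot_values(coverage, inputs, significant)
print(points)  # [(2.0, True), (0.0, False)]
```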

I would need some feedback on this issue @simonelsasser

shaorray commented 3 years ago

> However if we want a scatter plot where bins are normalized to corresponding input (logfc or not), would it be correct to calculate the significance over those?

I think it depends on whether the normalisation uses external size factors, which represent the input sizes. If the aim is to remove the coverage variance on the input intervals, sample_bin / input_bin can "flatten the curve" on the IGV profile plot, and the differential expression is likely to be the same.

log(mean(bins) / mean(inputs)) can make a nicer plot by removing extreme values from the GC repeats, if they're MINUTE-ChIPs.
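A small numeric sketch of why the ratio of means is less sensitive to extreme values than the mean of per-replicate log ratios. The numbers are made up, the function names are hypothetical, and log base 2 is an arbitrary choice; the second replicate simulates a nearly empty input bin.

```python
import math


def ratio_of_means_log2(bins, inputs):
    """log2(mean(bins) / mean(inputs)): aggregate first, then take the log."""
    return math.log2((sum(bins) / len(bins)) / (sum(inputs) / len(inputs)))


def mean_of_log2_ratios(bins, inputs):
    """Mean of the per-replicate log2(bin / input) values."""
    return sum(math.log2(b / i) for b, i in zip(bins, inputs)) / len(bins)


bins = [4.0, 4.0]
inputs = [4.0, 0.0625]  # second replicate has a near-empty input bin

pooled = ratio_of_means_log2(bins, inputs)
per_replicate = mean_of_log2_ratios(bins, inputs)
print(pooled < 1.0)   # True: the outlier replicate is dampened
print(per_replicate)  # 3.0: one replicate's log2 ratio of 6 dominates
```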