JSB-UCLA / Clipper

A p-value-free method for controlling false discovery rates in high-throughput biological data with two conditions

Normalization and Log #2

Open uumami opened 2 years ago

uumami commented 2 years ago

Hello :) In general, is it good practice to normalize to (0,1) and/or log-transform all types of data? Or does it depend entirely on the type of data?

xcggates commented 2 years ago

Hello, thanks for your question! For differential gene expression analysis, we suggest log-transforming the data after TMM normalization. For peak calling and peptide identification, we do not apply any normalization or data transformation. You can check our vignette for the detailed implementation of Clipper in different analysis tasks: log-transformation
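To make the "normalize, then log-transform" order concrete, here is a minimal Python sketch. It does not implement TMM (TMM is provided by edgeR in R); as a labeled stand-in it uses plain library-size scaling to counts per million. The function name `log_cpm` and the pseudocount of 1 are assumptions made for illustration only.

```python
import math

def log_cpm(counts, pseudocount=1.0):
    """Scale each sample (column) to counts per million, then take log2.

    counts: list of rows (genes), each row a list of per-sample counts.
    NOTE: library-size scaling is a simple stand-in here, NOT TMM.
    Assumes every sample has a non-zero total count.
    """
    n_samples = len(counts[0])
    # Total count per sample (column sums).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2(row[j] / lib_sizes[j] * 1e6 + pseudocount)
         for j in range(n_samples)]
        for row in counts
    ]

# Toy example: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [90, 180], [0, 0]]
logged = log_cpm(counts)
```

After scaling, the two samples agree gene by gene even though their raw counts differ by a factor of two, and the pseudocount keeps zero counts finite under the logarithm.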

uumami commented 2 years ago

Sorry for the late reply! What if we don't know the type of data? Is there any heuristic for choosing the transformation? As of now, if the data contain negative values, we just add a constant so that all values become positive, but we were wondering whether it is a good idea to always use the log-transform.

xcggates commented 2 years ago

If we do not know the type of data and the data could be negative, the safest solution is to run Clipper on the raw data without any transformation. I suggest using the difference contrast score (in the Clipper function, set "contrast.score = 'diff'"), which means that the interesting features are those whose means differ between the two conditions. We can first run Clipper under this simple setting and look at the distribution of the contrast scores ("re$contrast.score.value" if "re" is the Clipper output) to see whether the results look good. A good distribution of contrast scores should be roughly symmetric around zero, with the interesting features in the right tail.
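As a rough illustration of the idea above, the following Python sketch computes a per-feature difference of means on untransformed data and summarizes how symmetric the scores are around zero. It is not Clipper's implementation; the function names `diff_contrast_scores` and `symmetry_summary` and the toy data are hypothetical and serve only to show what to look for.

```python
from statistics import mean, median

def diff_contrast_scores(exp, back):
    """Per-feature score: mean(experimental) - mean(background).

    exp, back: lists of rows (features), each row a list of replicates.
    Negative raw values are fine; no transformation is applied.
    """
    return [mean(e) - mean(b) for e, b in zip(exp, back)]

def symmetry_summary(scores):
    """Crude diagnostics for symmetry of the scores around zero."""
    neg = [-s for s in scores if s < 0]
    pos = [s for s in scores if s > 0]
    return {
        "median": median(scores),
        "n_negative": len(neg),
        "n_positive": len(pos),
        "largest_negative_magnitude": max(neg, default=0.0),
        "largest_positive": max(pos, default=0.0),
    }

# Toy data: 5 features, 2 replicates per condition.
exp = [[1.0, 1.2], [0.5, 0.4], [-0.3, -0.1], [5.0, 5.5], [0.9, 1.1]]
back = [[1.1, 1.0], [0.6, 0.5], [-0.2, -0.3], [1.0, 1.2], [1.0, 0.9]]

scores = diff_contrast_scores(exp, back)
summary = symmetry_summary(scores)
```

In this toy example most scores sit close to zero and a single feature stands far out on the positive side, which is the pattern described above: a bulk centered near zero with the interesting features in the right tail. With real data, a histogram of the scores is a more informative check than these summary numbers.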