broadinstitute / PANOPLY

Repository for the Broad Institute Proteogenomic Data Analysis Center (PGDAC) established by the NIH Clinical Proteomics Tumor Analysis Consortium (CPTAC)

How to normalize CNA and RNA data before NMF? What is the normalization method in the example data? #29

Closed SepOrion closed 2 years ago

SepOrion commented 3 years ago

"Both CNA (normalized log-ratio, derived from WXS, WGS or combination) and RNA expression (log-transformed and normalized, derived from RNAseq) data are required. These data must be normalized prior to input in PANOPLY."

What is the data type of the RNA and CNA data before normalization, and what normalization is performed before NMF? What normalization method was used for the example data?

For example, is the RNA data FPKM or TPM before normalization? Is a log2 transformation then applied to the RNA data?

drmani commented 3 years ago

The input data in the tutorial are as follows:

The NMF clustering module takes data on a similar scale to proteomics data (i.e., log ratios to a feature-relative reference), and z-scores are calculated across samples (columns) before performing NMF clustering. See https://github.com/broadinstitute/PANOPLY/wiki/Data-Analysis-Modules%3A-panoply_mo_nmf.
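
As a rough illustration of the scaling described above, here is a minimal Python sketch. It is not PANOPLY's own code. It assumes the RNA input is TPM, that a log2 transform with a pseudocount of 1 is acceptable, and that "across samples (columns)" means each feature (row) is standardized over its values in the sample columns. The function names and the toy values are hypothetical.

```python
import math
from statistics import mean, stdev


def log2_transform(matrix, pseudocount=1.0):
    """Return log2(x + pseudocount) for a features x samples matrix.

    The pseudocount of 1 is an assumption made here to avoid log2(0).
    """
    return [[math.log2(x + pseudocount) for x in row] for row in matrix]


def zscore_rows(matrix):
    """Standardize each feature (row) across its sample values (columns)."""
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)  # sample standard deviation (n - 1 denominator)
        # A constant feature has zero spread; map it to zeros
        # instead of dividing by zero.
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


# Hypothetical TPM values: 2 genes (rows) x 3 samples (columns).
tpm = [
    [0.0, 3.0, 15.0],
    [7.0, 7.0, 7.0],
]

log_expr = log2_transform(tpm)  # first gene becomes [0.0, 2.0, 4.0]
z = zscore_rows(log_expr)       # first gene becomes [-1.0, 0.0, 1.0]
```

Z-scored values include negatives, while NMF needs non-negative input, so a further step is required before factorization; the wiki page linked above is the place to check how the module handles that.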