ctlab / LinSeed

Linseed: LINear Subspace identification for gene Expression Deconvolution
MIT License
28 stars, 8 forks

Does LinSeed perform row normalization before calculating collinearity in RNA-seq data? #10

Closed: ArashDepp closed this issue 4 years ago

ArashDepp commented 4 years ago

Hi, I have two questions:

  1. As emphasized in the paper, performing row normalization helps detect tissue-specific genes. Does LinSeed perform it internally, or do I have to provide row-normalized data?

  2. In the Methods section of the paper, under TCGA data processing: "The dataset was then linear-transformed, and samples were normalized to have the same sum of expression levels." What linear transformation does this refer to? Should I perform it myself, or is it also handled by LinSeed?

It would be great if you could help me clear up these doubts. Thanks.

konsolerr commented 4 years ago

Hi, @ArashDepp !

  1. Row normalization is performed internally when you initialize a LinSeed object. You can access both the normalized data (object$exp$full$norm) and the non-normalized data (object$exp$full$raw).
  2. "Linear transform" means that the data is on a linear scale, as opposed to the log scale that many other gene expression analysis methods use. You have to perform this normalization yourself before passing the data to the algorithm. For RNA-seq, TPM is the best choice: it is on a linear scale, and it guarantees that every sample sums to one million.
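For readers who want to see what the internal row normalization in point 1 does, here is a minimal Python sketch. LinSeed itself is an R package and this is not its code: the function name `row_normalize` is hypothetical, and the sketch assumes row normalization means dividing each gene's (row's) expression by its total across samples. Under that assumption, genes that differ only by a scale factor, as tissue-specific genes of the same cell type ideally do, collapse onto the same normalized profile, which is what makes them easy to detect as collinear.

```python
import numpy as np


def row_normalize(expr: np.ndarray) -> np.ndarray:
    """Scale each gene (row) so that its values sum to 1 across samples.

    Hypothetical helper illustrating the idea only; expects a
    genes x samples matrix that is non-negative and on a linear scale.
    """
    expr = np.asarray(expr, dtype=float)
    if np.any(expr < 0):
        raise ValueError("expression must be non-negative and on a linear scale")
    row_sums = expr.sum(axis=1, keepdims=True)
    # Genes with zero total expression cannot be scaled; they stay all-zero.
    out = np.zeros_like(expr)
    np.divide(expr, row_sums, out=out, where=row_sums > 0)
    return out


# Keep both versions around, loosely mirroring raw vs norm in a LinSeed object.
raw = np.array([[2.0, 4.0, 6.0],
                [10.0, 20.0, 30.0],   # same profile as row 0, scaled by 5
                [1.0, 1.0, 1.0]])
norm = row_normalize(raw)
```

After normalization, rows 0 and 1 are identical, so their scale difference no longer hides the fact that they follow the same pattern across samples.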

To sum up: RNA-seq data must be normalized so that it is on a linear scale and accounts for library size. TPM is a natural choice for RNA-seq. Once you've done that, you can pass the data to LinSeed, and row normalization will happen internally.
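The preprocessing you are expected to do yourself can be sketched in Python as well. This is an illustration, not part of LinSeed: the helper names `counts_to_tpm` and `from_log2` are hypothetical. The sketch assumes you have a genes x samples matrix of raw read counts plus gene lengths in base pairs, and, for `from_log2`, that log-scale data was produced as log2(x + 1). Adjust the pseudocount if your data used a different transform.

```python
import numpy as np


def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts (genes x samples) to TPM.

    Each gene's counts are divided by its length in kilobases, and each
    sample is then rescaled so that its values sum to one million.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1000.0
    rate = counts / lengths_kb[:, None]            # reads per kilobase
    per_sample = rate.sum(axis=0, keepdims=True)   # library-size term
    if np.any(per_sample == 0):
        raise ValueError("a sample has no reads; TPM is undefined")
    return rate / per_sample * 1e6


def from_log2(log_expr, pseudocount=1.0):
    """Undo an assumed log2(x + pseudocount) transform to get linear scale."""
    return np.power(2.0, np.asarray(log_expr, dtype=float)) - pseudocount


counts = np.array([[10.0, 20.0],
                   [30.0, 60.0],
                   [60.0, 20.0]])
lengths = np.array([1000.0, 2000.0, 3000.0])
tpm = counts_to_tpm(counts, lengths)   # each column sums to 1e6
```

Because every column of `tpm` sums to one million, samples end up with the same total expression, which matches the "same sum of expression levels" step quoted from the paper.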

Cheers, Kostya

ArashDepp commented 4 years ago

Okay, thanks a lot for the clarification, dear @konsolerr! That means I am good to go with TPM data.