BioinformaticsFMRP / PanCanStem_Web

https://bioinformaticsfmrp.github.io/PanCanStem_Web/
GNU General Public License v3.0

how to calculate stemness on my data? #5

Closed a00101 closed 2 years ago

a00101 commented 3 years ago

I can't find the how-to documentation. Where can I find it?

Thanks.

tiagochst commented 2 years ago

Hi, there is a function in TCGAbiolinks called TCGAanalyze_Stemness https://rdrr.io/bioc/TCGAbiolinks/man/TCGAanalyze_Stemness.html

But here is the code for the stemness score (you can adapt it) and the data (I added it in https://github.com/BioinformaticsFMRP/PanCanStem_Web/tree/master/Stemsig):

# Load the signature: column X1 holds the gene names, column X2 the weights
signature <- readr::read_tsv(
  "https://raw.githubusercontent.com/BioinformaticsFMRP/PanCanStem_Web/master/Stemsig/SC-pcbc-stemsig.tsv",
  col_names = FALSE
)

signature.weight.vector <- signature$X2
names(signature.weight.vector) <- signature$X1

# Just an example with correlation 1 and -1
gene.expression.matrix <-  matrix(signature$X2)
rownames(gene.expression.matrix) <- signature$X1
gene.expression.matrix <- cbind(gene.expression.matrix,gene.expression.matrix * -1)

calculate_score <- function(signature.weight.vector, gene.expression.matrix){

  # Keep only common genes 
  common.genes <- intersect(names(signature.weight.vector), rownames(gene.expression.matrix))
  gene.expression.matrix <- gene.expression.matrix[common.genes, ,drop = FALSE]
  signature.weight.vector <- signature.weight.vector[common.genes]

  # Spearman correlation between each sample (column) and the signature weights
  score <- apply(gene.expression.matrix, 2, function(sample) {
    cor(sample, signature.weight.vector, method = "spearman", use = "complete.obs")
  })

  print(paste0("Min score: ",min(score)))
  print(paste0("Max score: ",max(score)))

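  # Scale the scores to be between 0 and 1 (min-max scaling across the samples)
  # Note: with a single sample, or identical scores, max(score) is 0 below and the result is NaN
  print("Normalizing scores to be between 0 and 1")
  score <- score - min(score)
  score.normalized <- score/max(score)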
  print(paste0("Min normalized score: ",min(score.normalized)))
  print(paste0("Max normalized score: ",max(score.normalized)))

  return(score.normalized)
}

calculate_score(signature.weight.vector,gene.expression.matrix)
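
To run the function above on your own data, the expression matrix needs genes in rows, samples in columns, and row names that use the same gene identifiers as the signature. The sketch below illustrates that layout with made-up gene names and values (not the real signature). The helper `signature_overlap` is hypothetical and only reports how many signature genes were found before scoring; the correlation step mirrors the one in `calculate_score`.

```r
# Toy signature weights (made-up values; the real ones come from SC-pcbc-stemsig.tsv)
toy.signature <- c(GENE_A = 0.8, GENE_B = 0.3, GENE_C = -0.2, GENE_D = -0.6, GENE_E = 0.1)

# Toy expression matrix: genes in rows, samples in columns (filled column by column)
toy.expression <- matrix(
  c(9.1, 5.0, 3.2, 1.0, 7.7,   # sample_1: same ordering as the signature
    1.2, 4.0, 6.5, 8.8, 2.0,   # sample_2: opposite ordering
    3.0, 3.5, 2.0, 6.0, 4.0),  # sample_3: mixed ordering
  ncol = 3,
  dimnames = list(
    c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_X"),
    c("sample_1", "sample_2", "sample_3")
  )
)

# Hypothetical helper: which signature genes are present, and what fraction of the signature
signature_overlap <- function(signature.weight.vector, gene.expression.matrix) {
  common.genes <- intersect(names(signature.weight.vector), rownames(gene.expression.matrix))
  list(
    common.genes = common.genes,
    fraction.found = length(common.genes) / length(signature.weight.vector)
  )
}

overlap <- signature_overlap(toy.signature, toy.expression)
overlap$fraction.found

# Unscaled Spearman score per sample, restricted to the shared genes
raw.score <- apply(
  toy.expression[overlap$common.genes, , drop = FALSE], 2,
  function(sample) cor(sample, toy.signature[overlap$common.genes], method = "spearman")
)
raw.score
```

A low `fraction.found` usually points to an identifier mismatch (for example Ensembl IDs in the matrix versus gene symbols in the signature), which is worth resolving before interpreting any score.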

Nuvolar commented 2 years ago

Which expression data type should I use when calculating stemness with TCGAanalyze_Stemness: counts, TPM, or FPKM? The pipeline on Synapse uses RPKM, if I understand correctly.

tiagochst commented 2 years ago

We added it in TCGAbiolinks: https://bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/stemness_score.html Yes, they used RPKM aligned to hg19, but I would not expect using TPM or FPKM aligned to hg38 to affect the results. That would still need to be checked, though.
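
One reason the unit is unlikely to matter for the correlation-based score shown earlier in this thread: Spearman correlation depends only on the ranks of genes within a sample, and converting FPKM to TPM multiplies every gene in a sample by the same positive constant. The sketch below checks this on made-up values; the gene names, expression values, weights, and the helper `score_spearman` are all hypothetical. It does not cover differences caused by the genome build or annotation (hg19 versus hg38), nor raw counts, which are not length-normalized.

```r
# Made-up FPKM values for 6 hypothetical genes in 2 samples (filled column by column)
fpkm <- matrix(
  c(12.0, 0.5, 33.1, 7.8, 150.2, 2.4,
    3.3, 48.0, 0.9, 21.5, 9.7, 60.4),
  ncol = 2,
  dimnames = list(paste0("GENE_", 1:6), c("sample_1", "sample_2"))
)

# Made-up signature weights for the same genes
weights <- setNames(c(0.9, -0.4, 0.2, -0.7, 0.5, 0.1), rownames(fpkm))

# FPKM -> TPM: divide each sample by its column sum, then multiply by 1e6
tpm <- sweep(fpkm, 2, colSums(fpkm), "/") * 1e6

# Hypothetical helper: Spearman correlation of each sample with the weights
score_spearman <- function(expression.matrix, weight.vector) {
  apply(expression.matrix, 2, function(sample) {
    cor(sample, weight.vector, method = "spearman")
  })
}

# The per-sample rescaling leaves gene ranks, and therefore the scores, unchanged
all.equal(score_spearman(fpkm, weights), score_spearman(tpm, weights))
```

Scores computed from real RPKM and TPM files can still differ if the two files come from different alignments or gene annotations, so comparing both on a few samples remains a reasonable check.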

