dannlbol / mcibersort_scripts

scripts used for Grabovska et al methylCIBERSORT analysis
GNU General Public License v3.0

“feature.select.new” function #2

Open cui-shuang opened 1 year ago

cui-shuang commented 1 year ago

Hello, in the “feature.select.new” function, when should “CellLines.matrix” be set to NULL? If I want to deconvolute the cell types in esophageal cancer methylation data, do I need to supply an esophageal cancer methylation matrix as “CellLines.matrix”?

wudustan commented 1 year ago

Hi,

feature.select.new() is used to train a new signature model.

Stroma.matrix should be the matrix of pure non-cancer reference populations such as immune cells. Phenotype.stroma should be a vector of labels for the populations of the Stroma.matrix.

CellLines.matrix should be a matrix of reference cell data you want CIBERSORT to treat as a 'pure' Cancer population. We previously used cell lines of our cancer of interest for this purpose. If you want to have one of the components of the deconvolution be a prediction of the proportion of tumour cells in your samples, then it is useful to have some kind of 'Cancer' reference. Otherwise, if you don't wish to include this, you can leave this argument as NULL.
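As a rough sketch of passing a 'Cancer' reference: the toy matrices below are hypothetical stand-ins for real beta values, and restricting both inputs to their shared CpGs is a precaution assumed here, not something feature.select.new() is known to require. The call itself is left commented out because it needs the function from this repository.

```r
set.seed(7)

# Hypothetical toy beta values (CpGs x samples); real inputs would come from array data
stroma_bvals <- matrix(runif(6 * 4), nrow = 6,
                       dimnames = list(paste0("cg", 1:6), paste0("immune", 1:4)))
cancer_bvals <- matrix(runif(6 * 3), nrow = 6,
                       dimnames = list(paste0("cg", 3:8), paste0("line", 1:3)))

# Keep only the CpGs measured in both references, in the same row order
shared <- intersect(rownames(stroma_bvals), rownames(cancer_bvals))
stroma_bvals <- stroma_bvals[shared, , drop = FALSE]
cancer_bvals <- cancer_bvals[shared, , drop = FALSE]

# One label per column of the stroma matrix (hypothetical labels)
stroma_labels <- factor(c("Bcell", "Bcell", "NK", "NK"))

# Uses only the argument names mentioned in this thread:
# feature.select.new(Stroma.matrix    = stroma_bvals,
#                    Phenotype.stroma = stroma_labels,
#                    CellLines.matrix = cancer_bvals,
#                    sigName          = "with_cancer")
```

Leaving out CellLines.matrix (the NULL case) simply drops the 'Cancer' component from the resulting signature.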

For example, here I use the FlowSorted.Blood.450k to train a basic signature:

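library(minfi)  ## provides getBeta()
library(FlowSorted.Blood.450k)
library(IlluminaHumanMethylation450kmanifest)
## keep only the four lymphocyte reference populations
idx <- which(FlowSorted.Blood.450k$CellType %in% c("Bcell","CD4T","CD8T","NK"))
## beta values, CpGs x samples
bvals <- getBeta(FlowSorted.Blood.450k[, idx])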

feature.select.new(Stroma.matrix = bvals,
                   Phenotype.stroma = as.factor(FlowSorted.Blood.450k$CellType[idx]),
                   sigName = "test")

## the result will be stored in getwd()

If you wish to use an already created signature matrix, such as the one we provide with this data:

## assuming 'esophageal_mix.txt' exists in your working directory, and is a matrix of betavalues of the format CpG x samples

source("./CIBERSORT.R") ## code available under license upon request from https://cibersort.stanford.edu/
esophageal.results <- CIBERSORT(sig_matrix = "./test_0.2_100_Signature.txt",
                                mixture_file = "./esophageal_mix.txt",
                                perm = 1000,
                                QN = FALSE,
                                absolute = FALSE,
                                abs_method = 'sig.score')

wudustan commented 1 year ago

Sorry closed issue by accident.

cui-shuang commented 1 year ago

OK, very nice! Thanks! I got it.

cui-shuang commented 1 year ago

Hi, I would like to ask another question.

When CellLines.matrix = NULL, the result of running CIBERSORT is shown in Figure 1; when CellLines.matrix is the methylation data for the LUAD cell line, the result is shown in Figure 2.

I would like to confirm: does the Cancer column represent the proportion of tumor cells in each sample?

Thanks!

Figure 1: image

Figure 2: image

wudustan commented 1 year ago

Hi @cui-shuang. Yes, in Figure 2, Cancer should be the proportion of tumour in your sample of interest. I can see why you'd want to confirm, as the estimated fraction looks very low.

Can I just check, are you running CIBERSORT in 'relative' or 'absolute' mode?

There are a couple of things I'd recommend to try and diagnose issues with the signature matrix:

  1. Subset the data you input in Stroma.matrix to the CpGs that are in your new signature matrix, then plot a PCA/t-SNE to see how well the samples resolve, and a heatmap to see how consistent the probe signals are within each population. You can manually cbind() the data for the CellLines.matrix too.
  2. Create a correlation matrix of the data from step 1 to see whether any populations correlate with each other when only the signature CpGs are used.
  3. Make synthetic 'ground truth' data using the make.synth.mix() function provided in this repository. See the documentation for more info. Essentially, you use your 'pure' stroma and cancer data to make synthetic mixtures with known proportions, deconvolute them, and compare the estimates against the proportions you put in. Typically you should see a near 1:1 linear relationship.
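Steps 1 and 2 could be sketched in base R roughly as follows. Everything here is simulated: `ref_betas`, `types` and `sig_cpgs` are hypothetical stand-ins for your reference beta values, their population labels, and the row names of your signature matrix.

```r
set.seed(42)

# Simulated reference data: 4 populations x 5 samples each, 300 CpGs
types   <- rep(c("Bcell", "CD4T", "NK", "Cancer"), each = 5)
n_cpg   <- 300
centres <- matrix(runif(n_cpg * 4), nrow = n_cpg,
                  dimnames = list(NULL, unique(types)))
noise   <- matrix(rnorm(n_cpg * length(types), sd = 0.03), nrow = n_cpg)
ref_betas <- pmin(pmax(centres[, types] + noise, 0), 1)
dimnames(ref_betas) <- list(paste0("cg", seq_len(n_cpg)),
                            paste0(types, "_", seq_along(types)))

# Hypothetical signature CpGs; in practice, the row names of the signature file
sig_cpgs <- rownames(ref_betas)[1:100]

# Step 1: restrict the reference data to the signature CpGs, then run a PCA on samples
sig_betas <- ref_betas[intersect(sig_cpgs, rownames(ref_betas)), ]
pca <- prcomp(t(sig_betas), center = TRUE, scale. = FALSE)

# Step 2: correlation between the mean signature profiles of each population
pop_means <- sapply(unique(types), function(tp) {
  rowMeans(sig_betas[, types == tp, drop = FALSE])
})
pop_cor <- cor(pop_means)

if (interactive()) {
  plot(pca$x[, 1:2], col = as.integer(factor(types)), pch = 19,
       main = "PCA on signature CpGs")
  heatmap(sig_betas, scale = "none")
  print(round(pop_cor, 2))
}
```

Populations that overlap in the PCA or show high off-diagonal values in `pop_cor` are the ones the signature is likely to confuse.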

I'd be happy to look at any interim plots you want a hand with.

cui-shuang commented 1 year ago

Hi, thank you very much for your suggestion!

I found a Stroma.matrix built in a published article and used the same data. I prepared the input to CellLines.matrix myself: I downloaded the cell line data, read it in with the minfi package, applied single-sample Noob normalization, and then extracted the beta values. The two results I mentioned in my previous question were obtained this way.

If I use a Stroma.matrix built in another article, is that scientifically sound and interpretable in this case? Or do I need to build a new Stroma.matrix myself?

Thank you!

wudustan commented 1 year ago

Hi, do you know how the Stroma data was preprocessed? If the two datasets were normalised with different methods, you might see some discrepancy in how they perform. That said, while I would expect some kind of batch effect, I would still expect a higher tumour estimate than what you're getting. If you can run the QC mentioned above on your signature matrix and see how well it resolves the original data, you will get a better idea of where it's going wrong.
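To illustrate the idea behind the synthetic-mixture check, here is a minimal base-R sketch on simulated data. It does not use make.synth.mix() or CIBERSORT: the mixtures are built by hand, and a clipped, renormalised least-squares fit stands in for the deconvolution step, so treat it as an outline of the comparison, not of the actual method. All object names are hypothetical.

```r
set.seed(1)

types <- c("Bcell", "CD4T", "NK", "Cancer")
n_cpg <- 200
n_mix <- 20

# Simulated 'pure' reference profiles: one beta-value column per population
ref <- matrix(runif(n_cpg * length(types)), nrow = n_cpg,
              dimnames = list(paste0("cg", seq_len(n_cpg)), types))

# Known mixing proportions for each synthetic sample (columns sum to 1)
props <- matrix(rexp(length(types) * n_mix), nrow = length(types),
                dimnames = list(types, paste0("mix", seq_len(n_mix))))
props <- sweep(props, 2, colSums(props), "/")

# Synthetic mixtures: weighted sums of the pure profiles plus a little noise
mix <- ref %*% props + matrix(rnorm(n_cpg * n_mix, sd = 0.02), nrow = n_cpg)
mix <- pmin(pmax(mix, 0), 1)

# Stand-in deconvolution (NOT CIBERSORT's nu-SVR): least squares per sample,
# negative coefficients clipped to zero, then rescaled to sum to 1
est <- apply(mix, 2, function(y) {
  b <- pmax(qr.solve(ref, y), 0)
  b / sum(b)
})
rownames(est) <- types

# Agreement between known and estimated proportions
agreement <- cor(as.vector(props), as.vector(est))

if (interactive()) {
  plot(as.vector(props), as.vector(est),
       xlab = "Known proportion", ylab = "Estimated proportion")
  abline(0, 1, lty = 2)  # the 1:1 line the points should follow
}
```

With a real signature, the estimates would come from CIBERSORT run on the synthetic mixtures; a Cancer component that falls well below the 1:1 line there would point at the signature, not at the patient samples.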