Gene Expression Dataset Uploading

gouri1310 commented 3 years ago

Hey, in the case of microarray datasets, many probes map to the same gene. Should the gene expression data I upload already include the average expression of repeated genes, or does mMCP-counter automatically average the expression values of repeated genes before computing the final scores?

FPetitprez commented 3 years ago

Hi! In the current version, webMCP-counter only accounts for the first occurrence of each gene. If genes are duplicated, the subsequent occurrences are ignored. I would suggest you first compute the mean expression of each gene before running webMCP-counter. Also, mMCP-counter works best on data that have already been normalized and that are in log scale (e.g. log2(1 + normalized value)). Hope this helps!
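As a rough illustration of the preprocessing suggested above, here is a minimal Python sketch that averages rows sharing a gene symbol and then applies log2(1 + x). The function names and the toy probe values are hypothetical, and it is not code from webMCP-counter or the R packages. Whether to average before or after the log transform is not settled in this thread, and the log step should be skipped if the data are already in log scale.

```python
import math
from collections import defaultdict

def collapse_duplicate_genes(rows):
    """Average rows that share a gene symbol.

    rows: iterable of (gene, [value per sample]) pairs, e.g. one per probe.
    Returns {gene: [mean value per sample]}.
    """
    grouped = defaultdict(list)
    for gene, values in rows:
        grouped[gene].append(values)
    return {
        # zip(*probe_rows) walks the probes sample by sample
        gene: [sum(col) / len(col) for col in zip(*probe_rows)]
        for gene, probe_rows in grouped.items()
    }

def log2_plus_one(table):
    """Apply log2(1 + x) to every value (skip if data are already log scale)."""
    return {gene: [math.log2(1 + v) for v in values]
            for gene, values in table.items()}

# Hypothetical probe-level data: two probes map to Cd3e.
probes = [
    ("Cd3e", [3.0, 7.0]),
    ("Cd3e", [5.0, 9.0]),
    ("Cd19", [1.0, 0.0]),
]
collapsed = collapse_duplicate_genes(probes)  # one row per gene
logged = log2_plus_one(collapsed)
```

After this step each gene appears once, so the "first occurrence only" behaviour described above no longer discards any probe.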

gouri1310 commented 3 years ago

Thanks, that helped! From the results obtained after step 1, which gives the absolute scores of the different cell populations constituting a sample, is it possible to calculate tumor purity (the percentage of tumor cells in a solid tumor sample)?

FPetitprez commented 3 years ago

Unfortunately no, mMCP-counter cannot evaluate tumor purity. This is because of the way the algorithm works: we analyze the expression of genes that are specific to some populations. To analyze tumor purity, we would need to find genes that are highly expressed in all tumors and that are specific to tumor cells. Identifying such genes is nearly impossible. MCP-counter only returns results for the set of populations given in step 1 because those are the only ones we can evaluate with sufficient accuracy; estimates for any other population would have to be treated with much more caution.

gouri1310 commented 3 years ago

Hello, I have a single-cell RNA-seq dataset with around 35806 genes and 674 cell samples. I am not able to run MCP-counter on it, as it says the input exceeds the limit. Could you please suggest some ways to resolve this? Thanks!

FPetitprez commented 3 years ago

Hi! If you want to run MCP-counter (or mMCP-counter if it is mouse data) on this large dataset, there are two options:

1) Split it into several smaller datasets (each with all the genes but only on the order of 100 cells), run webMCP-counter on each, download the score tables and regroup them.

2) Use the original R packages. webMCP-counter is simply a user interface; behind the scenes it is the R packages that are called. If you know how to use R, this is the best way to do it. For the human MCP-counter, the package is here: https://github.com/ebecht/MCPcounter and the mouse mMCP-counter is here: https://github.com/cit-bioinfo/mMCP-counter

Using the R packages will allow you to run (m)MCP-counter on much larger datasets. The limit only applies to webMCP-counter, so as not to overload the server.
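For the first option, the splitting and regrouping could be sketched as below. This is only an assumption-laden illustration: the helper names are hypothetical, the chunk size of 100 is taken from the figure mentioned above rather than from a documented limit, and the "scores" in the demo are placeholders standing in for the tables you would download from webMCP-counter.

```python
def split_columns(header, rows, chunk_size=100):
    """Yield (sub_header, sub_rows) pieces that keep every gene
    but contain at most chunk_size cells each.

    header: list of cell names.
    rows: list of (gene, [value per cell]) pairs.
    """
    for start in range(0, len(header), chunk_size):
        stop = start + chunk_size
        yield header[start:stop], [(gene, values[start:stop])
                                   for gene, values in rows]

def regroup_scores(score_tables):
    """Join per-chunk score tables column-wise.

    Each table is (cell_names, {population: [score per cell]}).
    Assumes every chunk reports the same populations.
    """
    all_cells, merged = [], {}
    for cells, scores in score_tables:
        all_cells.extend(cells)
        for population, values in scores.items():
            merged.setdefault(population, []).extend(values)
    return all_cells, merged

# Hypothetical dataset with 674 cells and a single gene row.
header = [f"cell{i}" for i in range(674)]
rows = [("Cd3e", list(range(674)))]
chunks = list(split_columns(header, rows))

# Placeholder score tables, one per chunk, in upload order.
tables = [(cells, {"T cells": sub_rows[0][1]}) for cells, sub_rows in chunks]
cells, merged = regroup_scores(tables)
```

Keeping the chunks in their original order means the regrouped table lines up with the original cell names.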

FPetitprez commented 3 years ago

One important note though: MCP-counter is a deconvolution tool that was designed to quantify cell types in bulk transcriptomics data, so using it on single-cell RNA-seq data is not its original intended use. However, it can help you annotate your cells. If you compare, for instance, the T cell scores across all the cells, you should see groups of cells with much higher scores than all the others, and these are most likely the T cells. The same goes for all other cell populations.
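One simple way to pick out the group of cells with much higher scores is to sort the scores for a population and cut at the largest gap. The sketch below does that; it is a heuristic chosen here for illustration, not something MCP-counter provides, and the function name and score values are hypothetical. Real single-cell scores may need a more robust method (for example, clustering).

```python
def split_by_largest_gap(scores):
    """Return the names of the cells above the largest gap in the sorted scores.

    scores: {cell_name: score for one population}.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    if len(ranked) < 2:
        return []
    # Difference between each pair of neighbouring sorted scores.
    gaps = [ranked[i + 1][1] - ranked[i][1] for i in range(len(ranked) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    return [name for name, _ in ranked[cut + 1:]]

# Hypothetical T cell scores for five cells.
t_cell_scores = {"c1": 0.1, "c2": 0.3, "c3": 6.2, "c4": 0.2, "c5": 5.9}
likely_t_cells = split_by_largest_gap(t_cell_scores)
```

The same function can be applied to the score column of each population in turn.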

FPetitprez / webMCP-counter

Gene Expression Dataset Uploading #9