ChenWeiyan / LandSCENT

Landscape Single Cell Entropy
19 stars 6 forks source link

Error while plotting boxplot and calculating InferPotency #13

Closed SinhaI closed 3 years ago

SinhaI commented 4 years ago

Hi Weiyan,

Thank you for a great package. I'm trying to analyze about 8k cells. Basically, I'm using the Seurat object and converting to the SingleCellExperiment object. I am following the tutorial and able to calculate "SR.o" . When I am trying to plot the box plot then I am getting following error.

boxplot(SR.o$SR ~ my.data$type, main = "SR values against cell types", xlab = "Cell Types", ylab = "SR values") Error in stats::model.frame.default(formula = SR.o$SR ~ my.data$type) : invalid type (list) for variable 'SR.o$SR'

My SR.o$SR looks like

SR.o$SR :[[997]] [1] 0.8880284 [[998]] [1] 0.874321 [[999]] NULL [[1000]] NULL [ reached getOption("max.print") -- omitted 7607 entries ]

and my.data$type looks like:

my.data$type [993] "noninf" "noninf" "noninf" "noninf" "noninf" "noninf" "noninf" "noninf" [ reached getOption("max.print") -- omitted 7607 entries ]

I tried to check SR using following command

anyNA(SR.o$SR) [1] FALSE

I also tried to calculate InferPotency and getting following error:

InferPotency.o <- InferPotency(SR.o, pheno.v = my.data$type) [1] "Fit Gaussian Mixture Model to Signaling Entropies" Error in 1 - sr.v : non-numeric argument to binary operator

Any help would be highly appreciated. Thank you.

ChenWeiyan commented 4 years ago

Hi Sinhal,

Thanks for trying out my package!

It seems from your output object My SR.o$SR. The SR values are not all properly calculated, e.g the 999th/1000th are NULL.

Moreover, if the calculation of SR was correct, My SR.o$SR would be vector, rather than a list, which is now your case.

But I do not know the detail of your data. So one suggestion is, since 8k cells are not a big number, you can probably extract the expr matrix with command like assay(SingleCellExperiment object, "count"), and then give it to LandSCENT, which will make things simpler, and you may find out why the error happened.

Hope it be helpful!

Best, Weiyan

SinhaI commented 4 years ago

Dear Weiyan,

Thanks for your reply. As you suggested I have tried to extract the data as expr matrix. But still I am getting that error. I think I am doing something terrible wrong here. Below I am giving my code. It will be great if you can suggest something more. Let me know if you need some more information. Thanks for your help.

Best regards, Indranil.

library(LandSCENT) data(net13Jun12.m) data(phenoExample.v) library(Seurat) require(scater)

here I am getting the Seurat object

sub2.pbmc.integrated <- readRDS(file ="sub2_pbmc_SCT_14102020.rds") sub2.sce <- as.SingleCellExperiment(sub2.pbmc.integrated) example.m.sub2 <- assay(sub2.sce, "counts") example.sce <- SingleCellExperiment(assay = list(counts = example.m.sub2)) counts(example.sce) <- as(counts(example.sce), "dgCMatrix") sizeFactors(example.sce) <- librarySizeFactors(example.sce) example.sce <- logNormCounts(example.sce) example.m <- as.matrix(assay(example.sce, i = "logcounts")) min(example.m) example.m[example.m < 0.001] <- 0.1 require(AnnotationDbi) require(org.Hs.eg.db) anno.v <- mapIds(org.Hs.eg.db, keys = rownames(example.m), keytype = "SYMBOL", column = "ENTREZID", multiVals = "first") unique_anno.v <- unique(anno.v) example_New.m <- matrix(0, nrow = length(unique_anno.v), ncol = dim(example.m)[2]) for (i in seq_len(length(unique_anno.v))) { tmp <- example.m[which(anno.v == unique_anno.v[i]) ,] if (!is.null(dim(tmp))) { tmp <- colSums(tmp) / dim(tmp)[1] } example_New.m[i ,] <- example_New.m[i ,] + tmp } rownames(example_New.m) <- unique_anno.v colnames(example_New.m) <- colnames(example.m) example_New.m <- example_New.m[-which(rownames(example_New.m) %in% NA) ,] min(example_New.m) Integration.l <- DoIntegPPI(exp.m = example_New.m, ppiA.m = net13Jun12.m, log_trans = TRUE) str(Integration.l)

save.image(file = "LandSCENT_work_space_7.RData")

SR.o <- CompSRana(Integration.l, local = TRUE, mc.cores = 40)

my.data <- FetchData(sub2.pbmc.integrated,c("type", "patient", "sample", "seurat_cluster", "orig.ident")) boxplot(SR.o$SR ~ my.data$type, main = "SR values against cell types", xlab = "Cell Types", ylab = "SR values")

On Sat, Nov 14, 2020 at 10:15 AM Weiyan Chen notifications@github.com wrote:

Hi Sinhal,

Thanks for trying out my package!

It seems from your output object My SR.o$SR. The SR values are not all properly calculated, e.g the 999th/1000th are NULL.

Moreover, if the calculation of SR was correct, My SR.o$SR would be vector, rather than a list, which is now your case.

But I do not know the detail of your data. So one suggestion is, since 8k cells are not a big number, you can probably extract the expr matrix with command like assay(SingleCellExperiment object, "count"), and then give it to LandSCENT, which will make things simpler, and you may find out why the error happened.

Hope it be helpful!

Best, Weiyan

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ChenWeiyan/LandSCENT/issues/13#issuecomment-727172730, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJWOZ47Z2FPAPMPKB4KHLE3SPZDD5ANCNFSM4TVHL6HQ .

ChenWeiyan commented 4 years ago

Hi Indranil,

Since you are setting log_trans = TRUE in DoIntegPPI, you can actually leave the normalization aside and let the function do it by itself.

And I can tell that you are trying to convert gene IDs. So I attached a script I wrote(and additional resource file) for gene ID converting. Maybe you can grab something useful.

So basically, you only need to extract count matrix from Seurat object ---> check if the row/col names are properly assigned ---> convert the gene IDs ---> then give it to LandSCENT ---> wait to see what will happen.

Best, Weiyan

GeneMap.zip

KatharinaKohler commented 4 years ago

Hi Weiyan,

Thank you for this tool! I am currently also trying to use the LandSCENT package for a Seurat object and it seems that I am running into the same problem as Indranil. I am trying to get corresponding gene IDs by using the code from the tutorial (step 3.3).

My code:

> #reading in count data from Seurat object
> expression_matrix_ <- as.matrix(GetAssayData(integrated_new, slot="counts", assay="RNA"))
> 
> #get gene annotation fitting to the PPI
> require(AnnotationDbi)
> require(org.Hs.eg.db)
> anno.v <- mapIds(org.Hs.eg.db, keys = rownames(expression_matrix_), keytype = "SYMBOL", 
>                  column = "ENTREZID", multiVals = "first")
> unique_anno.v <- unique(anno.v)
> example_New.m <- matrix(0, nrow = length(unique_anno.v), ncol = dim(expression_matrix_)[2])
> 
> 
> for (i in seq_len(length(unique_anno.v))) {
>   tmp <- expression_matrix_[which(anno.v == unique_anno.v[i]) ,]
>   if (!is.null(dim(tmp))) {
>     tmp <- colSums(tmp) / dim(tmp)[1]
>   }
>   example_New.m[i ,] <- example_New.m[i ,] + tmp
> }
> 
> rownames(example_New.m) <- unique_anno.v
> colnames(example_New.m) <- colnames(expression_matrix_)
> example_New.m <- example_New.m[-which(rownames(example_New.m) %in% NA) ,]
> matrix_sce <- example_New.m
> str(matrix_sce)

My expressionmatrix looks like this:

num [1:24113, 1:20286] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:24113] "RP11-34P13.7" "FO538757.2" "AP006222.2" "RP4-669L17.10" ...
  ..$ : chr [1:20286] "AAACCTGAGTCGTTTG_1" "AAACCTGAGTGGTCCC_1" "AAACCTGAGTGTACCT_1" "AAACCTGCACAACGTT_1" ...

And this is what I get out of str(matrix_sce):

num [1:17215, 1:20286] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:17215] "26155" "339451" "84069" "57801" ...
  ..$ : chr [1:20286] "AAACCTGAGTCGTTTG_1" "AAACCTGAGTGGTCCC_1" "AAACCTGAGTGTACCT_1" "AAACCTGCACAACGTT_1" ...

So I think that there is a problem with the step of gene ID conversion, but I can't find it.

I also tried your script for gene conversion Weiyan, but I seem to do something wrong, as if I do this I get the error "scRNA-seq data should have the same gene identifier with the network!" when I do the integration step.

I would appreciate any help!

Best, Katharina

ChenWeiyan commented 4 years ago

Hi Katharina,

Thanks for your feedback.

So for the gene ID conversion, the following is a quick explanation for my script:

  1. If you want to get homologous genes between human and mouse, you may load the HomoG first with HomoG <- read.delim("homologene.data", header = FALSE)
  2. Give the function your gene IDs with: (you can set HomoG = NULL if no need for homologous genes) ID_conversion <- GeneMap_opt(HomoG = HomoG, geneID, input_type = "SYMBOL", output_type = "ENTREZID", input_species = "HUM", output_species = "HUM")
  3. After it is done, ID_conversion$GeneMapG will give you the transferred IDs and you can use ID_conversion$Gene.idx to subset the original matrix, i.e New.data <- Ori.data[ID_conversion$Gene.idx ,]. And the row names of New.data should be assigned with ID_conversion$GeneMapG.

Hope this can help you.

Best, Weiyan

KatharinaKohler commented 3 years ago

Hi Weiyan,

Thank you for this explanation! I tried this but again it did not work. Here my code:

`expressionmatrix <- as.matrix(GetAssayData(integrated_new, slot="counts", assay="RNA"))

ID_conversion <- GeneMap_opt(HomoG = NULL, rownames(expressionmatrix), input_type = "SYMBOL", output_type = "ENTREZID", input_species = "HUM", output_species = "HUM")

New.data <- expressionmatrix[ID_conversion$Gene.idx ,] rownames(New.data) <- ID_conversion$GeneMapG

Integration.l <- DoIntegPPI(exp.m = New.data, ppiA.m = net17Jan16.m, log_trans = TRUE) `

The error I get here is

Fejl i DoIntegPPI(exp.m = New.data, ppiA.m = net17Jan16.m, log_trans = TRUE) : Non identical!!!

My New.data object looks like this:

num [1:17215, 1:20286] 0 0 0 0 0 0 0 0 0 0 ...

  • attr(, "dimnames")=List of 2 ..$ : Named chr [1:17215] "26155" "339451" "84069" "57801" ... .. ..- attr(, "names")= chr [1:17215] "NOC2L" "KLHL17" "PLEKHN1" "HES4" ... ..$ : chr [1:20286] "AAACCTGAGTCGTTTG_1" "AAACCTGAGTGGTCCC_1" "AAACCTGAGTGTACCT_1" "AAACCTGCACAACGTT_1" ...

Can you see what is going wrong here?

Best, Katharina

ChenWeiyan commented 3 years ago

Hi Katharina,

I forgot to mention that the output of GeneMap_opt under GeneMapG is a named vector.

You need to set

names(rownames(New.data)) <- NULL

right before

DoIntegPPI(exp.m = New.data, ppiA.m = net17Jan16.m, log_trans = TRUE)

Best, Weiyan

KatharinaKohler commented 3 years ago

Hi Weiyan,

Thank you so much for that! This part works now but I am still having the same problems with CompSRana as before. I am sorry for bothering again but I am very stuck with this. When I try to run CompSRana, I get the warning:

Warning message: In mclapply(idx.l, CompSRanaPRL, exp.m = Integration.l$expMC, adj.m = Integration.l$adjMC, : scheduled cores 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 did not deliver results, all values of the jobs will be affected

If I ignore this and continue I would get the same problem as Indranil. The SR values are NULL and SR.o$SR is a list instead of a vector.

My Integration.l object look like this (which in my eyes seems fine):

str(Integration.l) List of 3 $ expMC: num [1:9949, 1:20286] 0.138 0.138 2.712 0.138 0.138 ... ..- attr(, "dimnames")=List of 2 .. ..$ : chr [1:9949] "29974" "339" "3837" "988" ... .. ..$ : chr [1:20286] "AAACCTGAGTCGTTTG_1" "AAACCTGAGTGGTCCC_1" "AAACCTGAGTGTACCT_1" "AAACCTGCACAACGTT_1" ... $ adjMC: num [1:9949, 1:9949] 0 1 1 1 1 1 1 1 1 1 ... ..- attr(, "dimnames")=List of 2 .. ..$ : chr [1:9949] "29974" "339" "3837" "988" ... .. ..$ : chr [1:9949] "29974" "339" "3837" "988" ... $ data : num [1:17215, 1:20286] 0.138 0.138 0.138 0.138 0.138 ... ..- attr(*, "dimnames")=List of 2 .. ..$ : chr [1:17215] "26155" "339451" "84069" "57801" ... .. ..$ : chr [1:20286] "AAACCTGAGTCGTTTG_1" "AAACCTGAGTGGTCCC_1" "AAACCTGAGTGTACCT_1" "AAACCTGCACAACGTT_1" ...

In case you have a clue what I could be doing wrong I would be very happy about any tip!

Best, Katharina

ChenWeiyan commented 3 years ago

Hi Katharina,

The integration of data and PPI network looks fine now.

The warning itself indicates you will not obtain SR correctly for all cells. So I am afraid it cannot be ignored for good.

Warning message:
In mclapply(idx.l, CompSRanaPRL, exp.m = Integration.l$expMC, adj.m = Integration.l$adjMC, :
scheduled cores 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 
37, 38, 39, 40 did not deliver results, all values of the jobs will be affected

In fact, I have encountered such situation before. In my case, I was using Linux Server with CentOS 7. And it threw such warning because of running out of memory since the parallel task consumed too much memory. It went away after I changed the mc.cores into a smaller number.

In your case, I am not sure if it exactly the same. But I suspect you are running on a server too, since you have at least 40 cores to run :) So lowering down the core number may help. And in your case, your cell number is reasonably large, i.e ~20k. So you probably need to check the memory use of your R process first, and then estimate a reasonable core number based on such memory, In case your overall memory would overflow.

Best, Weiyan

ChenWeiyan commented 3 years ago

I closed this issue for now since inactivation. Please reopen if needed.