ctlab / GAMclust

GAM-clustering: Metabolic Modules Deriving From RNAseq Data

gamClustering : Error in FUN(X[[i]], ...): Not all edge weights are finite numbers #1

TEAM-4-CEPR closed this issue 3 weeks ago

TEAM-4-CEPR commented 8 months ago

Hi, and thank you for providing your tools. I'm running into some issues.

I'm using GAMclust on published data (GSE84901). I followed the tutorial from your repository, https://rpubs.com/anastasiiaNG/GAMclust_BULK, to run GAMclust on bulk RNA data.

But when I ran the gamClustering function, it threw this error:

[*] Iteration 1

Error in FUN(X[[i]], ...): Not all edge weights are finite numbers
Traceback:

1. gamClustering(E.prep = E.prep, network.prep = network.prep, cur.centers = cur.centers, 
 .     start.base = 0.5, base.dec = 0.05, max.module.size = 30, 
 .     cor.threshold = 0.9, p.adj.val.threshold = 0.001, batch.solver = seq_batch_solver(solver), 
 .     work.dir = work.dir, show.intermediate.clustering = T, verbose = T, 
 .     collect.stats = TRUE)
2. lapply(nets, mwcsr::normalize_sgmwcs_instance, edges.weight.column = "score", 
 .     nodes.weight.column = "score", edges.group.by = "gene", nodes.group.by = NULL, 
 .     group.only.positive = TRUE)   # at line 89-94 of file <text>
3. lapply(nets, mwcsr::normalize_sgmwcs_instance, edges.weight.column = "score", 
 .     nodes.weight.column = "score", edges.group.by = "gene", nodes.group.by = NULL, 
 .     group.only.positive = TRUE)
4. FUN(X[[i]], ...)
5. stop(sprintf("Not all edge weights are finite numbers"))

I tried modifying the gamClustering function as follows to prevent this error:

gamClustering <- function(E.prep,
                          network.prep,
                          cur.centers,

                          start.base = 0.5,
                          base.dec = 0.05,
                          max.module.size = 50,

                          cor.threshold = 0.8,
                          p.adj.val.threshold = 0.001,

                          batch.solver = seq_batch_solver(solver),
                          work.dir,

                          show.intermediate.clustering = TRUE,
                          verbose = TRUE,
                          collect.stats = TRUE
                          ){

  iteration <- 1

 [...]
      nets <- lapply(idxs, function(i) {

        minOther <- pmin(apply(dist.to.centers[-i, , drop=F], 2, min), base)
        score <- log2(minOther) - log2(dist.to.centers[i, ])
        score[score == Inf] <- 0
        score <- pmax(score, -1000)
        posScores_keeping_var <<- c(posScores_keeping_var, length(which(score>0)))

        EdgeTable <- data.table::as.data.table(data.table::copy(network.prep))
        EdgeTable[, score := score[gene]]
        EdgeTable[from > to, c("from", "to") := list(to, from)]
        EdgeTable <- EdgeTable[order(score, decreasing = T)]
        EdgeTable <- unique(EdgeTable, by=c("from", "to"))
        # we still keep loops here

        ## The two lines below are the ones I added

        Iscore <- EdgeTable$score[is.finite(EdgeTable$score)]
        EdgeTable <- EdgeTable %>% dplyr::filter(score %in% Iscore)

        scored_graph <- igraph::graph_from_data_frame(EdgeTable, directed = F)
        igraph::V(scored_graph)$score <- 0
        scored_graph
      })

      nets_attr <- lapply(nets, mwcsr::normalize_sgmwcs_instance,
                          edges.weight.column = "score",
                          nodes.weight.column = "score",
                          edges.group.by = "gene",
                          nodes.group.by = NULL,
                          group.only.positive = TRUE)

     [..]
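One way the scoring lines above could yield non-finite edge weights, sketched on toy numbers (the gene names and distances are invented for illustration, not taken from GSE84901): a zero distance on both sides of the subtraction gives NaN, which neither the `score == Inf` reset nor `pmax()` removes, and a gene that is present in the network but missing from the score vector gives NA. Whether either case is what actually happens with this dataset is an assumption. The sketch also shows that filtering with `is.finite()` directly keeps the same rows as the two added lines.

```r
# Toy inputs (hypothetical values): distance of each gene to center i,
# and the minimum distance to the other centers, capped at base = 0.5.
dist_i    <- c(g1 = 0.2, g2 = 0,   g3 = 0, g4 = 0.4)
min_other <- c(g1 = 0.5, g2 = 0.5, g3 = 0, g4 = 0.1)

# Same scoring steps as in the snippet above.
score <- log2(min_other) - log2(dist_i)  # g2 is Inf, g3 is NaN (-Inf minus -Inf)
score[score == Inf] <- 0                 # resets g2; the NaN comparison is NA and is skipped
score <- pmax(score, -1000)              # would floor -Inf, but g3 stays non-finite

# Toy edge table: g5 is annotated in the network but absent from `score`,
# so the lookup by name returns NA for it.
edges <- data.frame(from = c("a", "b", "c", "d", "e"),
                    to   = c("b", "c", "d", "e", "a"),
                    gene = c("g1", "g2", "g3", "g4", "g5"),
                    stringsAsFactors = FALSE)
edges$score <- unname(score[edges$gene])

# Rows that would fail a finiteness check on edge weights.
bad_edges <- edges[!is.finite(edges$score), ]
print(bad_edges)

# Direct filter, compared with the `score %in% Iscore` approach.
finite_scores <- edges$score[is.finite(edges$score)]
kept_direct   <- edges[is.finite(edges$score), ]
kept_in       <- edges[edges$score %in% finite_scores, ]
```

Printing `bad_edges` for each center before building the graph shows which genes carry the problematic scores, which is more informative than silently dropping them.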

With this change the function runs until the second iteration, where it throws this error:

[*] Iteration 1

>> base was equal to: 0.5;

>> number of modules was equal to: 32;

>> sizes of modules (unique genes) were in range: 0-4

Max diff: 0.42

[*] Iteration 2

Error in eval(expr, envir, enclos): Trying to fix variable with value that is out of possible range.
Traceback:

1. gamClustering(E.prep = E.prep, network.prep = network.prep, cur.centers = cur.centers, 
 .     start.base = 0.5, base.dec = 0.05, max.module.size = 30, 
 .     cor.threshold = 0.9, p.adj.val.threshold = 0.001, batch.solver = seq_batch_solver(solver), 
 .     work.dir = work.dir, show.intermediate.clustering = T, verbose = T, 
 .     collect.stats = TRUE)
2. batch.solver(nets_attr)   # at line 99 of file <text>
3. lapply(instances, mwcsr::solve_mwcsp, solver = mwcs_solver)
4. FUN(X[[i]], ...)
5. solve_mwcsp.rnc_solver(X[[i]], ...)
6. sgmwcs_solve(inst_rep, solver)

Is there a way to prepare my count matrix, or to fix the error, so that I can use GAMclust?

Thank you for your time.

IGuy

anastasiiaNG commented 8 months ago

@TEAM-4-CEPR, hi.

The tutorial assumes that your data have already been preprocessed and normalized. Did you apply a preprocessing pipeline to the microarray values you loaded from GEO? If so, could you please post its code here so that I can accurately reconstruct your input data object E.prep?

Thank you, Ana

assaron commented 7 months ago

@TEAM-4-CEPR, regarding the second error, "Trying to fix variable with value that is out of possible range": this should be fixed in the latest version of the mwcsr package. Please install it from GitHub with devtools::install_github("ctlab/mwcsr").

Still, it would be helpful if you could provide your input data, as the first error also looks like a bug.
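After reinstalling from GitHub it can help to confirm which version R actually sees, since a package that is already loaded typically keeps its old code until the session is restarted. A minimal sketch using base R only; `installed_version` is a helper name introduced here for illustration and is not part of mwcsr:

```r
# Return the installed version of a package as a string,
# or NA if the package is not installed in the current library paths.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Reinstalling needs network access, so it is left commented out here:
# devtools::install_github("ctlab/mwcsr")

# Run in a fresh R session after reinstalling.
print(installed_version("mwcsr"))
```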

TEAM-4-CEPR commented 7 months ago

Hello, sorry for the delayed reply. Indeed, after reinstalling the tools, the function works! Thank you! But I now get an error at the end of the GAMclust function:

ATTENTION: The reliability of the outputs falls short of our expectations. Need to tune the method's parameters to enhance the overall quality of the results.
Error in solve_mwcsp.rnc_solver(X[[i]], ...) : 
  abs(weight - res$lb) < EPS is not TRUE 

I tried changing max.module.size and cor.threshold, but the error persists.

Do you have any idea what causes it?

Here is the table I used: https://filesender.renater.fr/?s=download&token=3d9e290b-f136-440a-8a00-9227aca2e854

Thanks a lot!

IGuy

anastasiiaNG commented 7 months ago

@TEAM-4-CEPR,

please update the GAMclust package to the latest version (devtools::install_github("ctlab/GAMclust", force = T)); your analysis should now run without failure.

By the way, I would also recommend excluding the GSM2253780 sample from your data, as it is an outlier; you can see this if you visualize your dataset (E.prep) with PCA. Removing outliers enhances the overall quality of the analysis.

Best, Ana
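A PCA check of the kind suggested above can be sketched in base R. The matrix here is simulated, with one sample deliberately shifted to stand in for an outlier; the sample names, the shift size, and the "farthest from the median in PC1/PC2 space" rule are all assumptions made for illustration, not the procedure used to flag GSM2253780.

```r
set.seed(1)

# Simulated expression matrix: 200 genes (rows) x 8 samples (columns).
E <- matrix(rnorm(200 * 8), nrow = 200,
            dimnames = list(paste0("gene", 1:200), paste0("S", 1:8)))
E[, "S8"] <- E[, "S8"] + 5   # shift one sample so it behaves like an outlier

# PCA on samples: prcomp() expects observations in rows, hence t(E).
pca <- prcomp(t(E), center = TRUE, scale. = FALSE)
pc  <- pca$x[, 1:2]

# To inspect visually in an interactive session:
# plot(pc, pch = 19); text(pc, labels = rownames(pc), pos = 3)

# Distance of each sample from the per-component median in PC1/PC2 space.
centre <- apply(pc, 2, median)
d <- sqrt(rowSums(sweep(pc, 2, centre)^2))
outlier <- names(which.max(d))
print(outlier)

# Drop the flagged sample before clustering.
E.clean <- E[, colnames(E) != outlier, drop = FALSE]
```

With real data, `E` would be replaced by E.prep, assuming it has genes in rows and samples in columns; looking at the plot before removing anything is safer than relying on the distance rule alone.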

I-Guy commented 6 months ago

@anastasiiaNG Thank you so much !

I will try the latest version.

Best,

Guy