YangLi-Bio / stream2

STREAM: enhancer-driven gene regulatory networks inference from single-cell RNA-seq and ATAC-seq data
https://yangli-bio.github.io/stream2
Other
0 stars 0 forks source link

Slow graphs generation #1

Open ofircohn opened 1 month ago

ofircohn commented 1 month ago

Hey!

Thank you for this great package! I ran my multiomic data with the stream pipeline and found that it got stuck when generating igraphs.

Specifically in the part below. Any idea?. (I used >10 cores).

Make graphs

return( pbmcapply::pbmclapply(obj.list, function(x) { x.signac <- signac.links[signac.links$node1 %in% x$peaks,] if (nrow(x.signac) < 1) { return(NULL) } if (ifWeighted) { x.signac <- cbind(x.signac, weight = sapply(1:nrow(x.signac), function(i) { xx <- x.signac[i,] sum(atac.dis[xx$node1, x$cells] > 0 & rna.dis[xx$node2, x$cells] > 0) }) ) %>% dplyr::filter(weight > 0) } x.cicero <- coaccess.links[coaccess.links$node1 %in% x$peaks & coaccess.links$node2 %in% x$peaks,] if (ifWeighted) { x.cicero <- cbind(x.cicero, weight = sapply(1:nrow(x.cicero), function(i) { xx <- x.cicero[i,] sum(atac.dis[xx$node1, x$cells] > 0 & atac.dis[xx$node2, x$cells] > 0) }) ) %>% dplyr::filter(weight > 0) }

ig <- igraph::graph_from_data_frame(rbind(x.signac, x.cicero), directed = F)
if (ifWeighted) {
  igraph::E(ig)$weight <- length(x$cells) - c(x.signac$weight, x.cicero$weight)
}

Thank you!

YangLi-Bio commented 2 weeks ago

Hello!

Thank you for your interest in our STREAM program.

We understand that constructing a weighted heterogeneous graph based on large datasets can be computationally expensive. This is primarily because STREAM needs to process all enhancer-enhancer and enhancer-gene relationships, leading to a significant computational load.

To address this challenge, we employ two strategies. First, we filter highly variable genes using the parameter var.genes and top-ranked peaks with the parameter top.peaks. Second, we construct an unweighted heterogeneous graph. Our performance tests indicate that edge weights have minimal impact on accuracy, with the Steiner forest problem model predominantly determining the prediction of core enhancer-enhancer and enhancer-gene relationships.

To further enhance computational efficiency, we are rewriting the graph construction steps in C++, including integration with Signac and Cicero. We anticipate that this modification will substantially accelerate the process.

We welcome you to continue following and utilizing the STREAM program!

Best regards,