Closed by davetang 7 years ago.
Hello,
I cannot give you a specific running time, as it depends on the specific computer and setup (CPU speed and memory also matter). But as a reference, when we have run it on datasets of 1-3k cells, it typically takes a few hours. For bigger datasets (e.g. >5k cells), however, the running time often increases to a few days (2-5 days; with 24 cores, most of our analyses finished within a week).
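One way to get a number for your own machine is to time a small pilot run and extrapolate. This is a minimal sketch, assuming runtime grows roughly linearly with the number of target genes (GENIE3 fits one tree ensemble per target gene). The helper `estimate_runtime_hours()` is hypothetical, not part of GENIE3, and the pilot size of 100 genes is an arbitrary choice:

```r
# Hypothetical helper (not part of GENIE3): extrapolate total runtime from a
# pilot run, assuming time scales roughly linearly with the number of targets.
estimate_runtime_hours <- function(pilot_seconds, n_pilot_targets, n_total_targets) {
  stopifnot(pilot_seconds >= 0, n_pilot_targets > 0, n_total_targets >= 0)
  seconds_per_target <- pilot_seconds / n_pilot_targets
  seconds_per_target * n_total_targets / 3600
}

# Usage sketch (commented out because it needs GENIE3 and your data):
# library(GENIE3)
# pilotTargets <- sample(rownames(exprMatrix_filtered), 100)
# t <- system.time(
#   GENIE3(exprMatrix_filtered, regulators = inputTFs,
#          targets = pilotTargets, nCores = 24)
# )[["elapsed"]]
# estimate_runtime_hours(t, length(pilotTargets), nrow(exprMatrix_filtered))

# Illustrative numbers only: 600 s for 100 targets -> 20 h for 12,000 targets
estimate_runtime_hours(600, 100, 12000)
```

The pilot should use the same `nCores` and regulator list as the full run, and a random sample of genes is likely more representative than the first genes alphabetically. Treat the result as a rough estimate, not a guarantee.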
In case it helps: on really big datasets, I normally split the gene list into several subsets. This helps to estimate how long the run will take to finish, and it also saves intermediate results in case something goes wrong and it crashes. Example:
# Run on subsets of genes
# (dividing the original gene list into 10 pieces)
library(GENIE3)
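genes <- sort(rownames(exprMatrix_filtered))
genesSplit <- split(genes, rep_len(1:10, length(genes)))  # rep_len() avoids a warning when the gene count is not a multiple of 10
lengths(genesSplit)
for(i in seq_along(genesSplit))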
{
print(i)
set.seed(93827)
weightMatrix <- GENIE3(exprMatrix_filtered, regulators=inputTFs, nCores=24, targets=genesSplit[[i]])
save(weightMatrix, file=paste0("GENIE3_weightMatrix_",i,".RData"))
}
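If a run crashes halfway, the loop above starts again from subset 1. A minimal sketch of a resumable variant, assuming one `.RData` file per subset as above: the hypothetical helper `run_in_chunks()` (not part of GENIE3) skips subsets whose file already exists and prints a rough estimate of the time remaining, based on the average time per subset in the current session.

```r
# Hypothetical helper (not part of GENIE3): run `fun` on each chunk of `items`,
# saving one .RData file per chunk and skipping chunks already on disk, so a
# crashed run can be restarted without redoing finished chunks.
run_in_chunks <- function(items, n_chunks, fun, prefix = "GENIE3_weightMatrix_") {
  chunks <- split(items, rep_len(seq_len(n_chunks), length(items)))
  files <- paste0(prefix, seq_along(chunks), ".RData")
  started <- Sys.time()
  n_done_now <- 0
  for (i in seq_along(chunks)) {
    if (file.exists(files[i])) {
      message("Chunk ", i, " already done, skipping")
      next
    }
    weightMatrix <- fun(chunks[[i]])
    save(weightMatrix, file = files[i])
    n_done_now <- n_done_now + 1
    # Average minutes per chunk in this session times chunks still missing
    per_chunk <- as.numeric(difftime(Sys.time(), started, units = "mins")) / n_done_now
    remaining <- sum(!file.exists(files))
    message(sprintf("Chunk %d/%d saved; ~%.1f min remaining",
                    i, length(chunks), per_chunk * remaining))
  }
  invisible(files)
}

# Usage sketch (commented out because it needs GENIE3 and your data):
# library(GENIE3)
# run_in_chunks(sort(rownames(exprMatrix_filtered)), 10, function(targets) {
#   set.seed(93827)
#   GENIE3(exprMatrix_filtered, regulators = inputTFs, nCores = 24, targets = targets)
# })
```

Re-running the same call after a crash only computes the subsets that are missing on disk.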
# Merge results:
library(GENIE3)
linkList_list <- list()
for(i in 1:10)
{
load(paste0("GENIE3_weightMatrix_", i, ".RData"))  # same file names as in the save() call of the first loop
linkList_list[[i]] <- getLinkList(weightMatrix)
}
length(linkList_list)
sapply(linkList_list, nrow)
linkList <- do.call(rbind, linkList_list)
colnames(linkList) <- c("TF", "Target", "weight")
linkList <- linkList[order(linkList[,"weight"], decreasing=TRUE),]
linkList <- linkList[which(linkList[,"weight"]>0),]
nrow(linkList)
head(linkList)
save(linkList, file="GENIE3_linkList.RData")
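A variant of the merge step, sketched under a few assumptions: each file holds an object called `weightMatrix` with regulators in rows and targets in columns (the GENIE3 weight-matrix layout), and the hypothetical helper `merge_weight_files()` stops early if any subset file is missing, which catches a crashed subset before merging. Each file is loaded into its own environment so `weightMatrix` does not overwrite objects in the workspace. It uses base R's `as.table()` as a stand-in for GENIE3's `getLinkList()` so the sketch runs without GENIE3; after the `weight > 0` filter it should give the same links, but that is worth checking against `getLinkList()` on one subset.

```r
# Hypothetical helper: merge per-chunk GENIE3 weight matrices saved on disk.
# Assumes each file contains `weightMatrix` (rows = regulators, columns = targets).
merge_weight_files <- function(files) {
  absent <- files[!file.exists(files)]
  if (length(absent) > 0) stop("Missing chunk files: ", paste(absent, collapse = ", "))
  pieces <- lapply(files, function(f) {
    env <- new.env()           # isolate the loaded object from the workspace
    load(f, envir = env)
    # Long format: one row per (regulator, target) pair
    df <- as.data.frame(as.table(env$weightMatrix), stringsAsFactors = FALSE)
    colnames(df) <- c("TF", "Target", "weight")
    df
  })
  linkList <- do.call(rbind, pieces)
  linkList <- linkList[linkList$weight > 0, ]
  linkList <- linkList[order(linkList$weight, decreasing = TRUE), ]
  rownames(linkList) <- NULL
  linkList
}

# Usage sketch:
# linkList <- merge_weight_files(paste0("GENIE3_weightMatrix_", 1:10, ".RData"))
# head(linkList)
```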
Brilliant, thanks!
I'm currently running the R version of GENIE3 and was wondering how long it takes to complete on a dataset with around 8,000 single cells and 12,000 genes. It has been running for around 8 hours on 32 cores.