SachaEpskamp / bootnet

Bootstrap methods for various network estimation routines
30 stars 14 forks source link

Trouble with Computing and Plotting Centrality using bootnet in R #131

Closed RRRannwwwmdudsss closed 3 months ago

RRRannwwwmdudsss commented 4 months ago

Hi, Sacha,

I used the following code for estimating CLPN network using bootnet(215 samples, 9 nodes, 4 covariates):

## Set number of covariates
numCovar = 4
## Set number of nodes
k = 10

CLPN.net <- function(data) {
  adjMatCovCLPN <- matrix(0, k+numCovar, k+numCovar) 
  for (i in 1:k) {
    set.seed(1) # commented out so that it doesn't give the same answer every time when bootstrapping
    lassoreg <- cv.glmnet(x=as.matrix(data[,c(1:k,(k*2+1):(k*2+numCovar))]), 
                          y=data[,(k+i)], nfolds=10,
                          family="gaussian", alpha=1, standardize=TRUE, parallel=TRUE)
    lambda <- lassoreg$lambda.min
    adjMatCovCLPN[1:(k+numCovar),i] <- coef(lassoreg, s=lambda, exact=FALSE)[2:(k+numCovar+1)]
  }
  adjMatCLPN <- adjMatCovCLPN[1:k, 1:k]
  return(adjMatCLPN)
}

## Estimate networks in bootnet (for bootstrapping)
set.seed(1)
net.t1.t2 <- estimateNetwork(net.t1.t2.df.cov, fun=CLPN.net, labels=labels, directed=T)

# Case-drop Bootstrap for Correlation Stability Analysis
caseBoot.t1.t2 <-bootnet(net.t1.t2, 
                     nCores =2, 
                     nBoots = 1000, 
                     type = "case",
                     statistics = c("outStrength", "inStrength","edge"), 
                     caseMin = 0.05,
                     caseMax = 0.80,
                     caseN = 10,
                     directed = T)
print(corStability(swan.caseBoot.t1.t2))

but found biased case-dropping (only one sampling in drop% 71.6, 38.1, 30.2……):

image

another question is about plotting centrality stability showing no 95% CI:

plot(caseBoot.t1.t2, statistics = c("outStrength", "inStrength"))

image

Expect your reply. Thanks a lot!

SachaEpskamp commented 4 months ago

Hi! Can you try with nCores = 1? Note that you need to write the function as if it is a new R session when using multi-core, so include packages and definitions (like k and numCovar). Do not include set.seed.

Tongzhao9417 commented 4 months ago

Hi! Can you try with nCores = 1? Note that you need to write the function as if it is a new R session when using multi-core, so include packages and definitions (like k and numCovar). Do not include set.seed.

Hey, Sacha. I have a question that why it is necessary to exclude set.seed? I'm curious about this.

SachaEpskamp commented 4 months ago

set.seed could lead to unexpected results when bootstrapping. E.g., I think when nCores = 1 at least it would lead also to the bootstrap samples to be identical. Not sure about multicore.

Tongzhao9417 commented 4 months ago

set.seed could lead to unexpected results when bootstrapping. E.g., I think when nCores = 1 at least it would lead also to the bootstrap samples to be identical. Not sure about multicore.

Thanks for your reply. I just curious the mechanism of the conflict between set.seed and nCores >1.

RRRannwwwmdudsss commented 4 months ago

Thanks for your reply. I tried with nCores = 1 not include set.seed. The multi-core has been resolved. However, biased case-dropping and plotting issue is still the same as above.

Custom function for estimating network using bootnet

CLPN.net <- function(data) {

Load necessary packages

library(glmnet)

Set number of covariates

numCovar <- 4

Set number of nodes

k <- 9 adjMatCovCLPN <- matrix(0, k + numCovar, k + numCovar) for (i in 1:k) { set.seed(1) # commented out so that it doesn't give the same answer every time when bootstrapping lassoreg <- cv.glmnet(x = as.matrix(data[, c(1:k, (k 2 + 1):(k 2 + numCovar))]), y = data[, (k + i)], nfolds = 10, family = "gaussian", alpha = 1, standardize = TRUE, parallel = TRUE) lambda <- lassoreg$lambda.min adjMatCovCLPN[1:(k + numCovar), i] <- coef(lassoreg, s = lambda, exact = FALSE)[2:(k + numCovar + 1)] } adjMatCLPN <- adjMatCovCLPN[1:k, 1:k] return(adjMatCLPN) }

Estimate networks in bootnet (for bootstrapping)

net.t1.t2 <- estimateNetwork(net.t1.t2.df.cov.emerge, fun=CLPN.net, labels=labels, directed=T)

Case-drop Bootstrap

caseBoot.t1.t2 <- bootnet(net.t1.t2, nCores = 8, nBoots = 1000, type = "case", statistics = c("outStrength", "inStrength", "edge"), directed = TRUE)

print(corStability(caseBoot.t1.t2)) plot(swan.caseBoot.t1.t2, statistics = c("outStrength", "inStrength"))

image

And, plotting centrality stability showing no 95% CI:

image
RRRannwwwmdudsss commented 4 months ago

Correction to: plot(caseBoot.t1.t2, statistics = c("outStrength", "inStrength"))

SachaEpskamp commented 3 months ago

I'm afraid that I cannot reproduce the problem. Is it possible for you to make a reproducible example or to share the data perhaps?

Best, Sacha

RRRannwwwmdudsss commented 3 months ago

There's an example dataset that resulted in similar plots. Could you try to replicate my issue using that data?

Thanks a lot! Best, Ran

CLPN

Data1<- read_excel("example2_Data1_t1.xlsx")#time1 Data2<- read_excel("example2_Data2_t2.xlsx")#time2 df <- as.matrix(cbind(Data1,Data2)) nt1=paste0("T1",colnames(Data1)) nt2=paste0("T2",colnames(Data2)) colnames(df)=c(nt1,nt2) labels=paste0(1:25)

CLPN.net <- function(data) {

Load necessary packages

library(glmnet)

Set number of covariates

numCovar <- 3

Set number of nodes

k <- 25 adjMatCovCLPN <- matrix(0, k + numCovar, k + numCovar) for (i in 1:k) { set.seed(1) # commented out so that it doesn't give the same answer every time when bootstrapping lassoreg <- cv.glmnet(x = as.matrix(data[, c(1:k, (k 2 + 1):(k 2 + numCovar))]), y = data[, (k + i)], nfolds = 10, family = "gaussian", alpha = 1, standardize = TRUE, parallel = TRUE) lambda <- lassoreg$lambda.min adjMatCovCLPN[1:(k + numCovar), i] <- coef(lassoreg, s = lambda, exact = FALSE)[2:(k + numCovar + 1)] } adjMatCLPN <- adjMatCovCLPN[1:k, 1:k] diag(adjMatCLPN) <- 0 return(adjMatCLPN) }

Network_CL <- estimateNetwork(df, fun = CLPN.net, labels = labels, directed = T)

Boot_CL_1 <- bootnet(Network_CL, directed = T, nCores = 4, nBoots = 1000, type = "nonparametric", statistics = c("edge", "OutStrength", "InStrength")) Boot_CL_2 <- bootnet(Network_CL, nCores = 4, nBoots = 1000, type = "case", statistics = c("outStrength", "inStrength"), directed = T)

plotting

plot(Boot_CL_1, labels = FALSE, order = "sample") print(corStability(Boot_CL_2)) plot(Boot_CL_2, statistics = c("outStrength", "inStrength")) example2_Data1_t1.xlsx example2_Data2_t2.xlsx

RRRannwwwmdudsss commented 3 months ago

I think I've found a solution to the problem: avoid including set.seed in the function.

CLPN.net <- function(data) { numCovar <- 3 k <- 25 adjMatCovCLPN <- matrix(0, k + numCovar, k + numCovar) for (i in 1:k) { lassoreg <- cv.glmnet(x = as.matrix(data[, c(1:k, (k 2 + 1):(k 2 + numCovar))]), y = data[, (k + i)], nfolds = 10, family = "gaussian", alpha = 1, standardize = TRUE, parallel = TRUE) lambda <- lassoreg$lambda.min adjMatCovCLPN[1:(k + numCovar), i] <- coef(lassoreg, s = lambda, exact = FALSE)[2:(k + numCovar + 1)] } adjMatCLPN <- adjMatCovCLPN[1:k, 1:k] diag(adjMatCLPN) <- 0 return(adjMatCLPN) }

Thank you for your kind help, Dr. Epskamp. With best regards and sincere wishes.