cvborkulo / NetworkComparisonTest

Statistical comparison of two networks with respect to three invariance measures
21 stars 21 forks source link

Using the NCT for Cross-lagged panel networks #46

Open norvegicus7 opened 1 year ago

norvegicus7 commented 1 year ago

Hi @SachaEpskamp,

I am looking for a way to compare two cross-lagged network panel models (CLPN, Wysocki et al. approach) with the NCT. Is there any way to use the NCT with these CLPN models? I am aware that CLPN is a work in progress and the NCT is designed primarily for cross-sectional networks but barring other ways of examining the replicability/generalizability of CLPNs I think it would still be useful.

More detail: The CLPN currently relies upon a custom function that calls on glmnet. Here exemplified with a two-wave panel sample 2013-2015 and 9 variables for each wave.

df <- as.matrix(longi13_15)
k <- 9
adjMat <- matrix(0, k, k)
CLPN.fun <- function(df) {
  for (i in 1:k){
    lassoreg <- cv.glmnet(as.matrix(df[,1:k]), df[,(k+i)], 
                          family = "gaussian", alpha = 1, standardize=TRUE)
    lambda <- lassoreg$lambda.min 
    adjMat[1:k,i] <- coef(lassoreg, s = lambda, exact = FALSE)[2:(k+1)]
  }
  return(adjMat)
}

This function can then be used with bootnet's estimateNetwork, like this:

labels <- c("1", "2", "3", "4", "5", "6", "7", "8", "9")
pg13_15CL <- estimateNetwork(df, fun = CLPN.fun, labels = labels, directed = T)

Which also allows for it being used in bootstrapping procedures:

boot_CL <- bootnet(pg13_15CL, directed = T, nBoots = 1000, nCores = 1, type = "nonparametric", statistics = c("edge", "OutStrength", "InStrength"))

However, I am not able to feed this model (as estimated with estimateNetwork) into the NCT. Doing so results in an error message (pg15_15CLB is just another version of the same model for testing):

Error in NCT(pg13_15CL, pg13_15CLB, it = 10, estimator = CLPN.fun) :
Custom estimator function not supported for bootnet objects

Relatedly, the NCT works when I try the same approach with another function which calls on psychonetrics which might suggest that the issue relates to how it handles the CLPN function.

KarolineHuth commented 1 year ago

I agree that comparing the CLPN results across groups would be very useful. Unfortunately, this is not yet supported by NCT.

SachaEpskamp commented 12 months ago

Hi all,

The use of estimator and estimatorArgs in NCT was a first idea of implementing custom estimation functions which was later replaced by just relying on the bootnet functionality that is already available rather than implementing new default functions in NCT. So this method is now deprecated and perhaps the estimator and estimatorArgs arguments should be removed from NCT. The recommend method now is by using estimateNetwork first to estimate networks, then using those results directly in NCT. This way you make sure you use the same estimation method for performing the NCT test that you used to estimate the network structures. This works for the custom function as well, except that NCT expects the function to have an argument named verbose as well. Perhaps we can remove this requirement, but a minor adjustment to the custom function makes this operational! See below for a reproducible example. However, this likely does not give you proper results now, because NCT only works for undirected networks. I think it should be very easy to change that by replacing upper.tri(...) in several places in the code for NCT with a matrix that indexes a triangle if the network is undirected but the full matrix if a matrix is directed.

library("bootnet")

# CLPN function:
CLPN.fun <- function(df, verbose = FALSE) {
    library(glmnet)
    library(qgraph)
    library(bootnet)
    library(NetworkToolbox)

    k <- ncol(df) / 2
    adjMat <- matrix(0, k, k)
    for (i in 1:k){
        # set.seed(100) # <- this will lead to issues with bootstrapping
        lassoreg <- cv.glmnet(as.matrix(df[,1:k]), df[,(k+i)],
                              family = "gaussian", alpha = 1, standardize=TRUE,nfolds = 10)
        lambda <- lassoreg$lambda.min
        adjMat[1:k,i] <- coef(lassoreg, s = lambda, exact = FALSE)[2:(k+1)]
    }
    return(adjMat)
}

# Generate some data:
t1 <- matrix(rnorm(1000*3),1000,3)
beta <- diag(0.5, 3)
beta[2,1] <- beta[3,2] <- beta[1,3] <- 0.25
t2 <- t1 %*% t(beta) +  matrix(rnorm(1000*3),1000,3)
df <- as.data.frame(cbind(t1,t2))
names(df) <- c(paste0("T1_V",1:3), paste0("T2_V",1:3))

# Split data in two groups:
df1 <- df[1:500,]
df2 <- df[501:1000,]

# Fit CLPN models in bootnet:
library("bootnet")
Network_CL_1 <- estimateNetwork(df1, fun = CLPN.fun, labels = paste0("V",1:3), directed = TRUE)
Network_CL_2 <- estimateNetwork(df2, fun = CLPN.fun, labels = paste0("V",1:3), directed = TRUE)

# Compare to true network:
layout(t(1:3))
qgraph::qgraph(t(beta), layout = "circle", theme = "colorblind", title = "True CLPN", diag = TRUE)
plot(Network_CL_1, layout = "circle", theme = "colorblind", title = "Estimated CLPN group 1", diag = TRUE)
plot(Network_CL_2, layout = "circle", theme = "colorblind", title = "Estimated CLPN group 1", diag = TRUE)

# Perform NCT:
library("NetworkComparisonTest")
NCTres <- NCT(Network_CL_1, Network_CL_2)

# Inspect results:
NCTres

Best, Sacha

norvegicus7 commented 11 months ago

Great, thanks for the work-around. You mention that a minor change to the code for the NCT could make the test more applicable to directed networks, is this a planned update? I will be applying this test to some data in lack of other alternatives so such a change would be very welcome.

norvegicus7 commented 4 months ago

Hi again, @SachaEpskamp

I have finally been able to apply this to some real data, although I am encountering some difficulties. I am using a version of the CLPN code adapted for binary data instead of continous data, the code works for estimating the network models in themselves (including outputing graphics, running centrality indices etc.) but produces an error when used in the NCT function. Unfortunately, I cannot share the data itself currently. If it is crucial to have access to data I can try to find a way. I post the R code and resulting output below as well as the structure of the dataset. The two cross-lagged datasets contain participant problem gambling symptom report (presence/absence) from large general pop samples so there is a vast majority of 0s in both datasets (as can be seen below). Perhaps the low number of 1s are causing the difficulties?

CLPN.fun <- function(df, verbose = FALSE) {
  library(glmnet)
  library(qgraph)
  library(bootnet)
  library(NetworkToolbox)

  k <- ncol(df) / 2
  adjMat <- matrix(0, k, k)
  for (i in 1:k){
    lassoreg <- cv.glmnet(as.matrix(df[,1:k]), df[,(k+i)], 
                          family = "binomial", alpha = 1, standardize=TRUE, nfolds = 10)
    lambda <- lassoreg$lambda.1se 
    adjMat[1:k,i] <- coef(lassoreg, s = lambda, exact = FALSE)[2:(k+1)]
  }
  return(adjMat)
}

labels <- c("1", "2", "3", "4", "5", "6", "7", "8", "9")
> long13 <- as.matrix(longi13_15)
> set.seed(123)
> pg13_15CL <- estimateNetwork(long13, fun = CLPN.fun, labels = labels, directed = T) 

> long19 <- as.matrix(longi19_22)
> set.seed(123)
> pg19_22CL <- estimateNetwork(long19, fun = CLPN.fun, labels = labels, directed = T) 

> # Network Comparison Tests 2013-15 cross-lagged versus 2019-22 cross-lagged
> set.seed(NULL)
> pg13and15vs19and22 <- NCT(pg13_15CL, pg19_22CL, it = 10)
Note: estimateNetwork object used - estimation method has possibly not been validated.
  |                                                                                                |   0%Error in diffedges.perm[i, ] <- abs(r1perm - r2perm)[upper.tri(abs(r1perm -  : 
  number of items to replace is not a multiple of replacement length
In addition: Warning message:
In matrix(diffedges.real, it, nedges, byrow = TRUE) :
  data length [36] is not a sub-multiple or multiple of the number of rows [10]

Data structure:
> str(longi13_15)
tibble [2,749 x 18] (S3: tbl_df/tbl/data.frame)
 $ T1_LossControl     : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_Tolerance       : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_Chase           : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_Borrowed        : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_FeltProblem     : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_HealthProblem   : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_Criticized      : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_FinancialProblem: num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_FeltGuilty      : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_LossControl     : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_Tolerance       : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_Chase           : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_Borrowed        : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_FeltProblem     : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_HealthProblem   : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_Criticized      : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_FinancialProblem: num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_FeltGuilty      : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...

> str(longi19_22)
tibble [2,749 x 18] (S3: tbl_df/tbl/data.frame)
 $ T1_LossControl     : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_Tolerance       : num [1:2749] 0 0 0 0 0 0 1 0 0 0 ...
 $ T1_Chase           : num [1:2749] 0 0 0 0 0 0 1 0 0 0 ...
 $ T1_Borrowed        : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_FeltProblem     : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_HealthProblem   : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_Criticized      : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_FinancialProblem: num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T1_FeltGuilty      : num [1:2749] 0 0 0 0 1 0 0 0 0 0 ...
 $ T2_LossControl     : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_Tolerance       : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_Chase           : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_Borrowed        : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_FeltProblem     : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_HealthProblem   : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_Criticized      : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_FinancialProblem: num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...
 $ T2_FeltGuilty      : num [1:2749] 0 0 0 0 0 0 0 0 0 0 ...

Do you have any idea what might be causing the problem?