Python package for Causal Discovery by learning the graphical structure of Bayesian networks. Structure Learning, Parameter Learning, Inferences, Sampling methods.
Hi
I was surprised by how bnlearn calculates the AUC when we estimate the performance of a learned network using something like boot.strength.
With N nodes there are N^2-N possible directed edges, and I expected the AUC calculation to include all of them. However, this is not the case. Is this standard practice when estimating AUC in network inference problems? If so, I would love to see a reference.
For example, in your documentation you have the following example:
However, when we convert the outcome of boot.strength to a prediction object:
strength = boot.strength(alarm, R = 200, m = 30, algorithm = "hc")
pred = as.prediction(strength, true.dag)
the number of edges tested is a lot smaller:
pred@predictions[[1]] %>% length
666
I took a look at the bnlearn:::as.prediction.bn.strength function. It uses the subsets function below to expand the true arcs, but the result does not include all possible edges.
dd = structure(data.frame(subsets(nodes, 2)), names = c("from", "to"))
I'm not quite sure what this function is doing.
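One observation: 666 equals choose(37, 2), so my guess (not verified against the bnlearn source) is that subsets(nodes, 2) enumerates unordered pairs of nodes rather than ordered ones. Below is a minimal base-R sketch of the two counts. The node names are hypothetical, the count of 37 is an assumption chosen to match 666, and combn is only a stand-in for subsets:

```r
# Hypothetical node labels; 37 is an assumption chosen because choose(37, 2) == 666.
nodes <- paste0("V", 1:37)

# Unordered pairs {a, b}: one candidate per pair of nodes.
# combn() is used here as a stand-in for bnlearn's internal subsets().
unordered <- t(combn(nodes, 2))
colnames(unordered) <- c("from", "to")

# Ordered pairs (a, b) with a != b: one candidate per possible directed edge.
ordered <- expand.grid(from = nodes, to = nodes, stringsAsFactors = FALSE)
ordered <- ordered[ordered$from != ordered$to, ]

nrow(unordered)  # N * (N - 1) / 2
nrow(ordered)    # N^2 - N
```

If that guess is right, the prediction object scores each pair of nodes once, while a comparison of full adjacency matrices scores each direction separately, which could be one reason the two AUC values differ.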
If we estimate the performance using these predictions, we get an AUC of 0.842:
performance(pred, "auc") %>% str
Formal class 'performance' [package "ROCR"] with 6 slots
..@ x.name : chr "None"
..@ y.name : chr "Area under the ROC curve"
..@ alpha.name : chr "none"
..@ x.values : list()
..@ y.values :List of 1
.. ..$ : num 0.842
..@ alpha.values: list()
However, using the minet package, which works on the complete adjacency matrix, we get a different value, 0.87:
library(minet)
library(dplyr)  # provides mutate() and %>%
select = dplyr::select  # make sure select() refers to the dplyr version
## get estimated adjacency matrix
est_edges = strength %>%
  mutate(weights = strength * direction) %>%
  select(from, to, weights)
est_ig = igraph::graph_from_data_frame(est_edges)
est_adj = est_ig %>% igraph::get.adjacency(attr = "weights") %>% as.matrix()
## get true adjacency matrix
true_adj = igraph::get.adjacency(as.igraph(true.dag)) %>% as.matrix()
## make sure nodes are ordered the same way
est_adj = est_adj[colnames(true_adj), colnames(true_adj)]
compa = validate(est_adj, true_adj)
auc.roc(compa)
0.871454862138092
Thanks for any insights on this.
FKG