erdogant / bnlearn

Python package for Causal Discovery by learning the graphical structure of Bayesian networks. Structure Learning, Parameter Learning, Inferences, Sampling methods.
https://erdogant.github.io/bnlearn
Other
482 stars 46 forks source link

plot method using improper weights for edges given independence tests? #93

Closed quantitative-ecologist closed 4 months ago

quantitative-ecologist commented 9 months ago

Hello,

Thank you for the bnlearn library for Python!

I have been playing with it for a couple of weeks and found some strange behaviour with the plot function that makes me question if it's a bug.

Here is an example :

1. I have learned the parameters of a DAG and then applied the "g_sq" independence test.

DAG = bn.independence_test(DAG, train_df, test = "g_sq", prune = False)

Printing the output, I get this : print(DAG["independence_test"])

source target stat_test p_value g_sq dof
prior_prey_speed_bin prior_success True 0.000000e+00 7095.359166 16
prior_prey_speed_bin hunting_type True 1.666661e-184 879.275521 8
prior_success hunting_type True 1.398179e-34 179.366911 8
xp_level hunting_type True 2.315951e-08 41.312300 4
xp_level chase_success_ratio_bin True 1.961613e-25 135.586098 8
hunting_type chase_success_ratio_bin True 1.278954e-26 141.290355 8
hunting_type prey_avg_speed_bin True 1.087990e-211 1005.269860 8
hunting_type hunting_success True 1.650949e-16 92.213758 8
time_lag_bin hunting_type True 1.054652e-10 63.280805 8
time_lag_bin chase_success_ratio_bin True 3.720814e-10 78.056073 16
chase_success_ratio_bin hunting_success True 0.000000e+00 28650.178197 16
prey_avg_speed_bin chase_success_ratio_bin True 0.000000e+00 7187.267068 16
prey_avg_speed_bin hunting_success True 0.000000e+00 7287.686944 16
game_duration_bin hunting_success True 0.000000e+00 9475.756745 16

2. Plotting the DAG

image

3. Problem

Given that the independence test value for this edge is smaller compared to some of the larger ones :

source target stat_test p_value g_sq dof
hunting_type hunting_success True 1.650949e-16 92.213758 8

then why is the edge returned on the plot much thicker (hunting type --> hunting success)?

Similarly, given that the g_sq test value is very large for this edge : source target stat_test p_value g_sq dof
game_duration_bin hunting_success True 0.000000e+00 9475.756745 16

then why is the edge weight is so small on the plot?

4. Attempted solution

Given my suspicions that the plot method does not use the proper weights, I coded my own plot using networkx and matplotlib. Here is my code. It seems that my solution uses the proper weights.

# Get edge properties from the DAG
edge_properties = bn.get_edge_properties(DAG)

# Create a directed graph
G = nx.DiGraph()

# Add edges with weights based on independence test results
for edge, properties in edge_properties.items():
    strength = properties.get("weight", 0)
    G.add_edge(*edge, weight = strength)

# Manually specify node names
node_mapping = {
    "hunting_type": "Hunting\ntype",
    "hunting_success": "Hunting\nsuccess",
    "prey_avg_speed_bin": "Prey speed",
    "prior_prey_speed_bin": "Prior\nprey speed",
    "prior_success": "Prior\nhunting\nsuccess",
    "chase_success_ratio_bin": "Chase\nsuccess\nratio",
    "time_lag_bin": "Time\nlag",
    "xp_level": "Experience\nlevel",
    "game_duration_bin": "Game\nduration"
}
G = nx.relabel_nodes(G, node_mapping)

# Define positions for the nodes
pos = {
    "Experience\nlevel": (0.5, 0),
    "Prior\nhunting\nsuccess": (0.8, 0.005),
    "Prior\nprey speed": (1.2, 0.005),
    "Time\nlag": (0.5, -0.0055),
    "Hunting\ntype": (1, 0),
    "Prey speed": (1.5, 0),
    "Chase\nsuccess\nratio": (1, -0.0055),
    "Hunting\nsuccess": (1.25, -0.01),
    "Game\nduration": (1, -0.012),
}

# Set a fixed larger size for nodes
node_size = 5000
node_sizes = [node_size for _ in range(len(G))]

# Define edge colors and transparency
M = G.number_of_edges()
edge_colors = [
    properties["weight"] for u, v,
    properties in G.edges(data = True)
]
edge_alphas = [(5 + i) / (M + 4) for i in range(M)]

# Use viridis colormap
cmap = plt.cm.viridis

# Plot size
plt.figure(figsize = (15, 10))

# Draw the DAG
nodes = nx.draw_networkx_nodes(
    G,
    pos,
    node_size = node_sizes,
    node_color = "indigo",
    edgecolors = "black"
)

edges = nx.draw_networkx_edges(
    G,
    pos,
    node_size = node_sizes,
    arrowstyle = "->",
    arrowsize = 20,
    edge_color = edge_colors,
    edge_cmap = cmap,
    width = 2,
)

# set alpha value for each edge
#for i in range(M):
#    edges[i].set_alpha(edge_alphas[i])

pc = mpl.collections.PatchCollection(edges, cmap = cmap)
pc.set_array(edge_colors)

# Display node labels
node_labels = nx.draw_networkx_labels(
    G, pos,
    font_size = 10,
    font_weight = "bold",
    font_color = "white"
)

# Display the figure
ax = plt.gca()
ax.set_axis_off()
plt.colorbar(pc, ax = ax, label = "Edge strength")
plt.show()

image

Comparing with the Titanic example

I tried to reproduce my problem using the exact same code provided for the Titanic example, and it seems like this strange behaviour also occurs. Here are my outputs :

import bnlearn as bn

# Load example mixed dataset
df = bn.import_example(data='titanic')

# Convert to onehot
dfhot, dfnum = bn.df2onehot(df)

# Structure learning
# model = bn.structure_learning.fit(dfnum, methodtype='cl', black_list=['Embarked','Parch','Name'], root_node='Survived', bw_list_method='nodes')
model = bn.structure_learning.fit(dfnum)
# Plot
G = bn.plot(model, interactive=False)

# Compute edge strength with the chi_square test statistic
model = bn.independence_test(model, dfnum, test='chi_square', prune=True)
# Plot
bn.plot(model, interactive=False, pos=G['pos'])

# Parameter learning
model = bn.parameter_learning.fit(model, dfnum)

image

As you can see, the edge between SibSp --> Parch should be much larger, given that the independence test results are :

source target stat_test p_value g_sq dof
Pclass Embarked True 1.519464e-25 129.654853 6
Pclass Survived True 4.549252e-23 102.888989 2
Sex Survived True 1.197357e-58 260.717020 1
Sex Pclass True 2.063886e-04 16.971499 2
SibSp Parch True 1.468030e-62 336.388883 15
Parch Sex True 1.013616e-12 58.892256 3

Any help would be greatly appreciated.

Thank you very much!

erdogant commented 9 months ago

Thank you for this clear explanation! Im going to dig into it.

erdogant commented 7 months ago

Thank you for your patience. In specific cases, the weights for the edges were indeed not updated. I fixed that issue now. Update to the latest version with pip install -U bnlearn

Can you confirm whether this also works for you?

erdogant commented 4 months ago

Closing this! Let me know if something is still off.