plot method using improper weights for edges given independence tests?

quantitative-ecologist commented 9 months ago

Hello,

Thank you for the bnlearn library for Python!

I have been playing with it for a couple of weeks and found some strange behaviour with the plot function that makes me question if it's a bug.

Here is an example :

1. I have learned the parameters of a DAG and then applied the "g_sq" independence test.

DAG = bn.independence_test(DAG, train_df, test = "g_sq", prune = False)

Printing the output, I get this : print(DAG["independence_test"])

source	target	stat_test	p_value	g_sq	dof
prior_prey_speed_bin	prior_success	True	0.000000e+00	7095.359166	16
prior_prey_speed_bin	hunting_type	True	1.666661e-184	879.275521	8
prior_success	hunting_type	True	1.398179e-34	179.366911	8
xp_level	hunting_type	True	2.315951e-08	41.312300	4
xp_level	chase_success_ratio_bin	True	1.961613e-25	135.586098	8
hunting_type	chase_success_ratio_bin	True	1.278954e-26	141.290355	8
hunting_type	prey_avg_speed_bin	True	1.087990e-211	1005.269860	8
hunting_type	hunting_success	True	1.650949e-16	92.213758	8
time_lag_bin	hunting_type	True	1.054652e-10	63.280805	8
time_lag_bin	chase_success_ratio_bin	True	3.720814e-10	78.056073	16
chase_success_ratio_bin	hunting_success	True	0.000000e+00	28650.178197	16
prey_avg_speed_bin	chase_success_ratio_bin	True	0.000000e+00	7187.267068	16
prey_avg_speed_bin	hunting_success	True	0.000000e+00	7287.686944	16
game_duration_bin	hunting_success	True	0.000000e+00	9475.756745	16

2. Plotting the DAG

3. Problem

Given that the independence test value for this edge is smaller compared to some of the larger ones :

source	target	stat_test	p_value	g_sq	dof
hunting_type	hunting_success	True	1.650949e-16	92.213758	8

then why is the edge returned on the plot much thicker (hunting type --> hunting success)?

Similarly, given that the g_sq test value is very large for this edge :	source	target	stat_test	p_value	g_sq	dof
game_duration_bin	hunting_success	True	0.000000e+00	9475.756745	16

then why is the edge weight is so small on the plot?

4. Attempted solution

Given my suspicions that the plot method does not use the proper weights, I coded my own plot using networkx and matplotlib. Here is my code. It seems that my solution uses the proper weights.

# Get edge properties from the DAG
edge_properties = bn.get_edge_properties(DAG)

# Create a directed graph
G = nx.DiGraph()

# Add edges with weights based on independence test results
for edge, properties in edge_properties.items():
    strength = properties.get("weight", 0)
    G.add_edge(*edge, weight = strength)

# Manually specify node names
node_mapping = {
    "hunting_type": "Hunting\ntype",
    "hunting_success": "Hunting\nsuccess",
    "prey_avg_speed_bin": "Prey speed",
    "prior_prey_speed_bin": "Prior\nprey speed",
    "prior_success": "Prior\nhunting\nsuccess",
    "chase_success_ratio_bin": "Chase\nsuccess\nratio",
    "time_lag_bin": "Time\nlag",
    "xp_level": "Experience\nlevel",
    "game_duration_bin": "Game\nduration"
}
G = nx.relabel_nodes(G, node_mapping)

# Define positions for the nodes
pos = {
    "Experience\nlevel": (0.5, 0),
    "Prior\nhunting\nsuccess": (0.8, 0.005),
    "Prior\nprey speed": (1.2, 0.005),
    "Time\nlag": (0.5, -0.0055),
    "Hunting\ntype": (1, 0),
    "Prey speed": (1.5, 0),
    "Chase\nsuccess\nratio": (1, -0.0055),
    "Hunting\nsuccess": (1.25, -0.01),
    "Game\nduration": (1, -0.012),
}

# Set a fixed larger size for nodes
node_size = 5000
node_sizes = [node_size for _ in range(len(G))]

# Define edge colors and transparency
M = G.number_of_edges()
edge_colors = [
    properties["weight"] for u, v,
    properties in G.edges(data = True)
]
edge_alphas = [(5 + i) / (M + 4) for i in range(M)]

# Use viridis colormap
cmap = plt.cm.viridis

# Plot size
plt.figure(figsize = (15, 10))

# Draw the DAG
nodes = nx.draw_networkx_nodes(
    G,
    pos,
    node_size = node_sizes,
    node_color = "indigo",
    edgecolors = "black"
)

edges = nx.draw_networkx_edges(
    G,
    pos,
    node_size = node_sizes,
    arrowstyle = "->",
    arrowsize = 20,
    edge_color = edge_colors,
    edge_cmap = cmap,
    width = 2,
)

# set alpha value for each edge
#for i in range(M):
#    edges[i].set_alpha(edge_alphas[i])

pc = mpl.collections.PatchCollection(edges, cmap = cmap)
pc.set_array(edge_colors)

# Display node labels
node_labels = nx.draw_networkx_labels(
    G, pos,
    font_size = 10,
    font_weight = "bold",
    font_color = "white"
)

# Display the figure
ax = plt.gca()
ax.set_axis_off()
plt.colorbar(pc, ax = ax, label = "Edge strength")
plt.show()

Comparing with the Titanic example

I tried to reproduce my problem using the exact same code provided for the Titanic example, and it seems like this strange behaviour also occurs. Here are my outputs :

import bnlearn as bn

# Load example mixed dataset
df = bn.import_example(data='titanic')

# Convert to onehot
dfhot, dfnum = bn.df2onehot(df)

# Structure learning
# model = bn.structure_learning.fit(dfnum, methodtype='cl', black_list=['Embarked','Parch','Name'], root_node='Survived', bw_list_method='nodes')
model = bn.structure_learning.fit(dfnum)
# Plot
G = bn.plot(model, interactive=False)

# Compute edge strength with the chi_square test statistic
model = bn.independence_test(model, dfnum, test='chi_square', prune=True)
# Plot
bn.plot(model, interactive=False, pos=G['pos'])

# Parameter learning
model = bn.parameter_learning.fit(model, dfnum)

As you can see, the edge between SibSp --> Parch should be much larger, given that the independence test results are :

source	target	stat_test	p_value	g_sq	dof
Pclass	Embarked	True	1.519464e-25	129.654853	6
Pclass	Survived	True	4.549252e-23	102.888989	2
Sex	Survived	True	1.197357e-58	260.717020	1
Sex	Pclass	True	2.063886e-04	16.971499	2
SibSp	Parch	True	1.468030e-62	336.388883	15
Parch	Sex	True	1.013616e-12	58.892256	3

Any help would be greatly appreciated.

Thank you very much!

erdogant commented 9 months ago

Thank you for this clear explanation! Im going to dig into it.

erdogant commented 7 months ago

Thank you for your patience. In specific cases, the weights for the edges were indeed not updated. I fixed that issue now. Update to the latest version with pip install -U bnlearn

Can you confirm whether this also works for you?

erdogant commented 4 months ago

Closing this! Let me know if something is still off.

erdogant / bnlearn