Update the edges based on some kind of loss

erdogant / bnlearn

Python library for learning the graphical structure of Bayesian networks, parameter learning, inference and sampling methods.

https://erdogant.github.io/bnlearn


Update the edges based on some kind of loss #70

Closed samanemami closed 1 year ago

samanemami commented 1 year ago

This is a question rather than an issue. I was wondering whether it is possible to update the edges and the DAG in a Chow-Liu procedure. I could not find a related method, if there is one.

Would you please help me with this question?

I appreciate it. Thank you so much in advance.

erdogant commented 1 year ago

Thanks for this interesting idea! Can you elaborate a bit more on it? Preferably with a small example, maybe a use case, and the problem you aim to solve.

samanemami commented 1 year ago

@erdogant Thank you for your kind reply.

For instance, we estimate the structure with Chow-Liu. Of course, the estimated dependencies and edges between the variables are not ideal, so we need to optimize the edges and update the estimated DAG accordingly.
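For context, here is a minimal plain-Python sketch of what a Chow-Liu estimate computes: pairwise mutual information followed by a maximum spanning tree. It is not bnlearn's implementation; the function names and the toy data are hypothetical, and the result is an undirected tree (bnlearn additionally orients the edges away from `root_node`).

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(rows, a, b):
    """Empirical mutual information (in nats) between discrete columns a and b."""
    n = len(rows)
    count_a = Counter(r[a] for r in rows)
    count_b = Counter(r[b] for r in rows)
    count_ab = Counter((r[a], r[b]) for r in rows)
    mi = 0.0
    for (va, vb), c in count_ab.items():
        mi += (c / n) * math.log(c * n / (count_a[va] * count_b[vb]))
    return mi

def chow_liu_edges(rows, columns):
    """Undirected Chow-Liu tree: maximum spanning tree over pairwise MI (Kruskal)."""
    weighted = sorted(
        ((mutual_information(rows, a, b), a, b) for a, b in combinations(columns, 2)),
        reverse=True,
    )
    parent = {c: c for c in columns}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for mi, a, b in weighted:
        ra, rb = find(a), find(b)
        if ra != rb:  # adding this edge does not create a cycle
            parent[ra] = rb
            edges.append((a, b, mi))
    return edges

# Hypothetical toy data: B is a copy of A, C is only weakly related to both.
rows = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 1},
]
tree = chow_liu_edges(rows, ["A", "B", "C"])
for a, b, mi in tree:
    print(f"{a} - {b}: MI = {mi:.3f}")
```

Because the tree always connects every variable, weak edges such as the one to `C` are kept regardless of their strength, which is why a pruning or rescoring step afterwards can make sense.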

erdogant commented 1 year ago

How do you want to optimize the edges? bnlearn does contain the independence_test to determine significance. Do you want to do this iteratively? Maybe something like the following sequential steps?

1. Compute the initial graph.
2. Determine the significant edges and remove those that do not pass a certain alpha.
3. Compute the graph again, now with the updated edges.
4. Go to step 2 until the graph converges.
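To make step 2 concrete, here is a rough plain-Python sketch of pruning edges with a Pearson chi-square test. It is an illustration only, not bnlearn's `independence_test`: the helper names and toy data are hypothetical, no continuity correction is applied, and the p-value formula only covers binary variables (one degree of freedom).

```python
import math
from collections import Counter

def chi_square_stat(rows, a, b):
    """Pearson chi-square statistic and degrees of freedom for columns a and b."""
    n = len(rows)
    count_a = Counter(r[a] for r in rows)
    count_b = Counter(r[b] for r in rows)
    count_ab = Counter((r[a], r[b]) for r in rows)
    stat = 0.0
    for va, na in count_a.items():
        for vb, nb in count_b.items():
            expected = na * nb / n
            stat += (count_ab[(va, vb)] - expected) ** 2 / expected
    dof = (len(count_a) - 1) * (len(count_b) - 1)
    return stat, dof

def p_value_dof1(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def prune_edges(rows, edges, alpha=0.05):
    """Keep only edges whose endpoints are significantly dependent."""
    kept = []
    for a, b in edges:
        stat, dof = chi_square_stat(rows, a, b)
        if dof != 1:
            raise ValueError("this sketch only handles binary variables")
        if p_value_dof1(stat) < alpha:
            kept.append((a, b))
    return kept

# Hypothetical toy data: B always equals A, C is balanced against A.
rows = [{"A": i % 2, "B": i % 2, "C": (i // 2) % 2} for i in range(40)]
kept = prune_edges(rows, [("A", "B"), ("A", "C")])
print(kept)
```

On this toy data the A-B edge survives and the A-C edge is pruned, since A and C are exactly balanced against each other.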

samanemami commented 1 year ago

Yes, exactly. I want to add a sequential step as you mentioned. Is there any method for this?

erdogant commented 1 year ago

Ah great. Well, it is not implemented, but the most straightforward way is to use a while loop.


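# Load library
import bnlearn as bn

# Load example
df_raw = bn.import_example(data='titanic')
# Preprocess the raw dataset
dfhot, dfnum = bn.df2onehot(df_raw)

# Start with an empty blacklist
black_list = []
optimize = True

while optimize:
    # Structure learning
    model = bn.structure_learning.fit(dfnum, methodtype='cl', black_list=black_list, root_node='Survived', bw_list_method='nodes')

    # Plot detected DAG
    G = bn.plot(model)

    # Compute edge strength using the chi-square independence test and remove (prune) the non-significant edges
    model = bn.independence_test(model, dfnum, alpha=0.05, prune=True)

    # Remove features from dfnum. One possible choice (an assumption, not a bnlearn feature):
    # blacklist the nodes that have no edges left after pruning, except the root node.
    # This assumes the pruned edges are available under model['model_edges'].
    connected = {node for edge in model['model_edges'] for node in edge}
    remfeat = [c for c in dfnum.columns if c not in connected and c not in black_list and c != 'Survived']
    black_list.extend(remfeat)

    # If converged (no further features were removed), set optimize = False
    optimize = len(remfeat) > 0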

samanemami commented 1 year ago

Thank you so much, @erdogant. Great! Do you have any ideas regarding convergence?

erdogant commented 1 year ago

After consideration, I would first look at the model graph after pruning with the independence test (thus without the while loop). Another idea would be to look at the structure score to determine which method has the best score (and thus the best graph). Maybe Chow-Liu is not the best choice (?)
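As an illustration of comparing graphs by a structure score, here is a small plain-Python sketch of a BIC score for discrete data. The function, the candidate graphs, and the toy data are hypothetical; the sign convention used here is "higher is better", and scores reported by bnlearn may be scaled or defined differently.

```python
import math
from collections import Counter

def bic_score(rows, parents):
    """BIC of a discrete Bayesian network; parents maps node -> tuple of parent nodes."""
    n = len(rows)
    score = 0.0
    for node, pa in parents.items():
        joint = Counter((tuple(r[p] for p in pa), r[node]) for r in rows)
        pa_counts = Counter(tuple(r[p] for p in pa) for r in rows)
        # Maximum-likelihood log-likelihood of this node given its parents
        loglik = sum(c * math.log(c / pa_counts[cfg]) for (cfg, _), c in joint.items())
        # Free parameters: (states - 1) for every parent configuration
        n_states = len({r[node] for r in rows})
        n_configs = 1
        for p in pa:
            n_configs *= len({r[p] for r in rows})
        score += loglik - 0.5 * math.log(n) * (n_states - 1) * n_configs
    return score

# Hypothetical toy data: B always equals A, C is balanced against A.
rows = [{"A": i % 2, "B": i % 2, "C": (i // 2) % 2} for i in range(40)]

candidates = {
    "empty": {"A": (), "B": (), "C": ()},
    "A->B": {"A": (), "B": ("A",), "C": ()},
    "A->B, A->C": {"A": (), "B": ("A",), "C": ("A",)},
}
scores = {name: bic_score(rows, dag) for name, dag in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

The graph with only the A-to-B edge scores best here: the extra A-to-C edge adds parameters without improving the likelihood, and the empty graph misses a real dependency.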

samanemami commented 1 year ago

Thanks for the ideas, @erdogant. I used the independence test, but the results weren't promising. Considering another score would be interesting as well; I will try that.

What do you think about the following scenario?

Consider that we only need the idea of the DAG. With this assumption:

1. We estimate the DAG using one of the approaches.
2. Fit a Bayesian network on the DAG.
3. Run forward sampling with the fitted Bayesian network.
4. Build the final (ideal) DAG from the instances generated by the sampling.
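To illustrate what the forward sampling step does, here is a minimal plain-Python sketch on a hand-written two-node network. The network, its probabilities, and the function name are made up for illustration; with bnlearn the fitted model would supply the structure and the conditional probability tables instead.

```python
import random

# Hypothetical toy network Rain -> WetGrass with made-up probabilities.
order = ["Rain", "WetGrass"]  # topological order: parents before children
parents = {"Rain": (), "WetGrass": ("Rain",)}
cpt = {
    "Rain": {(): {0: 0.8, 1: 0.2}},
    "WetGrass": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
}

def forward_sample(order, parents, cpt, n, seed=0):
    """Draw n joint samples by sampling each node given its already-sampled parents."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        row = {}
        for node in order:
            cfg = tuple(row[p] for p in parents[node])
            dist = cpt[node][cfg]
            states = list(dist)
            row[node] = rng.choices(states, weights=[dist[s] for s in states])[0]
        samples.append(row)
    return samples

samples = forward_sample(order, parents, cpt, n=2000)
rain_rate = sum(s["Rain"] for s in samples) / len(samples)
print(f"sampled P(Rain=1) is roughly {rain_rate:.2f}")
```

The generated rows can then be passed to a structure learning step (step 4), keeping in mind that they follow the distribution of the fitted network rather than the original data.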