erdogant / bnlearn

Python package for Causal Discovery by learning the graphical structure of Bayesian networks. Structure Learning, Parameter Learning, Inferences, Sampling methods.
https://erdogant.github.io/bnlearn

Predict #33

Closed erdogant closed 3 years ago

erdogant commented 3 years ago

Oooh great! Thank you @erdogant! I have another question. When you use the predict function, you obtain a data frame with predictions and probabilities for the variables you pass to predict (rain and cloudy in your example). Is the probability the probability of wet_grass, and is the prediction the prediction of wet_grass using, for example, cloudy as a variable?

My last question: the R library I was using last week includes a cross-validation function that reports the accuracy using K-fold. I was doing this manually with KFold().split, but it was very slow, and the same goes for the likelihood option. Also, I have been looking for AODE naive Bayes and found it in the bnclassify library (in R), in case it is of interest to you.

Thanks for all! I love your library!

Pablo

Originally posted by @PARODBE in https://github.com/erdogant/bnlearn/issues/32#issuecomment-908060695
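
The manual K-fold loop described in the question can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `kfold_indices`, `cross_val_accuracy`, `fit_majority` and `predict_majority` are hypothetical helpers, not part of bnlearn, and the majority-class "model" is a stand-in for a real fit/predict pair such as bn.parameter_learning.fit and bn.predict.

```python
import random

def kfold_indices(n_rows, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal, shuffled folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_val_accuracy(rows, target, fit, predict, k=5, seed=0):
    """Mean accuracy over k folds; fit and predict are user-supplied callables."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(rows), k, seed):
        model = fit([rows[i] for i in train_idx])
        hits = sum(predict(model, rows[i]) == rows[i][target] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)

# Stand-in model (hypothetical): always predict the most frequent training value.
def fit_majority(train_rows, target="xray"):
    values = [r[target] for r in train_rows]
    return max(set(values), key=values.count)

def predict_majority(model, row):
    return model

# Toy data: xray is 1 for 15 of the 20 rows.
rows = [{"smoke": i % 2, "xray": 1 if i % 4 else 0} for i in range(20)]
accuracy = cross_val_accuracy(rows, "xray", fit_majority, predict_majority, k=5)
print(f"mean accuracy over 5 folds: {accuracy:.2f}")
```

The fit and predict callables are passed in so that the fold logic stays independent of the model; how a Bayesian network would be wired into them is left open here.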

erdogant commented 3 years ago

First of all thanks!

I'm not sure whether I understand the question but let me explain the predict function. Suppose you have the Asia dataset and you learn a Bayesian model:

import bnlearn as bn

# Load the Asia example data set and define the edges of the DAG
df = bn.import_example('asia')
edges = [('smoke', 'lung'),
         ('smoke', 'bronc'),
         ('lung', 'xray'),
         ('bronc', 'xray')]

# Make the actual Bayesian DAG
DAG = bn.make_DAG(edges, verbose=0)
model = bn.parameter_learning.fit(DAG, df, verbose=3)

# Generate some data based on DAG
df = bn.sampling(model, n=1000)

At this point, we have a Bayesian model and a data frame df. We can make predictions on the entire data frame using the model with the predict function.

The data frame looks as follows:

     smoke  bronc  lung  xray
0        0      1     1     1
1        1      1     1     0
2        1      1     1     1
3        0      1     1     1
4        1      1     1     1
..     ...    ...   ...   ...
995      0      0     0     0
996      1      1     1     1
997      1      1     1     1
998      0      0     1     1
999      0      0     1     0

Suppose we want to predict the outcome for the variables bronc and xray:

# Make predictions
Pout = bn.predict(model, df, variables=['bronc','xray'])

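For each row in the data frame df, predict uses the remaining columns (smoke and lung) as evidence and infers the most probable combination of states for bronc and xray. The bronc and xray values in Pout are therefore predictions and can differ from the observed values in df (compare row 999: the observed xray is 0, the predicted xray is 1). Rows with the same evidence receive the same prediction. What's new is the probability p of the predicted combination given that evidence.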

     bronc  xray         p
0        0     1  0.550458
1        1     1  0.628524
2        1     1  0.628524
3        0     1  0.550458
4        1     1  0.628524
..     ...   ...       ...
995      0     0  0.457963
996      1     1  0.628524
997      1     1  0.628524
998      0     1  0.550458
999      0     1  0.550458

[1000 rows x 3 columns]
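
To make the idea behind the output above concrete, here is a minimal sketch of "most probable states given the evidence" for the same DAG, using brute-force enumeration in plain Python. It is not bnlearn's implementation: the CPT numbers are invented for illustration (they are not the fitted Asia parameters), `joint` and `predict_row` are hypothetical helpers, and the sketch assumes that evidence plus targets cover all four variables.

```python
from itertools import product

# Hypothetical CPTs for smoke -> {lung, bronc} -> xray (all variables binary).
P_SMOKE = {0: 0.5, 1: 0.5}
P_LUNG = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.6, 1: 0.4}}    # P_LUNG[smoke][lung]
P_BRONC = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}   # P_BRONC[smoke][bronc]
P_XRAY = {                                              # P_XRAY[(lung, bronc)][xray]
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.6, 1: 0.4},
    (1, 0): {0: 0.3, 1: 0.7},
    (1, 1): {0: 0.1, 1: 0.9},
}

def joint(smoke, lung, bronc, xray):
    """Joint probability of one full assignment, factorised along the DAG."""
    return (P_SMOKE[smoke] * P_LUNG[smoke][lung]
            * P_BRONC[smoke][bronc] * P_XRAY[(lung, bronc)][xray])

def predict_row(evidence, targets=("bronc", "xray")):
    """Return (best_states, p): the most probable target states given the evidence."""
    scores = {}
    for states in product((0, 1), repeat=len(targets)):
        assignment = dict(evidence, **dict(zip(targets, states)))
        scores[states] = joint(**assignment)
    total = sum(scores.values())          # normalising constant P(evidence)
    best = max(scores, key=scores.get)    # highest-scoring combination of states
    return dict(zip(targets, best)), scores[best] / total

for evidence in ({"smoke": 1, "lung": 1}, {"smoke": 0, "lung": 1}):
    states, p = predict_row(evidence)
    print(evidence, "->", states, round(p, 4))
```

As in the table above, two rows that share the same smoke and lung values get the same predicted states and the same p, regardless of what was observed for bronc and xray.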

PARODBE commented 3 years ago

I was trying your new function for tree-augmented naive Bayes and it works very well! Congrats! I would like to ask for your advice, if you agree of course. When you have more than 20 variables, what would be the best way to plot these kinds of Gaussian networks, since it is difficult to see the connections? And do you consider a plot or an adjacency matrix to be better?

Thanks @erdogant !

erdogant commented 3 years ago

Nice to hear. I agree about the plotting and have now added interactive plotting.

pip install -U bnlearn

import bnlearn as bn
df = bn.import_example()

# Structure learning
model = bn.structure_learning.fit(df)

# Make plot
bn.plot(model, interactive=True)

# Add some parameters for the interactive plot
bn.plot(model, interactive=True, params = {'height':'600px'})

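# Add more parameters for the interactive plot.
# 'heading' expects the title text; the string used here is only a placeholder.
params = {'directed': True, 'height': '800px', 'width': '70%', 'notebook': False,
          'heading': 'bnlearn causal network', 'layout': None, 'font_color': False,
          'bgcolor': '#ffffff'}
bn.plot(model, interactive=True, params=params)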

I also added a section in the documentation pages: https://erdogant.github.io/bnlearn/pages/html/interactive%20plotting.html
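
For the adjacency-matrix alternative raised in the question, a minimal sketch in plain Python is shown here. It builds the matrix from an edge list (the Asia edges used earlier in this thread) and prints it as text. `adjacency_matrix` and `format_matrix` are hypothetical helpers written for this illustration, not bnlearn functions.

```python
def adjacency_matrix(edges):
    """Return (nodes, matrix) with matrix[i][j] == 1 iff nodes[i] -> nodes[j]."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return nodes, matrix

def format_matrix(nodes, matrix):
    """Render the matrix as aligned text: rows are sources, columns are targets."""
    width = max(len(n) for n in nodes) + 1
    lines = [" " * width + "".join(n.rjust(width) for n in nodes)]
    for name, row in zip(nodes, matrix):
        lines.append(name.ljust(width) + "".join(str(v).rjust(width) for v in row))
    return "\n".join(lines)

edges = [('smoke', 'lung'), ('smoke', 'bronc'), ('lung', 'xray'), ('bronc', 'xray')]
nodes, matrix = adjacency_matrix(edges)
print(format_matrix(nodes, matrix))
```

With many variables, a table like this stays readable where a static graph drawing becomes cluttered, at the cost of hiding the overall shape of the network.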

PARODBE commented 3 years ago

Uoooo it looks very nice! Tomorrow, it will be my first test!! Thank you!!!

PARODBE commented 3 years ago

I have tested the interactive option and it's so so so nice!!!!!!!! I am very excited!!!! very useful!!! Thank you!!

erdogant commented 3 years ago

Great to hear! Have fun!