bdatko opened 5 months ago
Hi, that sounds like a good idea. In pyAgrum they call the `useMIIC` function on a learner object (link and link), but it's not totally clear how to pass arguments to the algorithm, like choosing the score or test function. Do you have some sample usage?
MIIC also seems to be implemented here. Do you know which one to prefer?
@felixleopoldo The `useMIIC` function is their lower-level API, but there is a convenience class `pyAgrum.skbn.BNClassifier` whose default `learningMethod` is MIIC. The other choices for `learningMethod` are: Chow-Liu, NaiveBayes, Tree-augmented NaiveBayes, MIIC + (MDL or NML), Greedy Hill Climb, and Tabu. You can use `scoringType` within the initializer of `pyAgrum.skbn.BNClassifier` to pick your flavor: AIC, BIC, BD, BDeu, K2, or Log2.
There are examples of using `pyAgrum.skbn.BNClassifier` in the notebook titled Learning classifiers; shown below is a call using MIIC (cell 7 from the linked notebook):

```python
#we use now another method to learn the BN (MIIC)
BNTest = skbn.BNClassifier(learningMethod='MIIC', prior='Smoothing', priorWeight=0.5,
                           discretizationStrategy='quantile', usePR=True, significant_digit=13)
xTrain, yTrain = BNTest.XYfromCSV(filename='res/creditCardTest.csv', target='Class')
```
More examples using `BNClassifier` can be found in the notebook titled Comparing classifiers (including Bayesian networks) with scikit-learn.
I have only used pyAgrum because I don't know R, so I have never directly compared the two. pyAgrum is a Python wrapper around the aGrUM C++ library, where their MIIC implementation is written in C++. It looks similar to how the original authors of MIIC provide a C++ implementation wrapped in R, but I don't know for sure.
Let me know if you need any more help. =)
Thanks. It seems like they refer to the Bayesian network as a classifier, where one variable is specified as the target? It would be nice if you could show how to do the following two steps:
- Learn the graph of a Bayesian network from a CSV data file (in the Benchpress data format) using relevant parameters for structure learning
I hope the example below demos what you need.
- Write the adjacency matrix representation of the graph to a CSV file following Benchpress graph format
From what I know, there isn't any convenient writer to save the adjacency matrix to CSV, so shown below is a small helper to save the matrix in the format for `benchpress`.
The example assumes you have the following installed in your environment: `pyAgrum`, `pandas`, and `scikit-learn`. You will need all three to run the example below.
```python
import csv
from pathlib import Path

import pandas as pd
import pyAgrum.skbn as skbn
from pyAgrum import BayesNet


def adjacency_to_csv(bn: BayesNet, *, to_file: str):
    id_to_name = {bn.idFromName(name): name for name in bn.names()}
    # newline="" avoids spurious blank lines from the csv module on Windows
    with Path(to_file).open(mode="w", encoding="utf-8", newline="") as csvfile:
        writer = csv.writer(csvfile)
        # write header
        writer.writerow(id_to_name[col_id] for col_id in range(bn.size()))
        # write rows
        adj_mat = bn.adjacencyMatrix()
        writer.writerows(row for row in adj_mat)


data = pd.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv"
).dropna()
data.to_csv("fully_obs_titanic.csv", index=False)

classifier = skbn.BNClassifier(learningMethod="MIIC", scoringType="BIC")
xdata, ydata = classifier.XYfromCSV(filename="fully_obs_titanic.csv", target="survived")
classifier.fit(xdata, ydata)

adjacency_to_csv(classifier.bn, to_file="resulting_adjacency.csv")
```
Here is the resulting adjacency matrix:

```
❯ cat resulting_adjacency.csv
survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,1,0,0,0,0,1,0,0,0,0,0,0,0,0
0,0,1,1,0,0,0,0,0,0,1,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,1,0
0,1,0,0,0,0,0,0,1,0,0,0,0,0,0
0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,1,1,0,0,0,1,0,0,0,0,0
```
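For what it's worth, reading a matrix in this format back into an edge list takes only the standard library. A minimal sketch, using a small made-up adjacency matrix (not the Titanic output above); `1` in row i, column j means an edge from node i to node j:

```python
import csv
import io

# Hypothetical sample in the Benchpress graph format: a header row of node
# names followed by a 0/1 adjacency matrix.
sample = """a,b,c
0,1,1
0,0,0
0,1,0
"""

reader = csv.reader(io.StringIO(sample))
names = next(reader)
edges = []
for i, row in enumerate(reader):
    for j, cell in enumerate(row):
        if cell == "1":
            edges.append((names[i], names[j]))

print(edges)  # -> [('a', 'b'), ('a', 'c'), ('c', 'b')]
```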
I ran this example with the following environment:

```
Python        3.11.7
numpy         1.26.4
pandas        2.2.2
pyAgrum       1.14.0
scikit-learn  1.5.0
scipy         1.13.1
```
Thanks a lot. So for the target variable (survived), can we just choose the first one in the order?
For the `fit` method of `BNClassifier` you can specify any column within the CSV file, see here. Shown below is the snippet from the docs for the target:

> Fits the model to the training data provided. The two possible uses of this function are fit(X,y) and fit(data=…, targetName=…). Any other combination will raise a ValueError.
>
> targetName (str) – specifies the name of the targetVariable in the csv file. Warning: Raises ValueError if either X or y is not None. Raises ValueError if data is None.
Ok!
Hi @felixleopoldo , many thanks to @bdatko for this "issue".
Actually, BNClassifier is based on the BNLearner class. If you want to test the learning algorithms of pyAgrum, you should use BNLearner. MIIC is a "constraint-based" method based on mutual information. There is no score, but one can apply corrections (MDL/NML). Of course, you can add some priors for the parameter estimation.
```python
import pyAgrum as gum

learner = gum.BNLearner("test.csv")  # MIIC is used as default (some score-based methods are also implemented)
learner.useMDLCorrection()           # for small datasets
learner.useSmoothingPrior()          # smoothing (default weight=1) for parameters
bn = learner.learnBN()               # learning
```
Thanks again to @bdatko. Please tell me if you need some other snippets :-)
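As a side note on the "mutual information" that MIIC builds on: the empirical mutual information between two discrete variables can be computed from co-occurrence counts. This is only a rough illustration of the quantity, not pyAgrum's implementation:

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits from paired samples."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        # p_joint / (p_x * p_y), with the 1/n factors cancelled out
        mi += p_joint * log2(p_joint * n * n / (px[x] * py[y]))
    return mi


# Perfectly dependent binary variables carry 1 bit of information.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # -> 1.0
# Independent variables give zero.
print(mutual_information([0, 1, 0, 1], [0, 0, 1, 1]))  # -> 0.0
```

MIIC uses conditional versions of this quantity to decide which edges to keep, which is why the MDL/NML corrections above matter for small samples.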
Hi @phwuil, thanks for the snippet. Could you show how MIIC could be run on continuous data too?
hi @felixleopoldo, thank you for that. pyAgrum is mainly about discrete variables. However, there are 2 solutions for continuous data: 1- automatic discretization, 2- CLG (experimental Python model).
1- automatic discretization with `pyAgrum.skbn.BNDiscretizer`:
```python
import pyAgrum as gum
import pyAgrum.skbn as skbn

filename = "test.csv"

# BNDiscretizer has many options
disc = skbn.BNDiscretizer()
template = disc.discretizedBN(filename)
# template contains all the (discrete) variables
# that will be used for the learning
learner = gum.BNLearner(filename, template)
learner.useMDLCorrection()
learner.useSmoothingPrior()
bn = learner.learnBN()
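For intuition about what a quantile discretization strategy does (a rough stdlib sketch, not pyAgrum's `BNDiscretizer` code): each continuous value is mapped to the index of the quantile interval it falls into, so every bin receives roughly the same number of samples.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_discretize(values, nbins=3):
    """Map each value to a quantile bin index (illustrative sketch only)."""
    # Cut points at the (nbins - 1) inner quantiles of the data.
    cuts = quantiles(values, n=nbins)
    return [bisect_right(cuts, v) for v in values]


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 100.0, 101.0, 102.0]
print(quantile_discretize(data))  # -> [0, 0, 0, 1, 1, 1, 2, 2, 2]
```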
2- CLG: new CLG implementation in pyAgrum 1.14.0 (pyAgrum.CLG tutorial):

```python
import pyAgrum.clg as gclg

# no hybrid learning: pure CLG data
learner = gclg.CLGLearner(filename)
model = learner.learnCLG()
```
OK. There is a new pyagrum branch, where you can try pyagrum by:

```shell
snakemake --cores all --use-singularity --configfile workflow/rules/structure_learning_algorithms/pyagrum/pyagrum.json --rerun-incomplete
```
If you know any data scenario where it performs well, let me know!
Hi @felixleopoldo, thank you for this. I have to admit that I did not know about it before it was pointed out to me by @bdatko. Thanks to both of you. So I will have to learn how to use it. :-) (if you have THE good ref to help, please tell me :-) !)
I see, no worries :) If you mean the main reference to Benchpress, it is here. It is not mentioned there, but you can also run it under WSL on Windows.
I think pyAgrum would be a great addition to the list of algorithms. To my eyes, it did not look like there was a comparison in `benchpress` using the Multivariate Information-based Inductive Causation (MIIC) algorithm, which pyAgrum has implemented. The library also offers a scikit-learn interface to learn classifiers, which should help with the integration into `benchpress`.