aimclub / BAMT

Repository of a data modeling and analysis tool based on Bayesian networks
https://bamt.readthedocs.io
BSD 3-Clause "New" or "Revised" License

BAMT 2.0.0 - new features, refactoring, architecture refreshment #108

Open jrzkaminski opened 3 months ago

jrzkaminski commented 3 months ago

The current BAMT architecture has a number of disadvantages, including clunky code and other limitations, so it was decided to do a full refactoring. This refresh will include not only refactored code and a new API but also new features (such as vectorized sampling and other operations, new structure learning algorithms, score functions, etc.). For now, here is a checklist of modules that should be implemented in the 2.0.0 architecture:

Development of BAMT 2.0.0 takes place in the 2.0.0 branch of the repository. If you, the reader of this issue, decide to implement a module or submodule, please reply to this message, create a separate issue, and add it to the milestone and project.
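Score functions are among the planned features. As a rough illustration of what a DAG score function computes, here is a minimal sketch of the BIC score for a linear-Gaussian network, assuming all columns are continuous and the structure is given as a plain dict mapping each node to its parents. The function name `gaussian_bic` and the dict format are hypothetical, not part of BAMT.

```python
import numpy as np
import pandas as pd


def gaussian_bic(data: pd.DataFrame, parents: dict) -> float:
    """Hypothetical BIC score of a linear-Gaussian DAG; higher is better.

    `parents` maps every column of `data` to the list of its parent columns.
    The score decomposes into one term per node, which is what lets a
    structure optimizer rescore only the nodes touched by an edge change.
    """
    n = len(data)
    total = 0.0
    for node, pa in parents.items():
        y = data[node].to_numpy(dtype=float)
        # design matrix: an intercept column followed by the parent columns
        X = np.column_stack([np.ones(n)] + [data[p].to_numpy(dtype=float) for p in pa])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        sigma2 = max(float(resid @ resid) / n, 1e-12)  # MLE of the noise variance
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
        n_params = X.shape[1] + 1  # regression coefficients plus the variance
        total += loglik - 0.5 * n_params * np.log(n)
    return float(total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    df = pd.DataFrame({"x": x, "y": 2.0 * x + rng.normal(scale=0.5, size=500)})
    with_edge = gaussian_bic(df, {"x": [], "y": ["x"]})
    without_edge = gaussian_bic(df, {"x": [], "y": []})
    print(with_edge > without_edge)  # the true edge x -> y should score higher
```

A structure optimizer would call a function like this (or an object wrapping it, as with `DAGScoreFunction` in the pipeline) on each candidate graph and keep the best-scoring one.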

Another goal of these changes is a sklearn-like interface, so that a typical pipeline looks like this:

# `parameters` and the class names below are placeholders for the proposed 2.0.0 API
import pandas as pd

# read data
data = pd.read_csv("data.csv")

# define the score function and the structure optimizer
dag_score_function = DAGScoreFunction(**parameters)
dag_optimizer = DAG_optimizer(**parameters)

# learn a structure (perhaps returned in networkx format?)
G = dag_optimizer.optimize(data, **parameters)

# define the parameters estimator and the BN
parameters_estimator = ParametersEstimator(**parameters)
bn = ContinuousBayesianNetwork(**parameters)

# fit the BN with the estimator instance, then sample and predict the dropped columns
bn.fit(data, parameters_estimator, **parameters)
bn.sample(1000)
bn.predict(data.drop(columns=["col1", "col2"]))
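To make the intended sklearn-like flow concrete without relying on the unreleased 2.0.0 classes, here is a minimal self-contained sketch. `LinearGaussianEstimator` and `ToyGaussianBN` are hypothetical names, not BAMT's API. The sketch assumes linear-Gaussian nodes and a fixed, already learned structure given as a parents dict whose key order is topological. `sample` is vectorized in the sense mentioned in the issue (each node is drawn for all rows at once, in ancestral order), and `predict` returns the conditional mean of the dropped columns by conditioning the joint Gaussian on the observed ones.

```python
import numpy as np
import pandas as pd


class LinearGaussianEstimator:
    """Hypothetical parameters estimator: OLS regression of each node on its parents."""

    def estimate(self, data, node, parents):
        n = len(data)
        X = np.column_stack([np.ones(n)] + [data[p].to_numpy(dtype=float) for p in parents])
        y = data[node].to_numpy(dtype=float)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        return {"intercept": coef[0], "weights": coef[1:], "var": float(resid @ resid) / n}


class ToyGaussianBN:
    """Hypothetical sklearn-style linear-Gaussian BN with a fixed structure."""

    def __init__(self, parents):
        self.parents = parents  # node -> list of parents; dict order must be topological
        self.nodes = list(parents)

    def fit(self, data, estimator):
        self.params_ = {v: estimator.estimate(data, v, self.parents[v]) for v in self.nodes}
        return self  # returning self allows chaining, as in scikit-learn

    def sample(self, n, seed=None):
        rng = np.random.default_rng(seed)
        out = {}
        for v in self.nodes:  # ancestral order; all n rows of a node are drawn at once
            p = self.params_[v]
            mean = np.full(n, p["intercept"])
            for w, pa in zip(p["weights"], self.parents[v]):
                mean = mean + w * out[pa]
            out[v] = mean + rng.normal(scale=np.sqrt(p["var"]), size=n)
        return pd.DataFrame(out)

    def _joint(self):
        # X = b + B X + e  =>  X = (I - B)^{-1} (b + e); B[j, i] is the weight of parent i in node j
        idx = {v: k for k, v in enumerate(self.nodes)}
        d = len(self.nodes)
        B, b, s2 = np.zeros((d, d)), np.zeros(d), np.zeros(d)
        for v in self.nodes:
            p = self.params_[v]
            b[idx[v]], s2[idx[v]] = p["intercept"], p["var"]
            for w, pa in zip(p["weights"], self.parents[v]):
                B[idx[v], idx[pa]] = w
        A = np.linalg.inv(np.eye(d) - B)
        return A @ b, A @ np.diag(s2) @ A.T  # joint mean and covariance

    def predict(self, data):
        """Conditional mean of the columns missing from `data`, given the present ones."""
        mu, cov = self._joint()
        obs = [k for k, v in enumerate(self.nodes) if v in data.columns]
        mis = [k for k, v in enumerate(self.nodes) if v not in data.columns]
        x_obs = data[[self.nodes[k] for k in obs]].to_numpy(dtype=float)
        gain = cov[np.ix_(mis, obs)] @ np.linalg.inv(cov[np.ix_(obs, obs)])
        pred = mu[mis] + (x_obs - mu[obs]) @ gain.T
        return pd.DataFrame(pred, columns=[self.nodes[k] for k in mis], index=data.index)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=2000)
    b = 1.5 * a + rng.normal(scale=0.3, size=2000)
    c = -2.0 * b + rng.normal(scale=0.3, size=2000)
    data = pd.DataFrame({"a": a, "b": b, "c": c})

    bn = ToyGaussianBN({"a": [], "b": ["a"], "c": ["b"]}).fit(data, LinearGaussianEstimator())
    print(bn.sample(5, seed=1))
    print(bn.predict(data.drop(columns=["b"])).head())  # b inferred from both a and c
```

Passing an estimator object into `fit` mirrors the `bn.fit(data, parameters_estimator)` call in the pipeline; a real implementation would take the structure from the optimizer rather than from a hand-written dict.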
jrzkaminski commented 1 month ago

The distribution module has been implemented.