baikal is written in pure Python. It supports Python 3.5 and above.
Note: baikal is still a young project and there might be backward-incompatible changes. The next development steps and backward-incompatible changes are announced and discussed in this issue. Please subscribe to it if you use baikal.
baikal is a graph-based, functional API for building complex machine learning pipelines of objects that implement the scikit-learn API. It is mostly inspired by the excellent Keras API for deep learning, and borrows a few concepts from the TensorFlow framework and the (perhaps lesser known) graphkit package.
baikal aims to provide an API that lets you build complex, non-linear machine learning pipelines that look like this:
with code that looks like this:
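# Inputs: two feature sets and the target labels
x1 = Input()
x2 = Input()
y_t = Input()
# First-level classifiers, each trained against the targets
y1 = ExtraTreesClassifier()(x1, y_t)
y2 = RandomForestClassifier()(x2, y_t)
# Transform x2, reduce it with PCA, then classify the result
z = PowerTransformer()(x2)
z = PCA()(z)
y3 = LogisticRegression()(z, y_t)
# Stack the first-level predictions and feed them to the final classifier
ensemble_features = Stack()([y1, y2, y3])
y = SVC()(ensemble_features, y_t)
# Build the model from its inputs, output and targets
model = Model([x1, x2], y, y_t)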
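To make the "call a step on its inputs" idea more concrete, here is a toy sketch in plain Python of how a functional API of this kind can record a graph as the calls are made. It is not baikal's implementation: `ToyData`, `ToyStep`, `toy_input` and `steps_in_order` are hypothetical names invented for this illustration, and no fitting or prediction happens here.

```python
class ToyData:
    """Symbolic value passed between steps (hypothetical, for illustration)."""

    def __init__(self, name, producer=None, parents=()):
        self.name = name
        self.producer = producer      # the step that outputs this value, if any
        self.parents = tuple(parents) # the values that step consumed


def toy_input(name):
    """Create a graph input: a value with no producing step."""
    return ToyData(name)


class ToyStep:
    """A named step; calling it records an edge in the graph."""

    def __init__(self, name):
        self.name = name

    def __call__(self, inputs, targets=None):
        if isinstance(inputs, ToyData):
            inputs = [inputs]
        parents = list(inputs)
        if targets is not None:
            parents.append(targets)
        return ToyData(self.name + "/out", producer=self, parents=parents)


def steps_in_order(output):
    """Walk back from an output and list step names, dependencies first."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        if node.producer is not None:
            order.append(node.producer.name)

    visit(output)
    return order


x2 = toy_input("x2")
y_t = toy_input("y_t")
z = ToyStep("power")(x2)
z = ToyStep("pca")(z)
y3 = ToyStep("logreg")(z, y_t)
print(steps_in_order(y3))  # prints ['power', 'pca', 'logreg']
```

Because each call returns a value that remembers where it came from, the final output is enough to recover every step it depends on, which is what lets a model object be built from just inputs and outputs.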
With baikal you can
All with boilerplate-free, readable code.
The pipeline above (to the best of the author's knowledge) cannot be easily built using scikit-learn's composite estimators API as you encounter some limitations:
You could perhaps combine scikit-learn's composite estimators (Pipelines, ColumnTransformers, StackingClassifiers, etc.), but you might end up with code that feels hard to follow and verbose. Perhaps you could instead define a big, composite estimator class that integrates each of the pipeline steps through composition. This, however, will most likely require writing __init__ methods to control each of the internal steps' knobs, and handling get_params and set_params if you want to use, say, GridSearchCV.
By using baikal as shown in the example above, code can be more readable, less verbose and closer to our mental representation of the pipeline. baikal also provides an API to fit, predict with, and query the entire pipeline with single commands.
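For a sense of the boilerplate that a hand-written composite estimator class involves, here is a minimal sketch in plain Python. It only mimics the `get_params` / `set_params` method names that scikit-learn tools such as `GridSearchCV` rely on; `ToyScaler`, `ToyClassifier` and `HandWrittenComposite` are hypothetical stand-ins, not scikit-learn or baikal classes, and the sketch is not a drop-in scikit-learn estimator.

```python
class ToyScaler:
    """Hypothetical stand-in for a transformer with one knob."""

    def __init__(self, with_mean=True):
        self.with_mean = with_mean


class ToyClassifier:
    """Hypothetical stand-in for a classifier with one knob."""

    def __init__(self, C=1.0):
        self.C = C


class HandWrittenComposite:
    """Composite estimator that wires its internal steps together by hand."""

    def __init__(self, scaler_with_mean=True, classifier_C=1.0):
        # Every internal knob has to be re-exposed here, one argument each.
        self.scaler_with_mean = scaler_with_mean
        self.classifier_C = classifier_C
        self._build_steps()

    def _build_steps(self):
        self.scaler_ = ToyScaler(with_mean=self.scaler_with_mean)
        self.classifier_ = ToyClassifier(C=self.classifier_C)

    def get_params(self, deep=True):
        # Search tools read the tunable knobs through this method.
        return {
            "scaler_with_mean": self.scaler_with_mean,
            "classifier_C": self.classifier_C,
        }

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError("Invalid parameter {!r}".format(name))
            setattr(self, name, value)
        # The internal steps must be rebuilt, or they keep stale values.
        self._build_steps()
        return self


model = HandWrittenComposite()
model.set_params(classifier_C=10.0)
print(model.classifier_.C)  # prints 10.0
```

Even with only two steps and one knob each, every new step or parameter means touching `__init__`, `get_params` and the rebuild logic, which is the kind of bookkeeping the paragraph above describes.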