firefly-cpp / NiaAML

Python automated machine learning framework.
MIT License

🌳 NiaAML

📦 Installation • 💻 Graphical User Interface • 🧑‍💻 Command Line Interface • 📮 API • ✨ Implemented Components • 💪 Optimization Process And Parameter Tuning • 📓 Examples • 🫂 Contributors • 🙏 Support • 🔑 License • 📄 Cite Us

NiaAML is a framework for Automated Machine Learning based on nature-inspired algorithms for optimization. The framework is written fully in Python. The name NiaAML comes from the Automated Machine Learning method of the same name [1]. Its goal is to efficiently compose the best possible classification pipeline for the given task from the components supplied as input. The components are divided into three groups: feature selection algorithms, feature transformation algorithms and classifiers. The framework uses nature-inspired optimization algorithms to choose the best set of components for the classification pipeline and to optimize their hyperparameters. The optimization process relies on NiaPy, a popular Python collection of nature-inspired algorithms. The NiaAML framework is easy to use, and you can customize or expand it to suit your needs.

🆕📈 NiaAML now also supports regression tasks. The package still refers to regressors as "classifiers" to avoid introducing a breaking change to the API.

The NiaAML framework allows you not only to run full pipeline optimization, but also to use the implemented components (classifiers, feature selection algorithms, etc.) separately. It supports numerical and categorical features as well as missing values in datasets.

NiaAML Architecture


📦 Installation

pip3

Install NiaAML with pip3:

pip3 install niaaml

In case you would like to try out the latest pre-release version of the framework, install it using:

pip3 install niaaml --pre

Fedora Linux

To install NiaAML on Fedora, use:

$ dnf install python-niaaml

Alpine Linux

To install NiaAML on Alpine Linux, please enable Community repository and use:

$ apk add py3-niaaml

Arch Linux

To install NiaAML on Arch Linux, use:

$ yay -Syyu python-niaaml

Nix

To install NiaAML with the Nix package manager, use:

$ nix-env -i python311Packages.niaaml

To enter a shell with the package already installed, use:

$ nix-shell -p python311Packages.niaaml
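
Whichever installation route you choose, a quick check from Python can confirm that the package is visible to the interpreter you intend to use. The snippet below is a minimal sketch that relies only on the standard library; the helper name niaaml_status is made up for this example and is not part of NiaAML.

```python
from importlib import metadata, util


def niaaml_status() -> str:
    """Report whether the niaaml package is importable, and its version if known."""
    # find_spec returns None when the package cannot be found by this interpreter
    if util.find_spec("niaaml") is None:
        return "niaaml is not installed"
    try:
        version = metadata.version("niaaml")
    except metadata.PackageNotFoundError:
        # Importable (for example from a source checkout) but without package metadata
        version = "unknown"
    return f"niaaml {version} is installed"


print(niaaml_status())
```

Run it with the same interpreter (or inside the same nix-shell) that you plan to use for your experiments.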

💻 Graphical User Interface

There is a simple Graphical User Interface for the NiaAML package available here.

🧑‍💻 Command Line Interface

We also provide a CLI for quick pipeline optimizations and inference from the terminal without the need to write custom scripts.

When you install the package as instructed above, you will already have access to the niaaml command with the sub-commands optimize and infer.

For usage information, add the --help flag:

niaaml --help

niaaml infer --help

An example invocation of optimize:

[Screenshot: niaaml optimize example]

📮 API

There is a simple API for remote work with the NiaAML package available here.

✨ Implemented Components

Click here for a list of currently implemented components divided into groups: classifiers, feature selection algorithms and feature transformation algorithms. At the end of that list you can also find the currently implemented fitness functions for the optimization process, encoders for categorical features, and imputers for missing values.

All of the components are passed into the optimization process using their class names. Let's say we want to choose between the Adaptive Boosting, Bagging and Multi Layer Perceptron classifiers, the Select K Best and Select Percentile feature selection algorithms, and Normalizer as the feature transformation algorithm (which may not be selected during the optimization process).

PipelineOptimizer(
    data=...,
    classifiers=['AdaBoost', 'Bagging', 'MultiLayerPerceptron'],
    feature_selection_algorithms=['SelectKBest', 'SelectPercentile'],
    feature_transform_algorithms=['Normalizer']
)

The categorical_features_encoder argument of PipelineOptimizer is None by default. If your dataset contains any categorical features, you need to specify an encoder to use. The same goes for the imputer argument and features that contain missing values.

PipelineOptimizer(
    data=...,
    classifiers=['AdaBoost', 'Bagging', 'MultiLayerPerceptron'],
    feature_selection_algorithms=['SelectKBest', 'SelectPercentile'],
    feature_transform_algorithms=['Normalizer'],
    categorical_features_encoder='OneHotEncoder',
    imputer='SimpleImputer'
)

For a full example see the 📓 Examples section.

💪 Optimization Process And Parameter Tuning

In the modified version of the NiaAML optimization process there are two types of optimization. The goal of the first type is to find an optimal set of components (feature selection algorithm, feature transformation algorithm and classifier). The next step is to find optimal parameters for the selected set of components, and that is the goal of the second type of optimization. Each component has an attribute _params, which is a dictionary of parameters and their possible values.

self._params = dict(
    n_estimators = ParameterDefinition(MinMax(min=10, max=111), np.uint),
    algorithm = ParameterDefinition(['SAMME', 'SAMME.R'])
)

An individual in the second type of optimization is represented as a real-valued vector whose size equals the total number of keys in all three dictionaries (the classifier's _params, the feature transformation algorithm's _params and the feature selection algorithm's _params), and the value of each dimension is in the range [0.0, 1.0]. The second type of optimization maps real values from the individual's vector to the parameter definitions in the dictionaries. Each parameter's value can be defined as a range or as an array of values. In the first case, a value from the vector is mapped from one interval to another; in the second case, the value falls into one of the bins, each of which represents an index into the array that holds the possible parameter values.
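
The two mappings described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not NiaAML's actual implementation: the function names map_to_range and map_to_choice are hypothetical, the bins are assumed to be of equal width, and whether NiaAML treats the upper bound of a range as inclusive is not stated here.

```python
def map_to_range(value: float, low: float, high: float) -> float:
    """Linearly map a value from [0.0, 1.0] onto the interval [low, high]."""
    return low + value * (high - low)


def map_to_choice(value: float, choices: list):
    """Return the choice whose equal-width bin in [0.0, 1.0] contains value."""
    # value == 1.0 would index one past the end, so clamp to the last bin
    index = min(int(value * len(choices)), len(choices) - 1)
    return choices[index]


# Hypothetical slice of an individual holding the two AdaBoost parameters above
individual = [0.5, 0.75]

# Range case: [0.0, 1.0] -> [10, 111], truncated to an integer
n_estimators = int(map_to_range(individual[0], 10, 111))
# Array case: two bins, [0.0, 0.5) -> 'SAMME' and [0.5, 1.0] -> 'SAMME.R'
algorithm = map_to_choice(individual[1], ["SAMME", "SAMME.R"])

print(n_estimators, algorithm)  # prints: 60 SAMME.R
```

Keeping every dimension in [0.0, 1.0] lets one continuous optimizer handle numeric and categorical parameters alike, because only the decoding step differs.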

Let's say we have a classifier with 3 parameters, a feature selection algorithm with 2 parameters and feature transformation algorithm with 4 parameters. The size of an individual in the second type of optimization is 9. The size of an individual in the first type of optimization is always 3 (1 classifier, 1 feature selection algorithm and 1 feature transformation algorithm).
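
The sizes in this example can be reproduced with a short sketch. The decoding of the three-dimensional individual into component names is an assumption made for illustration (equal-width bins over each list, with no option to skip a component); the helper pick and the parameter counts are hypothetical and only mirror the numbers in the paragraph above.

```python
# Component names taken from the earlier PipelineOptimizer example
classifiers = ["AdaBoost", "Bagging", "MultiLayerPerceptron"]
feature_selection_algorithms = ["SelectKBest", "SelectPercentile"]
feature_transform_algorithms = ["Normalizer"]


def pick(value: float, names: list) -> str:
    """Assumed decoding: split [0.0, 1.0] into equal bins, one per name."""
    return names[min(int(value * len(names)), len(names) - 1)]


# First type: always one dimension per component group, so the size is 3
first_individual = [0.9, 0.1, 0.4]
pipeline = (
    pick(first_individual[0], classifiers),
    pick(first_individual[1], feature_selection_algorithms),
    pick(first_individual[2], feature_transform_algorithms),
)

# Second type: one dimension per tunable parameter of the chosen components
parameter_counts = {"classifier": 3, "feature_selection": 2, "feature_transform": 4}
second_individual_size = sum(parameter_counts.values())

print(pipeline)  # prints: ('MultiLayerPerceptron', 'SelectKBest', 'Normalizer')
print(len(first_individual), second_individual_size)  # prints: 3 9
```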

In some cases we may want to tune a parameter that needs additional information for setting its range of values, so we cannot set the range in the initialization method. In that case, we should set its value in the dictionary to None and define it later in the process. The parameter will be a part of the parameter tuning process as soon as we define its possible values. For example, see Select K Best Feature Selection and its parameter k.
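
A minimal sketch of this deferred definition is shown below. The class SelectKBestSketch and its methods are stand-ins invented for this example and do not reproduce the real Select K Best component; the tuple (1, n_features) is likewise just a placeholder for whatever range definition the framework uses.

```python
class SelectKBestSketch:
    """Stand-in component for illustration; not the NiaAML SelectKBest class."""

    def __init__(self):
        # k depends on the number of features, which is unknown at construction time
        self._params = {"k": None}

    def set_feature_count(self, n_features: int) -> None:
        """Define the range of k once the dataset's feature count is known."""
        self._params["k"] = (1, n_features)

    def tunable_parameters(self) -> list:
        """Only parameters with defined values take part in tuning."""
        return [name for name, values in self._params.items() if values is not None]


selector = SelectKBestSketch()
before = selector.tunable_parameters()
selector.set_feature_count(3)
after = selector.tunable_parameters()

print(before, after)  # prints: [] ['k']
```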

The NiaAML framework also supports running optimization according to the original method proposed in [1], where the component selection and hyperparameter optimization steps are combined into one.

📓 Examples

Example of Usage

Load data and try to find the optimal pipeline for the given components. The example below uses the Particle Swarm Algorithm as the optimization algorithm. You can find a list of all available algorithms in NiaPy's repository.

from niaaml import PipelineOptimizer, Pipeline
from niaaml.data import BasicDataReader
import numpy
import pandas

# dummy random data
data_reader = BasicDataReader(
    x=numpy.random.uniform(low=0.0, high=15.0, size=(50, 3)),
    y=numpy.random.choice(['Class 1', 'Class 2'], size=50)
)

pipeline_optimizer = PipelineOptimizer(
    data=data_reader,
    classifiers=['AdaBoost', 'Bagging', 'MultiLayerPerceptron', 'RandomForest', 'ExtremelyRandomizedTrees', 'LinearSVC'],
    feature_selection_algorithms=['SelectKBest', 'SelectPercentile', 'ParticleSwarmOptimization', 'VarianceThreshold'],
    feature_transform_algorithms=['Normalizer', 'StandardScaler']
)

# run the modified version of optimization
pipeline1 = pipeline_optimizer.run('Accuracy', 15, 15, 300, 300, 'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')

# run the original version
pipeline2 = pipeline_optimizer.run_v1('Accuracy', 15, 400, 'ParticleSwarmAlgorithm')

You can save a result of the optimization process as an object to a file for later use.

pipeline1.export('pipeline.ppln')

You can also load it from a file and use the pipeline.

loaded_pipeline = Pipeline.load('pipeline.ppln')

# some features (can be loaded using DataReader object instances)
x = pandas.DataFrame([[0.35, 0.46, 5.32], [0.16, 0.55, 12.5]])
y = loaded_pipeline.run(x)

You can also save a user-friendly representation of a pipeline to a text file.

pipeline1.export_text('pipeline.txt')

This is a very simple example with dummy data. It is only intended to give you a basic idea of how to use the framework.

📈 Example of a Regression Task

The API for solving regression tasks does not differ from the classification use case. You only have to choose components that support regression.

Currently, the following components support regression tasks:

➡️ Feature Transform Algorithms:

🔎 Feature Selection Algorithms:

🔮 Models (Classifiers):

pipeline_optimizer = PipelineOptimizer(
    data=data_reader,
    feature_selection_algorithms=["SelectKBest", "SelectPercentile", "SelectUnivariateRegression"],
    feature_transform_algorithms=["Normalizer", "StandardScaler"],
    classifiers=["LinearRegression", "RidgeRegression", "LassoRegression", "DecisionTreeRegression", "GaussianProcessRegression"],
)

# run the modified version of optimization
pipeline1 = pipeline_optimizer.run("MSE", 10, 10, 20, 20, "ParticleSwarmAlgorithm")

Example of a Pipeline Component's Implementation

The NiaAML framework is easily expandable, as you can implement components by overriding the base classes' methods. To implement a classifier you should inherit from the Classifier class, and you can do the same with FeatureSelectionAlgorithm and FeatureTransformAlgorithm classes. All of the mentioned classes inherit from the PipelineComponent class.

Take a look at the Classifier class and the implementation of the AdaBoost classifier that inherits from it.
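
To give a rough idea of the shape such a component takes, here is a self-contained sketch. It deliberately does not import NiaAML: ClassifierSketch is a stand-in base class, and the method names fit and predict are assumptions for illustration, so check the real Classifier class for the methods you actually have to override.

```python
from collections import Counter


class ClassifierSketch:
    """Stand-in for a classifier base class; method names are assumptions."""

    def fit(self, x, y):
        raise NotImplementedError

    def predict(self, x):
        raise NotImplementedError


class MajorityClassifier(ClassifierSketch):
    """Always predicts the most frequent label seen during fitting."""

    def __init__(self):
        self._params = {}  # no tunable hyperparameters in this toy component
        self._label = None

    def fit(self, x, y):
        # Remember the most common label in the training targets
        self._label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        # One prediction per input row, regardless of the feature values
        return [self._label for _ in x]


model = MajorityClassifier().fit(
    [[0.1], [0.2], [0.3]], ["Class 1", "Class 2", "Class 2"]
)
predictions = model.predict([[0.5], [0.9]])
print(predictions)  # prints: ['Class 2', 'Class 2']
```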

Example of a Fitness Function's Implementation

The NiaAML framework also allows you to implement your own fitness function. All you need to do is implement the FitnessFunction class.

Take a look at the Accuracy implementation.
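
As a sketch of what a custom fitness function might compute, the example below scores predictions by balanced accuracy (the mean of per-class recall). The base class FitnessFunctionSketch and the method name get_fitness are assumptions made for this illustration; consult the real FitnessFunction class and the Accuracy implementation for the actual interface.

```python
class FitnessFunctionSketch:
    """Stand-in base class; the get_fitness name is an assumption."""

    def get_fitness(self, predicted, expected) -> float:
        raise NotImplementedError


class BalancedAccuracySketch(FitnessFunctionSketch):
    """Mean of per-class recall; higher is better."""

    def get_fitness(self, predicted, expected) -> float:
        recalls = []
        for label in sorted(set(expected)):
            # Positions where this label is the true class
            indices = [i for i, e in enumerate(expected) if e == label]
            hits = sum(1 for i in indices if predicted[i] == label)
            recalls.append(hits / len(indices))
        return sum(recalls) / len(recalls)


expected = ["a", "a", "a", "b"]
predicted = ["a", "a", "b", "b"]
score = BalancedAccuracySketch().get_fitness(predicted, expected)
print(round(score, 4))  # prints: 0.8333
```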

Example of a Feature Encoder's Implementation

The NiaAML framework also allows you to implement your own feature encoder. All you need to do is implement the FeatureEncoder class.

Take a look at the OneHotEncoder implementation.
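
The idea behind a one-hot encoder can be sketched without any dependencies. OneHotEncoderSketch below is a hypothetical stand-in, and its fit and transform method names are assumptions rather than the confirmed FeatureEncoder interface.

```python
class OneHotEncoderSketch:
    """Stand-in encoder for illustration; fit/transform names are assumptions."""

    def __init__(self):
        self._categories = []

    def fit(self, feature):
        # Sort so that the output column order is deterministic
        self._categories = sorted(set(feature))
        return self

    def transform(self, feature):
        # One column per known category; unseen categories become all-zero rows
        return [
            [1 if value == category else 0 for category in self._categories]
            for value in feature
        ]


encoder = OneHotEncoderSketch().fit(["red", "green", "red"])
encoded = encoder.transform(["green", "red", "blue"])
print(encoded)  # prints: [[1, 0], [0, 1], [0, 0]]
```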

Example of an Imputer's Implementation

The NiaAML framework also allows you to implement your own imputer. All you need to do is implement the Imputer class.

Take a look at the SimpleImputer implementation.
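
A custom imputer follows the same pattern. The sketch below fills missing numeric values with the mean of the known ones; MeanImputerSketch and its fit and transform methods are assumptions for illustration and are not taken from the Imputer class itself.

```python
import math


def _is_missing(value) -> bool:
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class MeanImputerSketch:
    """Stand-in imputer for illustration; fit/transform names are assumptions."""

    def __init__(self):
        self._fill = None

    def fit(self, feature):
        # Learn the fill value from the known entries only
        # (assumes the feature has at least one known value)
        known = [v for v in feature if not _is_missing(v)]
        self._fill = sum(known) / len(known)
        return self

    def transform(self, feature):
        # Replace every missing entry with the learned mean
        return [self._fill if _is_missing(v) else v for v in feature]


imputer = MeanImputerSketch().fit([1.0, None, 3.0, float("nan")])
imputed = imputer.transform([1.0, None, 3.0, float("nan")])
print(imputed)  # prints: [1.0, 2.0, 3.0, 2.0]
```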

More

You can find more examples here.

🫂 Contributors

Thanks goes to these wonderful people (emoji key):

Luka Pečnik

💻 📖 👀 🐛 💡 ⚠️ 🚇

firefly-cpp

💻 🐛 🧑‍🏫 🔬 🤔

sisco0

🤔

zStupan

💻

Ben Beasley

💻 🚇

Laurenz Farthofer

💻 📖 🚇

This project follows the all-contributors specification. Contributions of any kind are welcome!

🙇 Contributing

We encourage you to contribute to NiaAML! Please check out the Contributing to NiaAML guide for guidelines about how to proceed.

Everyone interacting in NiaAML's codebases, issue trackers, chat rooms and mailing lists is expected to follow the NiaAML code of conduct.

🙏 Support

❓ Usage Questions

If you have questions about how to use NiaAML, or have an issue that isn't related to a bug, you can place a question on StackOverflow.

You can also seek support via email.

NiaAML is a community-supported package; nobody is paid to develop the package or to handle NiaAML support.

All people answering your questions are doing it in their own time, so please be kind and provide as much information as possible.

❗ Issues

Before creating a bug report, please check the existing issues list, as you might find that you don't need to create one. When you create a bug report, please include as many details as possible in the issue template.

🔑 License

This package is distributed under the MIT License. This license can be found online at http://www.opensource.org/licenses/MIT.

Disclaimer

This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!

📝 References

[1] Iztok Fister Jr., Milan Zorman, Dušan Fister, Iztok Fister. Continuous optimizers for automatic design and evaluation of classification pipelines. In: Frontier applications of nature inspired computation. Springer tracts in nature-inspired computing, pp. 281-301, 2020.

📄 Cite us

@article{Pečnik2021,
    doi = {10.21105/joss.02949},
    url = {https://doi.org/10.21105/joss.02949},
    year = {2021},
    publisher = {The Open Journal},
    volume = {6},
    number = {61},
    pages = {2949},
    author = {Luka Pečnik and Iztok Fister},
    title = {NiaAML: AutoML framework based on stochastic population-based nature-inspired algorithms},
    journal = {Journal of Open Source Software} 
} 

L. Pečnik, I. Fister Jr. "NiaAML: AutoML framework based on stochastic population-based nature-inspired algorithms." Journal of Open Source Software 6.61 (2021): 2949.

@inproceedings{pecnik_niaaml2_2021,
    address = {Cham},
    title = {{NiaAML2}: {An} {Improved} {AutoML} {Using} {Nature}-{Inspired} {Algorithms}},
    isbn = {978-3-030-78811-7},
    abstract = {Using machine learning methods in the real-world is far from being easy, especially because of the number of methods on the one hand, and setting the optimal values of their parameters on the other. Therefore, a lot of so-called AutoML methods have emerged nowadays that also enable automatic construction of classification pipelines to users, who are not experts in this domain. In this study, the NiaAML2 method is proposed that is capable of constructing the classification pipelines using nature-inspired algorithms in two phases: pipeline construction, and hyper-parameter optimization. This method improves the original NiaAML capable of this construction in one phase. The algorithm was applied to four UCI ML datasets, while the obtained results encouraged us to continue with the research.},
    booktitle = {Advances in {Swarm} {Intelligence}},
    publisher = {Springer International Publishing},
    author = {Pečnik, Luka and Fister, Iztok and Fister, Iztok},
    editor = {Tan, Ying and Shi, Yuhui},
    year = {2021},
    pages = {243--252},
}

L. Pečnik, Fister, I., Fister, I. Jr. NiaAML2: An Improved AutoML Using Nature-Inspired Algorithms. In International Conference on Swarm Intelligence (pp. 243-252). Springer, Cham, 2021.

@article{fister2025,
    title = {{NiaAML}: {AutoML} for classification and regression pipelines},
    volume = {29},
    rights = {All rights reserved},
    issn = {2352-7110},
    url = {https://www.sciencedirect.com/science/article/pii/S2352711024003443},
    doi = {10.1016/j.softx.2024.101974},
    shorttitle = {{NiaAML}},
    abstract = {In this paper we present {NiaAML}, an {AutoML} framework that we have developed for creating machine learning pipelines and hyperparameter tuning. The composition of machine learning pipelines is presented as an optimization problem that can be solved using various stochastic, population-based, nature-inspired algorithms. Nature-inspired algorithms are powerful tools for solving real-world optimization problems, especially those that are highly complex, nonlinear, and involve large search spaces where traditional algorithms may struggle. They are applied widely in various fields, including robotics, operations research, and bioinformatics. This paper provides a comprehensive overview of the software architecture, and describes the main tasks of {NiaAML}, including the automatic composition of classification and regression pipelines. The overview is supported by an practical illustrative example.},
    pages = {101974},
    journaltitle = {{SoftwareX}},
    author = {Fister, Iztok and Farthofer, Laurenz A. and Pečnik, Luka and Fister, Iztok and Holzinger, Andreas},
    date = {2025-02-01},
    keywords = {{AutoML}, Classification, Nature-inspired algorithms, Optimization},
}

I. Fister, L. A. Farthofer, L. Pečnik, I. Fister, and A. Holzinger, "NiaAML: AutoML for classification and regression pipelines," SoftwareX, vol. 29, p. 101974, Feb. 2025, doi: 10.1016/j.softx.2024.101974.