byu-dml / metalearn

BYU's python library of useable tools for metalearning
MIT License
22 stars 6 forks source link
Develop Master
Build Status Build Status
codecov codecov

Metalearn

The Data Mining Lab of the Computer Science Department of Brigham Young University (BYU-DML) python3 library of meta-learning tools. Extracts general, statistical, information-theoretic, landmarking and model-based meta-features from tabular datasets for use in meta-learning applications.

Installation

Using pip:

pip install metalearn

From source:

git clone https://github.com/byu-dml/metalearn.git
cd metalearn
python3 setup.py install

Note that this project follows the versioning scheme defined by Semantic Versioning 2.0.0. This means (among other things) that a given version of the package has zero guarantee of backwards compatibility with previous major versions.

Example Usage

Simple Example

from metalearn import Metafeatures
import pandas as pd
import numpy as np

# X and Y must be a pandas DataFrame and a pandas Series respectively
X = pd.DataFrame(np.random.rand(8,2))
Y = pd.Series(['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], name='targets')

metafeatures = Metafeatures()
mfs = metafeatures.compute(X, Y)

Full Example

import pandas as pd
import numpy as np
from metalearn import Metafeatures

data = pd.DataFrame({
    'cat': np.random.choice(['a', 'b'], size=20),
    'num': np.random.rand(20),
    'targets': np.random.choice(['x', 'y'], size=20)
})

X = data.drop('targets', axis=1)
Y = data['targets']

metafeatures = Metafeatures()
mfs = metafeatures.compute(
    X,
    Y=Y,
    column_types={'cat': 'CATEGORICAL', 'num': 'NUMERIC', 'targets': 'CATEGORICAL'},
    metafeature_ids=['RatioOfNumericFeatures'],
    exclude=None,
    sample_shape=(8, None),
    seed=0,
    n_folds=2,
    verbose=True,
    timeout=10,
    return_times=True,
)

print(mfs)

# RatioOfNumericFeatures
# {'RatioOfNumericFeatures': {'value': 0.5, 'compute_time': 3.9138991269283e-05}}

Warning: Metafeatures are timed as if each dependency has to be recomputed whenever it is needed. This means that the returned times may not be accurate for a particular application, especially if a metafeature depends on a computationally intensive resource in multiple places.

Using the Test Suite

Using this cloned or downloaded repository, the tests can be run with:

pip install -r requirements.txt
python3 run_tests.py

License

This project is licensed under the MIT License - see the LICENSE file for details

Acknowledgements

Many of the metafeatures in this package were inspired by the work done in the R project mfe