MLBazaar / AutoBazaar

AutoBazaar: An AutoML System from the Machine Learning Bazaar
https://mlbazaar.github.io/AutoBazaar/
MIT License

Documentation misleading about search with no parameters #27

Open micahjsmith opened 4 years ago

micahjsmith commented 4 years ago

Description

The documentation makes this claim:

For example if you want to search for the best

$ abz search -i /path/to/your/datasets/folder name_of_your_dataset

This will evaluate the default pipeline without performing additional tuning iteration on it.

This seems misleading: running the search with no arguments did not stop after the default pipeline. It ran well over 1,000 tuning iterations (1,693 in about an hour) before I killed it.
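To make the mismatch concrete, here is a toy sketch of the two possible meanings of "no budget given". It is not AutoBazaar's code: `run_search`, `budget`, and `max_steps_for_demo` are hypothetical names used only for illustration. It assumes that an unset budget is treated as "no limit" rather than "zero tuning iterations", which would match the behaviour reported here.

```python
import itertools


def run_search(budget=None, max_steps_for_demo=5):
    """Toy stand-in for a tuning loop; NOT AutoBazaar's actual implementation."""
    # The default pipeline is always evaluated once.
    evaluated = ["default"]

    # Assumed semantics: budget=None means "unbounded", not "skip tuning".
    iterator = range(budget) if budget is not None else itertools.count()

    for i in iterator:
        if budget is None and i >= max_steps_for_demo:
            break  # stands in for the user pressing Ctrl-C
        evaluated.append(f"tuned-{i}")

    return evaluated


print(len(run_search(budget=0)))  # what the documentation describes
print(len(run_search()))          # what an unbounded loop would do
```

Under this assumption, only an explicit budget of zero gives the "default pipeline only" behaviour that the documentation promises for a bare invocation.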

What I Did

$ time abz search 196_autoMpg
Using TensorFlow backend.
20201015192335979857 - Processing Datasets: ['196_autoMpg']
###############################
#### Searching 196_autoMpg ####
###############################
[15:23:37] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
<repeated 8000 times>
^C
###############################
#### Executing 196_autoMpg ####
###############################
[16:23:50] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Executing best pipeline ABPipeline({
    "primitives": [
        "mlprimitives.custom.feature_extraction.CategoricalEncoder",
        "sklearn.impute.SimpleImputer",
        "sklearn.preprocessing.RobustScaler",
        "xgboost.XGBRegressor"
    ],
    "init_params": {},
    "input_names": {},
    "output_names": {},
    "hyperparameters": {
        "mlprimitives.custom.feature_extraction.CategoricalEncoder#1": {
            "keep": false,
            "copy": true,
            "features": "auto",
            "max_unique_ratio": 0,
            "max_labels": 25
        },
        "sklearn.impute.SimpleImputer#1": {
            "missing_values": NaN,
            "fill_value": null,
            "verbose": false,
            "copy": true,
            "strategy": "median"
        },
        "sklearn.preprocessing.RobustScaler#1": {
            "quantile_range": [
                25.0,
                75.0
            ],
            "copy": true,
            "with_centering": true,
            "with_scaling": true
        },
        "xgboost.XGBRegressor#1": {
            "n_jobs": -1,
            "n_estimators": 617,
            "max_depth": 9,
            "learning_rate": 0.03240539972838852,
            "gamma": 0.27690923264683187,
            "min_child_weight": 5
        }
    },
    "tunable_hyperparameters": {
        "mlprimitives.custom.feature_extraction.CategoricalEncoder#1": {
            "max_labels": {
                "type": "int",
                "default": 0,
                "range": [
                    0,
                    100
                ]
            }
        },
        "sklearn.impute.SimpleImputer#1": {
            "strategy": {
                "type": "str",
                "default": "mean",
                "values": [
                    "mean",
                    "median",
                    "most_frequent",
                    "constant"
                ]
            }
        },
        "sklearn.preprocessing.RobustScaler#1": {
            "with_centering": {
                "description": "If True, center the data before scaling. This will cause transform to raise an exception when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory",
                "type": "bool",
                "default": true
            },
            "with_scaling": {
                "description": "If True, scale the data to interquartile range",
                "type": "bool",
                "default": true
            }
        },
        "xgboost.XGBRegressor#1": {
            "n_estimators": {
                "type": "int",
                "default": 100,
                "range": [
                    10,
                    1000
                ]
            },
            "max_depth": {
                "type": "int",
                "default": 3,
                "range": [
                    3,
                    10
                ]
            },
            "learning_rate": {
                "type": "float",
                "default": 0.1,
                "range": [
                    0,
                    1
                ]
            },
            "gamma": {
                "type": "float",
                "default": 0.1,
                "range": [
                    0,
                    1
                ]
            },
            "min_child_weight": {
                "type": "int",
                "default": 1,
                "range": [
                    1,
                    10
                ]
            }
        }
    },
    "outputs": {
        "default": [
            {
                "name": "y",
                "type": "array",
                "variable": "xgboost.XGBRegressor#1.y"
            }
        ]
    },
    "id": "e168ec26-31f0-4e78-a3a7-3ef18bf432c8",
    "name": "single_table/regression/default",
    "template": null,
    "loader": {
        "data_modality": "single_table",
        "task_type": "regression"
    },
    "score": 8.4004691556447,
    "rank": 8.400469155645126,
    "metric": "meanSquaredError"
})
#############################
#### Scoring 196_autoMpg ####
#############################
Score: 7.041906911649814
       predictions     targets
count   100.000000  100.000000
mean     23.589642   23.478000
std       7.581228    7.573446
min      10.351545   10.000000
25%      17.002141   17.375000
50%      24.067155   23.250000
75%      29.522121   28.000000
max      38.241291   44.000000
                                         pipeline     score      rank  cv_score            metric data_modality   task_type task_subtype     elapsed  iterations  load_time  trivial_time      cv_time error  step
dataset
196_autoMpg  e168ec26-31f0-4e78-a3a7-3ef18bf432c8  7.041907  8.400469  8.400469  meanSquaredError  single_table  regression   univariate  3613.11274      1693.0   0.059046      1.091654  3307.688052  None  None

real    60m17.985s
user    61m12.325s
sys     50m16.661s
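
As a rough sanity check on the summary row above, the following sketch derives the per-iteration cost from the reported `elapsed`, `iterations`, and `cv_time` values. The numbers are copied from the output. Reading `cv_time` as time spent in cross-validation is an assumption based on the column name.

```python
# Values copied from the summary row of the run above.
elapsed_s = 3613.11274    # "elapsed" column, assumed to be seconds
iterations = 1693         # "iterations" column
cv_time_s = 3307.688052   # "cv_time" column, assumed to be cross-validation seconds

seconds_per_iteration = elapsed_s / iterations
cv_share = cv_time_s / elapsed_s

print(f"{seconds_per_iteration:.2f} s per iteration")
print(f"{cv_share:.0%} of elapsed time attributed to cv_time")
```

At roughly two seconds per iteration, a search that never stops on its own is easy to mistake for a hang on a small dataset like this one.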