Operating System (python -c 'import platform;print(platform.platform())'): Darwin-19.6.0-x86_64-i386-64bit
Description
The documentation makes this claim:
For example if you want to search for the best
$ abz search -i /path/to/your/datasets/folder name_of_your_dataset
This will evaluate the default pipeline without performing additional tuning iteration on it.
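This seems misleading: running the search with no additional options did not stop after evaluating the default pipeline. It kept tuning for about an hour, and the results table below reports 1693 iterations by the time I interrupted it with Ctrl-C.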
What I Did
$ time abz search 196_autoMpg
Using TensorFlow backend.
20201015192335979857 - Processing Datasets: ['196_autoMpg']
###############################
#### Searching 196_autoMpg ####
###############################
[15:23:37] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
<repeated 8000 times>
^C
###############################
#### Executing 196_autoMpg ####
###############################
[16:23:50] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Executing best pipeline ABPipeline({
"primitives": [
"mlprimitives.custom.feature_extraction.CategoricalEncoder",
"sklearn.impute.SimpleImputer",
"sklearn.preprocessing.RobustScaler",
"xgboost.XGBRegressor"
],
"init_params": {},
"input_names": {},
"output_names": {},
"hyperparameters": {
"mlprimitives.custom.feature_extraction.CategoricalEncoder#1": {
"keep": false,
"copy": true,
"features": "auto",
"max_unique_ratio": 0,
"max_labels": 25
},
"sklearn.impute.SimpleImputer#1": {
"missing_values": NaN,
"fill_value": null,
"verbose": false,
"copy": true,
"strategy": "median"
},
"sklearn.preprocessing.RobustScaler#1": {
"quantile_range": [
25.0,
75.0
],
"copy": true,
"with_centering": true,
"with_scaling": true
},
"xgboost.XGBRegressor#1": {
"n_jobs": -1,
"n_estimators": 617,
"max_depth": 9,
"learning_rate": 0.03240539972838852,
"gamma": 0.27690923264683187,
"min_child_weight": 5
}
},
"tunable_hyperparameters": {
"mlprimitives.custom.feature_extraction.CategoricalEncoder#1": {
"max_labels": {
"type": "int",
"default": 0,
"range": [
0,
100
]
}
},
"sklearn.impute.SimpleImputer#1": {
"strategy": {
"type": "str",
"default": "mean",
"values": [
"mean",
"median",
"most_frequent",
"constant"
]
}
},
"sklearn.preprocessing.RobustScaler#1": {
"with_centering": {
"description": "If True, center the data before scaling. This will cause transform to raise an exception when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory",
"type": "bool",
"default": true
},
"with_scaling": {
"description": "If True, scale the data to interquartile range",
"type": "bool",
"default": true
}
},
"xgboost.XGBRegressor#1": {
"n_estimators": {
"type": "int",
"default": 100,
"range": [
10,
1000
]
},
"max_depth": {
"type": "int",
"default": 3,
"range": [
3,
10
]
},
"learning_rate": {
"type": "float",
"default": 0.1,
"range": [
0,
1
]
},
"gamma": {
"type": "float",
"default": 0.1,
"range": [
0,
1
]
},
"min_child_weight": {
"type": "int",
"default": 1,
"range": [
1,
10
]
}
}
},
"outputs": {
"default": [
{
"name": "y",
"type": "array",
"variable": "xgboost.XGBRegressor#1.y"
}
]
},
"id": "e168ec26-31f0-4e78-a3a7-3ef18bf432c8",
"name": "single_table/regression/default",
"template": null,
"loader": {
"data_modality": "single_table",
"task_type": "regression"
},
"score": 8.4004691556447,
"rank": 8.400469155645126,
"metric": "meanSquaredError"
})
#############################
#### Scoring 196_autoMpg ####
#############################
Score: 7.041906911649814
       predictions     targets
count   100.000000  100.000000
mean     23.589642   23.478000
std       7.581228    7.573446
min      10.351545   10.000000
25%      17.002141   17.375000
50%      24.067155   23.250000
75%      29.522121   28.000000
max      38.241291   44.000000
pipeline score rank cv_score metric data_modality task_type task_subtype elapsed iterations load_time trivial_time cv_time error step
dataset
196_autoMpg e168ec26-31f0-4e78-a3a7-3ef18bf432c8 7.041907 8.400469 8.400469 meanSquaredError single_table regression univariate 3613.11274 1693.0 0.059046 1.091654 3307.688052 None None
real 60m17.985s
user 61m12.325s
sys 50m16.661s
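To put the numbers above in perspective, here is a rough back-of-the-envelope calculation using only values copied from the results table. It assumes that `elapsed` and `cv_time` are reported in seconds (which is consistent with the roughly 60 minutes of wall-clock time reported by `time`) and that `cv_time` is the time spent in cross-validation; neither is confirmed by the output itself.

```python
# Values copied from the results table above.
# Assumption: elapsed and cv_time are in seconds.
elapsed_s = 3613.11274
iterations = 1693
cv_time_s = 3307.688052

# Average wall-clock cost of one iteration.
seconds_per_iteration = elapsed_s / iterations

# Share of the run attributed to cv_time.
cv_share = cv_time_s / elapsed_s

print(f"{seconds_per_iteration:.2f} s per iteration")
print(f"{cv_share:.1%} of elapsed time attributed to cv_time")
```

In other words, nearly all of the run appears to have gone into repeated evaluation, which is not what I would expect from a command documented as evaluating only the default pipeline.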