Closed shabir1 closed 2 years ago
To run without data preprocessing, please see this example. If your data is dirty in any way, this will cause failures, because the pipeline relies on the cleaned data that the data preprocessing step produces.
To run without feature engineering, please see the docs here.
@eddiebergman I tried:
from autosklearn.pipeline.components.feature_preprocessing.no_preprocessing import NoPreprocessing
autosklearn.pipeline.components.data_preprocessing.add_preprocessor(NoPreprocessing)
clf = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    include={
        'data_preprocessor': ['NoPreprocessing']
    },
    # The two flags below are provided to speed up calculations
    # Not recommended for a real implementation
    initial_configurations_via_metalearning=0,
    smac_scenario_args={'runcount_limit': 5},
)
clf.fit(X_train, y_train)
I got the error below:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-bcaf679fd90c> in <module>
----> 1 automl.fit(x_train.copy(), y_train.copy())
~/.local/lib/python3.8/site-packages/autosklearn/estimators.py in fit(self, X, y, X_test, y_test, feat_type, dataset_name)
937 self.target_type = target_type
938
--> 939 super().fit(
940 X=X,
941 y=y,
~/.local/lib/python3.8/site-packages/autosklearn/estimators.py in fit(self, **kwargs)
328 if self.automl_ is None:
329 self.automl_ = self.build_automl()
--> 330 self.automl_.fit(load_models=self.load_models, **kwargs)
331
332 return self
~/.local/lib/python3.8/site-packages/autosklearn/automl.py in fit(self, X, y, X_test, y_test, feat_type, dataset_name, only_return_configuration_space, load_models)
1913 load_models: bool = True,
1914 ):
-> 1915 return super().fit(
1916 X, y,
1917 X_test=X_test,
~/.local/lib/python3.8/site-packages/autosklearn/automl.py in fit(self, X, y, task, X_test, y_test, feat_type, dataset_name, only_return_configuration_space, load_models, is_classification)
790 # like this we can't use some of the preprocessing methods in case
791 # the data became sparse)
--> 792 self.configuration_space, configspace_path = self._create_search_space(
793 self._backend.temporary_directory,
794 self._backend,
~/.local/lib/python3.8/site-packages/autosklearn/automl.py in _create_search_space(self, tmp_dir, backend, datamanager, include, exclude)
1853 self._stopwatch.start_task(task_name)
1854 configspace_path = os.path.join(tmp_dir, 'space.json')
-> 1855 configuration_space = pipeline.get_configuration_space(
1856 datamanager.info,
1857 include=include,
~/.local/lib/python3.8/site-packages/autosklearn/util/pipeline.py in get_configuration_space(info, include, exclude)
31 return _get_regression_configuration_space(info, include, exclude)
32 else:
---> 33 return _get_classification_configuration_space(info, include, exclude)
34
35
~/.local/lib/python3.8/site-packages/autosklearn/util/pipeline.py in _get_classification_configuration_space(info, include, exclude)
86 }
87
---> 88 return SimpleClassificationPipeline(
89 dataset_properties=dataset_properties,
90 include=include, exclude=exclude).\
~/.local/lib/python3.8/site-packages/autosklearn/pipeline/classification.py in __init__(self, config, steps, dataset_properties, include, exclude, random_state, init_params)
83 if 'target_type' not in dataset_properties:
84 dataset_properties['target_type'] = 'classification'
---> 85 super().__init__(
86 config=config,
87 steps=steps,
~/.local/lib/python3.8/site-packages/autosklearn/pipeline/base.py in __init__(self, config, steps, dataset_properties, include, exclude, random_state, init_params)
52 self._validate_include_exclude_params()
53
---> 54 self.config_space = self.get_hyperparameter_search_space()
55
56 if config is None:
~/.local/lib/python3.8/site-packages/autosklearn/pipeline/base.py in get_hyperparameter_search_space(self, dataset_properties)
238 """
239 if not hasattr(self, 'config_space') or self.config_space is None:
--> 240 self.config_space = self._get_hyperparameter_search_space(
241 include=self.include, exclude=self.exclude,
242 dataset_properties=self.dataset_properties)
~/.local/lib/python3.8/site-packages/autosklearn/pipeline/classification.py in _get_hyperparameter_search_space(self, include, exclude, dataset_properties)
184 dataset_properties['sparse'] = False
185
--> 186 cs = self._get_base_search_space(
187 cs=cs, dataset_properties=dataset_properties,
188 exclude=exclude, include=include, pipeline=self.steps)
~/.local/lib/python3.8/site-packages/autosklearn/pipeline/base.py in _get_base_search_space(self, cs, dataset_properties, exclude, include, pipeline)
350 include.get(node_name),
351 exclude.get(node_name))
--> 352 sub_config_space = node.get_hyperparameter_search_space(
353 dataset_properties, include=choices_list)
354 cs.add_configuration_space(node_name, sub_config_space)
~/.local/lib/python3.8/site-packages/autosklearn/pipeline/components/data_preprocessing/__init__.py in get_hyperparameter_search_space(self, dataset_properties, default, include, exclude)
118 cs.add_hyperparameter(preprocessor)
119 for name in available_preprocessors:
--> 120 preprocessor_configuration_space = available_preprocessors[name](
121 dataset_properties=dataset_properties). \
122 get_hyperparameter_search_space(dataset_properties)
TypeError: __init__() got an unexpected keyword argument 'dataset_properties'
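Reading the last frame of the traceback: the data preprocessing choice instantiates every registered component with a `dataset_properties` keyword argument, and the error message implies that the constructor of the feature-preprocessing `NoPreprocessing` does not accept that keyword. A minimal pure-Python sketch of this failure mode follows. The classes and the `instantiate_all` helper are hypothetical stand-ins written for illustration, not auto-sklearn's real code.

```python
# Hypothetical stand-ins for illustration; these are NOT auto-sklearn's classes.


class FeatureStyleNoPreprocessing:
    """Mimics a component whose constructor only accepts random_state."""

    def __init__(self, random_state=None):
        self.random_state = random_state


class DataStyleNoPreprocessing:
    """Mimics a component that tolerates arbitrary keyword arguments."""

    def __init__(self, **kwargs):
        # Store whatever the caller passes, e.g. dataset_properties.
        for key, value in kwargs.items():
            setattr(self, key, value)


def instantiate_all(registry, dataset_properties):
    """Instantiate each registered class with a dataset_properties keyword,
    mirroring the call shown in the last traceback frame."""
    results = {}
    for name, cls in registry.items():
        try:
            results[name] = cls(dataset_properties=dataset_properties)
        except TypeError as exc:
            # Keep the exception so both outcomes can be compared.
            results[name] = exc
    return results


outcome = instantiate_all(
    {
        "feature_style": FeatureStyleNoPreprocessing,
        "data_style": DataStyleNoPreprocessing,
    },
    dataset_properties={"sparse": False},
)
for name, result in outcome.items():
    print(name, "->", type(result).__name__)
```

Under these assumptions, a component registered as a data preprocessor needs a constructor that accepts the keywords the pipeline passes in, which a class written for the feature-preprocessing step may not.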
@eddiebergman Do I have to create my own NoPreprocessing class?
@shabir1 Yes. The code for it can be seen and modified in the example. We may eventually include it as a native part of the package, but because we require data preprocessing for sklearn to work, we don't provide it as a default option.
@eddiebergman Thank you
@eddiebergman What are the possible values for data_preprocessor and feature_preprocessor?
include={
'data_preprocessor': [?],
'feature_preprocessor': [?]
}
@eddiebergman I found the possible values
'feature_preprocessor': ['densifier', 'extra_trees_preproc_for_classification', 'fast_ica', 'feature_agglomeration', 'kernel_pca', 'kitchen_sinks', 'liblinear_svc_preprocessor', 'no_preprocessing', 'nystroem_sampler', 'pca', 'polynomial', 'random_trees_embedding', 'select_percentile_classification', 'select_rates_classification', 'truncatedSVD']
'data_preprocessor': ['feature_type', 'NoPreprocessing']
Within feature_type there are several different data preprocessors. Can we include or exclude only some of them? If yes, how?
You can use the include and exclude parameters here. They are, however, mutually exclusive: you can only specify one.
@eddiebergman I am talking about data_preprocessor:
include={
'data_preprocessor': ['feature_type']
}
or
exclude={
'data_preprocessor': ['feature_type']
}
We can only include or exclude feature_type as a whole, because there are only two possible values for 'data_preprocessor': ['feature_type', 'NoPreprocessing']. But I want to exclude hot_encoding or other individual data preprocessing steps. How can I do that?
This is currently not possible. You can preprocess the data however you like beforehand and then use NoPreprocessing. The reason we don't have it at the moment is that data preprocessing is applied column-wise and our structure is not flexible enough to handle that right now.
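For anyone taking the "preprocess beforehand" route, here is a minimal sketch of column-wise preprocessing done outside auto-sklearn with scikit-learn's `ColumnTransformer`. The toy array, the column roles, and the chosen steps (median imputation, scaling, one-hot encoding) are assumptions made for illustration; keep only the steps you actually want.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (hypothetical): column 0 is numeric with a missing value,
# column 1 holds integer category codes.
X_raw = np.array(
    [
        [25.0, 0.0],
        [np.nan, 1.0],
        [47.0, 2.0],
        [33.0, 0.0],
    ]
)

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]
)

# Column-wise preprocessing: each transformer only sees its own columns.
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, [0]),
        # Drop this entry if you do not want one-hot encoding.
        ("cat", OneHotEncoder(handle_unknown="ignore"), [1]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X_clean = preprocess.fit_transform(X_raw)
print(X_clean.shape)
```

The resulting `X_clean` could then be passed to a classifier configured with `include={'data_preprocessor': ['NoPreprocessing']}`, assuming a custom `NoPreprocessing` data preprocessor has been registered as discussed above. That last step is not run here.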
Okay, thank you
Can we include/exclude data preprocessing algorithms?
What configuration do I have to set if I don't need any data preprocessing, or if I want to use only specific feature preprocessing and data preprocessing algorithms?
I tried: