Open kilig000123 opened 4 months ago
If you can already find the weights and scores, it is easy to get the details of each model.
You can refer to the following example:
estimator = experiment.run()
# the last step of the fitted pipeline is the ensemble
ensembled = estimator.steps[-1][-1]
weights = ensembled.weights_
models = ensembled.estimators
for i, (w, m) in enumerate(zip(weights, models)):
    if m is not None:
        print('-' * 30)
        print(i, w, m)
The output looks like this:
------------------------------
0 0.55 HyperGBMEstimator(task=binary, reward_metric=precision, cv=True,
data_pipeline: DataFrameMapper(df_out=True,
df_out_dtype_transforms=[(ColumnSelector(include:['object', 'string']),
'int')],
features=[(ColumnSelector(include:['object', 'string', 'category', 'bool']),
Pipeline(steps=[('categorical_imputer_0',
SafeSimpleImputer(strategy='constant')),
('categorical_label_encoder_0',
MultiLabelEncoder())])),
(ColumnSelector(include:number, exclude:timedelta),
Pipeline(steps=[('numeric_imputer_0',
FloatOutputImputer(strategy='median')),
('numeric_log_standard_scaler_0',
LogStandardScaler())]))],
input_df=True)
gbm_model: CatBoostClassifierWrapper(learning_rate=0.5, depth=10, l2_leaf_reg=20, silent=True, n_estimators=200, random_state=55954, eval_metric='Precision')
)
------------------------------
4 0.4 HyperGBMEstimator(task=binary, reward_metric=precision, cv=True,
data_pipeline: DataFrameMapper(df_out=True,
df_out_dtype_transforms=[(ColumnSelector(include:['object', 'string']),
'int')],
features=[(ColumnSelector(include:['object', 'string', 'category', 'bool']),
Pipeline(steps=[('categorical_imputer_0',
SafeSimpleImputer(strategy='constant')),
('categorical_label_encoder_0',
MultiLabelEncoder())])),
(ColumnSelector(include:number, exclude:timedelta),
Pipeline(steps=[('numeric_imputer_0',
FloatOutputImputer(strategy='median')),
('numeric_robust_scaler_0',
RobustScaler())]))],
input_df=True)
gbm_model: LGBMClassifierWrapper(boosting_type='goss', early_stopping_rounds=10,
learning_rate=0.5, max_depth=5, n_estimators=200,
num_leaves=440, random_state=58258, reg_alpha=10,
reg_lambda=0.5, verbosity=-1)
)
------------------------------
...
In this output I see categorical_label_encoder_0. What exactly is this categorical encoding method?
HyperGBM optimizes the full pipeline, from preprocessing through model training. categorical_label_encoder_0 is a preprocessing step applied to categorical data. In the example output above, the categorical columns are encoded with MultiLabelEncoder, which is the Hypernets wrapper around LabelEncoder.
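To illustrate what LabelEncoder-style encoding does when applied column by column, here is a minimal pure-Python sketch. This is not the Hypernets implementation, only an illustration of the idea: each distinct category in each column is mapped to an integer code.

```python
def label_encode_columns(rows, columns):
    """Encode each named column of a list-of-dicts dataset to int codes.

    Codes are assigned in sorted order of the categories, the same
    convention sklearn's LabelEncoder uses.
    """
    mappings = {}
    for col in columns:
        categories = sorted({row[col] for row in rows})
        mappings[col] = {cat: code for code, cat in enumerate(categories)}
    encoded = [
        {col: (mappings[col][val] if col in mappings else val)
         for col, val in row.items()}
        for row in rows
    ]
    return encoded, mappings

data = [{"color": "red", "size": "S"},
        {"color": "blue", "size": "M"},
        {"color": "red", "size": "M"}]
encoded, mappings = label_encode_columns(data, ["color", "size"])
# "blue" -> 0, "red" -> 1; "M" -> 0, "S" -> 1
print(encoded[0])  # {'color': 1, 'size': 1}
```

The real MultiLabelEncoder additionally handles fitting on a training set and transforming new data with the learned mapping; see the Hypernets source for the details.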
For the detailed definition of HyperGBM's default search space, see the source files search_space.py and sklearn_ops.py.
After using hypergbm, my dataset improved substantially on several metrics. Once the experiment finishes, I want to know the specific models behind the aggregated best result: which models it uses and their detailed parameters. So far I have only found the aggregated weights and scores. If there is a way, please let me know; it matters a lot for my understanding of the model.