Open lunluen opened 5 years ago
Sorry, the correct last line is: key2 should be [SS(), FT(f2), FT(f1)] rather than [FT(f2), FT(f1), SS()].
>>> import mlens
>>> mlens.__version__
'0.2.3'
Hey! Thanks for flagging this, definitely looks like a bug. It's reordering based on name: the offending line seems to be the sorted() call applied to the output of _format_instances, on L99, which will sort the list of named preprocessors according to their names.
I'll try to fix this some time this week, but feel free to make a PR.
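To illustrate the effect described above, here is a minimal sketch that does not use mlens at all. The (name, instance) pairs are hypothetical stand-ins, with plain strings in place of real transformers; it only shows what Python's sorted() does to such a list, not the library's actual code path.

```python
# Hypothetical stand-ins for named preprocessors, listed in the user's order.
# Strings replace the real transformer objects for illustration only.
user_order = [
    ("functiontransformer-1", "FT(f1)"),
    ("standardscaler-1", "SS()"),
    ("functiontransformer-2", "FT(f2)"),
]

# Tuples compare on their first element (the name) first, so sorting
# groups the steps alphabetically by name and drops the user's order.
sorted_order = sorted(user_order)

print([step for _, step in user_order])    # ['FT(f1)', 'SS()', 'FT(f2)']
print([step for _, step in sorted_order])  # ['FT(f1)', 'FT(f2)', 'SS()']
```

The scaler ends up last simply because "standardscaler" sorts after "functiontransformer".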
Hi @flennerhag, thanks for the reply. And... I found another bug here. :astonished:
from mlens.utils import check_instances
from sklearn.base import clone  # needed for clone(clf) below
from sklearn.preprocessing import (
    StandardScaler as SS, FunctionTransformer as FT)
from xgboost import XGBClassifier

clf = XGBClassifier(n_estimators=10)

def f1(): return 1
def f2(): return 2

s = SS()  # one scaler instance shared by both pipelines
estimators = {
    'key-1': [clone(clf)],
    'key-2': [clone(clf)],
}
preprocessing = {
    'key-1': [FT(f2), s, FT(f1)],
    'key-2': [s, FT(f2), FT(f1)],
}
check_instances(estimators, preprocessing)[0]
[('key-1',
[('functiontransformer-1',
FunctionTransformer(accept_sparse=False, check_inverse=True,
func=<function f2 at 0x000001C9B3C389D8>, inv_kw_args=None,
inverse_func=None, kw_args=None, pass_y='deprecated',
validate=None)),
('functiontransformer-2',
FunctionTransformer(accept_sparse=False, check_inverse=True,
func=<function f1 at 0x000001C9B3C38BF8>, inv_kw_args=None,
inverse_func=None, kw_args=None, pass_y='deprecated',
validate=None))]),
('key-2',
[('functiontransformer-3',
FunctionTransformer(accept_sparse=False, check_inverse=True,
func=<function f2 at 0x000001C9B3C389D8>, inv_kw_args=None,
inverse_func=None, kw_args=None, pass_y='deprecated',
validate=None)),
('functiontransformer-4',
FunctionTransformer(accept_sparse=False, check_inverse=True,
func=<function f1 at 0x000001C9B3C38BF8>, inv_kw_args=None,
inverse_func=None, kw_args=None, pass_y='deprecated',
validate=None)),
('standardscaler-1',
StandardScaler(copy=True, with_mean=True, with_std=True)),
('standardscaler-2',
StandardScaler(copy=True, with_mean=True, with_std=True))])]
The two StandardScaler entries (the same instance, s) are grouped together under key-2.
If I understand it right, check_instances tends to cluster transformers of the same type without regard to the list order. So the results from make_group and BaseEnsemble are also wrong.
Reproduce - Case 1
Code
Result
Expected Behavior
The order should be [FT(f1), SS(), FT(f2)] rather than [FT(f1), FT(f2), SS()].
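For reference, the expected behavior could be sketched roughly as below. name_in_order is a hypothetical helper, not part of mlens, and the two empty classes are stand-ins for the scikit-learn transformers so the snippet runs without dependencies. It numbers instances per class within a single list, which is an assumption; the library's own numbering may run across keys.

```python
from collections import Counter


def name_in_order(instances):
    """Hypothetical helper: label each instance '<classname>-<n>' in list order."""
    counts = Counter()
    named = []
    for inst in instances:
        key = type(inst).__name__.lower()
        counts[key] += 1  # per-class counter, scoped to this one list
        named.append((f"{key}-{counts[key]}", inst))
    return named  # no sorting, so the user's order is kept


# Stand-in classes (assumption: used only so the example is self-contained).
class StandardScaler:
    pass


class FunctionTransformer:
    pass


steps = [FunctionTransformer(), StandardScaler(), FunctionTransformer()]
names = [name for name, _ in name_in_order(steps)]
print(names)  # ['functiontransformer-1', 'standardscaler-1', 'functiontransformer-2']
```

The names stay unique, and the scaler stays between the two function transformers.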
Reproduce - Case 2
Code
Result
key1 should be [FT(f2), SS(), FT(f1)] rather than [FT(f2), FT(f1), SS()]
key2 should be [FT(f2), SS(), FT(f1)] rather than [FT(f2), FT(f1), SS()]