Azure / automl-codegen-preview

AutoML code generation
MIT License

Can we get rid of the "problem info" as a parameter / hyper-parameter for the algorithm? #4

Open CESARDELATORRE opened 3 years ago

CESARDELATORRE commented 3 years ago

@wchill, if possible, we should not have this code as part of the algorithm hyper-params:

    problem_info=ProblemInfo(
        gpu_training_param_dict={'processing_unit_type': 'cpu'}
    ),

In context, the full call looks like this:

    algorithm = XGBoostClassifier(
        random_state=0,
        n_jobs=-1,
        problem_info=ProblemInfo(
            gpu_training_param_dict={'processing_unit_type': 'cpu'}
        ),
        booster='gbtree',
        colsample_bytree=0.8,
        eta=0.001,
        gamma=0,
        max_depth=9,
        max_leaves=255,
        n_estimators=10,
        objective='reg:logistic',
        reg_alpha=1.9791666666666667,
        reg_lambda=1.7708333333333335,
        subsample=0.7,
        tree_method='auto'
    )

I believe the choice of compute (CPU vs. GPU) should be made at the notebook/AML SDK level, not at this level. The exception would be if this parameter really is a hyper-parameter for the XGBoostClassifier algorithm in this case, but I don't think it is.

wchill commented 3 years ago

It's not technically a hyper-parameter for XGBoost itself, but it is needed so we can tell XGBoost to use the GPU-optimized tree method.

Removal of the ProblemInfo itself is being tracked internally.
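
To make the trade-off discussed above concrete, here is a minimal sketch of how the compute choice could be kept out of the hyper-parameter list and applied separately. The helper `resolve_xgb_params` and its arguments are hypothetical and are not part of the AutoML SDK or of XGBoost. The only XGBoost-specific assumption is that `'gpu_hist'` is the GPU tree method name in XGBoost 1.x; newer releases may expect a different setting.

```python
from typing import Any, Dict


def resolve_xgb_params(
    hyperparams: Dict[str, Any],
    processing_unit_type: str = "cpu",
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the hyper-parameters with
    tree_method adjusted for the compute target chosen by the caller."""
    if processing_unit_type not in ("cpu", "gpu"):
        raise ValueError(f"unknown processing unit: {processing_unit_type!r}")

    # Copy so the original hyper-parameter dict stays compute-agnostic.
    params = dict(hyperparams)
    if processing_unit_type == "gpu":
        # Assumption: 'gpu_hist' is the GPU tree method in XGBoost 1.x.
        # XGBoost 2.0+ deprecates it in favor of device='cuda' with 'hist'.
        params["tree_method"] = "gpu_hist"
    return params


# Hyper-parameters taken from the call quoted in this issue (subset).
hyperparams = {
    "booster": "gbtree",
    "max_depth": 9,
    "n_estimators": 10,
    "tree_method": "auto",
}

# The compute decision is made by the caller (e.g. at notebook level).
cpu_params = resolve_xgb_params(hyperparams, "cpu")
gpu_params = resolve_xgb_params(hyperparams, "gpu")
```

With this split, the hyper-parameter dict carries no compute information, and the caller decides CPU vs. GPU in one place. Whether the generated code can actually be structured this way depends on internals that are not shown in this thread.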