autogluon / autogluon

Fast and Accurate ML in 3 Lines of Code
https://auto.gluon.ai/
Apache License 2.0

TabularNeuralNetModel sometimes does not train during task.fit() on CPU #669

Closed cpoptic closed 4 years ago

cpoptic commented 4 years ago

I observe the TabularNeuralNetModel sometimes does not train during task.fit() when training on CPU.

Digging into the presets.py file https://github.com/awslabs/autogluon/blob/960f2c4e60132410fc0342cf8f759d89a077e340/autogluon/utils/tabular/ml/trainer/model_presets/presets.py

it appears the DEFAULT_MODEL_PRIORITY sets NN to 50, lower than most other model types (e.g. RF, XT, GBM, CAT).
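A minimal sketch of how a priority dictionary like `DEFAULT_MODEL_PRIORITY` could determine training order (higher priority trains first). The NN value of 50 comes from the linked file; the other values here are illustrative, not AutoGluon's actual defaults:

```python
# Illustrative priority dict; only 'NN': 50 is taken from presets.py,
# the rest are made-up values for the sketch.
DEFAULT_MODEL_PRIORITY = {
    'RF': 100,   # Random Forest
    'XT': 90,    # Extra Trees
    'GBM': 80,   # LightGBM
    'CAT': 70,   # CatBoost
    'NN': 50,    # Tabular neural network -- lowest, so it trains last
}

def training_order(priority):
    """Sort model names so higher-priority models train first."""
    return sorted(priority, key=priority.get, reverse=True)

print(training_order(DEFAULT_MODEL_PRIORITY))
# → ['RF', 'XT', 'GBM', 'CAT', 'NN']
```

Because NN sorts last, it is the first model to be dropped if the time budget is exhausted.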

How can I determine whether a model class will be included in a training run before starting that run? Under what conditions would the NN model be skipped? Is it a function of the time_limits parameter in task.fit()?

Also, would setting the DEFAULT_MODEL_PRIORITY entry for 'NN' to something higher than the default 50 guarantee its inclusion in the model fitting?

I could simply set the hyperparameters argument of task.fit() to include 'NN' explicitly, but that model type is already included in the defaults, so it shouldn't need to be explicitly specified, right?

In essence, I want to know: how can I guarantee that a TabularNeuralNetModel is included in the task.fit() training run?

Innixma commented 4 years ago

Thanks for the question!

If the neural network is not being trained, that means the specified time limit was too short and AutoGluon ran out of time before reaching the neural network. Either increase time_limits, omit time_limits entirely (which guarantees every model is trained), or specify a custom hyperparameter config with custom model priority.
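A rough sketch of the behavior described above: models are attempted in priority order, and any model whose estimated training time exceeds the remaining budget is skipped. This is not AutoGluon's actual scheduler, and the model names, times, and skip rule are all made up for illustration:

```python
def fit_models(models, time_limits=None):
    """Train models in priority order within a time budget (toy sketch).

    models: list of (name, estimated_training_seconds) in priority order.
    time_limits: total budget in seconds, or None for no limit.
    """
    trained = []
    remaining = time_limits
    for name, est_time in models:
        if remaining is not None:
            if est_time > remaining:
                continue  # out of budget: this model is skipped
            remaining -= est_time
        trained.append(name)
    return trained

# Hypothetical models in priority order, with made-up training times (s).
models = [('RF', 10), ('XT', 10), ('GBM', 20), ('CAT', 30), ('NN', 60)]

print(fit_models(models, time_limits=60))   # → ['RF', 'XT', 'GBM']
print(fit_models(models, time_limits=None)) # all five models train
```

With time_limits=None the budget check never fires, which mirrors the guarantee that every model is trained when no limit is given.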

In general, I am planning to make the NN train earlier in the next release of AutoGluon, training before CatBoost and ExtraTrees most likely.

cpoptic commented 4 years ago

Ah perfect, thanks, that's what I figured. I'll set the model to train longer next time.

If not providing time_limits at all, is there an upper bound on the time to train? And if you interrupt the training process with Ctrl+C, will this cause any inconsistencies or errors? Thanks again

I think the order of the models doesn't necessarily need to be changed. I can understand why the neural network would be one of the later models to train, as I'd expect it to take longer to train than the other simpler models.

One last question: is there any heuristic for setting the DEFAULT_MODEL_PRIORITY weights of the models? I can adjust them, but how does one make an informed decision as to which models "should" be weighted more heavily? Thanks

Innixma commented 4 years ago

> Ah perfect, thanks, that's what I figured. I'll set the model to train longer next time.

> If not providing time_limits at all, is there an upper bound on the time to train? And if you interrupt the training process with Ctrl+C, will this cause any inconsistencies or errors? Thanks again

If time_limits is not set, AutoGluon will train each model to completion regardless of how long it takes; there is no upper bound in this case. If you interrupt the training process, only models which have finished training will be available within the predictor when it is loaded. If the logs show the following, the model has finished training:

```
Fitting model: NeuralNetClassifier_STACKER_l0 ...
    0.9802   = Validation accuracy score
    8.01s    = Training runtime
    0.09s    = Validation runtime
```

If you stopped the training early, it is likely that no weighted ensemble will have been trained. To manually train the weighted ensemble, call predictor.fit_weighted_ensemble() after loading the predictor object.
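A toy sketch of the idea behind ensembling only the models that finished training: each finished model gets a weight derived from its validation score. This is not AutoGluon's implementation (which uses a more sophisticated greedy selection); the score-proportional weighting here is purely illustrative:

```python
def weighted_ensemble(val_scores):
    """Assign each finished model a weight proportional to its validation
    score (toy scheme, not AutoGluon's actual ensembling algorithm)."""
    total = sum(val_scores.values())
    return {model: score / total for model, score in val_scores.items()}

# Only models that finished training before the interrupt are eligible.
finished = {'RF': 0.90, 'GBM': 0.95, 'NN': 0.98}
weights = weighted_ensemble(finished)

# Weights sum to 1, and better-scoring models get larger weights.
print(weights)
```

The key point mirrored here is that the ensemble is built only over the models actually present in the predictor, so it can be (re)built after an interrupted run.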

> I think the order of the models doesn't necessarily need to be changed. I can understand why the neural network would be one of the later models to train, as I'd expect it to take longer to train than the other simpler models.

This is true for binary and regression problems, but for multiclass the neural network can often be the fastest model, so the priority may be dependent on problem type in the future.

> One last question: is there any heuristic for setting the DEFAULT_MODEL_PRIORITY weights of the models? I can adjust them, but how does one make an informed decision as to which models "should" be weighted more heavily? Thanks

To clarify, the model priority is not a weight; it is an order. All that matters for determining training order is whether one value is higher or lower than another. RF and XT have high priority simply because they are quick to train and robust.
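Since only the relative order matters, any monotone rescaling of the priority values leaves the training order unchanged. A quick sketch (the priority values are illustrative, not AutoGluon's defaults):

```python
def training_order(priority):
    """Order model names by priority value, highest first."""
    return sorted(priority, key=priority.get, reverse=True)

base = {'RF': 100, 'XT': 90, 'NN': 50}              # illustrative values
scaled = {m: p * 10 + 3 for m, p in base.items()}   # monotone rescaling

# The absolute numbers changed, but the resulting order did not.
print(training_order(base) == training_order(scaled))  # → True
```

This is why raising 'NN' above the others changes *when* it trains, not *how much* it contributes to the final ensemble.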