dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Discuss - Image classification in AutoML.Net #5438

Closed LittleLittleCloud closed 3 years ago

LittleLittleCloud commented 3 years ago

What's the problem

Currently, image classification in AutoML.Net is brutal. By "brutal" I mean the way AutoML.Net runs cross-validation on the ImageClassification trainer: it takes roughly ten times longer than using the ImageClassification API directly, with little difference in the result. Here are some statistics.

The weather-report dataset includes around 1K images. Training with the ImageClassification trainer directly takes about 6 minutes on a Xeon W-2133 CPU using resnet_v2_50_299; training on the same dataset with AutoML.Net takes an astonishing 60 minutes. However, the results are quite similar: macro accuracy is 96.07% with the ImageClassification API and 96.29% with AutoML.Net.
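For reference, the direct-trainer path I measured looks roughly like the following sketch. The column names, data file, and image folder are illustrative assumptions, not the exact code I ran.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Vision;

public class ImageData
{
    [LoadColumn(0)] public string ImagePath;
    [LoadColumn(1)] public string Label;
}

public static class DirectImageClassification
{
    public static void Run()
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromTextFile<ImageData>("images.tsv", hasHeader: true);
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

        var pipeline = mlContext.Transforms.Conversion.MapValueToKey("LabelKey", "Label")
            // Load the raw image bytes the ImageClassification trainer consumes.
            .Append(mlContext.Transforms.LoadRawImageBytes("Image", imageFolder: "images", inputColumnName: "ImagePath"))
            .Append(mlContext.MulticlassClassification.Trainers.ImageClassification(
                new ImageClassificationTrainer.Options
                {
                    FeatureColumnName = "Image",
                    LabelColumnName = "LabelKey",
                    // The resnet_v2_50 architecture mentioned above.
                    Arch = ImageClassificationTrainer.Architecture.ResnetV250
                }))
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

        var model = pipeline.Fit(split.TrainSet);
        var metrics = mlContext.MulticlassClassification.Evaluate(
            model.Transform(split.TestSet), labelColumnName: "LabelKey");
        Console.WriteLine($"Macro accuracy: {metrics.MacroAccuracy:P2}");
    }
}
```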

Why is the time difference huge while the difference in evaluation results is small? It's because AutoML.Net runs cross-validation on small datasets (FYI, "small" is defined as a dataset with fewer than 15,000 rows). The default fold count for cross-validation is 10, which means AutoML.Net runs the same trial ten times on different splits of the data and returns the average evaluation metric together with the model whose evaluation is closest to that average. On the other hand, if I used a larger dataset with over 15,000 pictures, the training time would be similar, because AutoML.Net would use a train-validation split instead of cross-validation and run a single trial.
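For comparison, the AutoML.Net path is roughly the sketch below (it reuses the `mlContext` and data from the previous sketch and requires the Microsoft.ML.AutoML package; the settings are illustrative assumptions). Because only training data is passed and the dataset is small, the experiment falls back to 10-fold cross-validation internally, which is what multiplies the training time.

```csharp
// Rough sketch of the AutoML.Net path timed above.
var settings = new MulticlassExperimentSettings
{
    MaxExperimentTimeInSeconds = 3600,
    OptimizingMetric = MulticlassClassificationMetric.MacroAccuracy
};

var experiment = mlContext.Auto().CreateMulticlassClassificationExperiment(settings);

// Only training data is passed, so AutoML decides between CV and a train-validation split.
var result = experiment.Execute(split.TrainSet, labelColumnName: "Label");
Console.WriteLine($"Best macro accuracy: {result.BestRun.ValidationMetrics.MacroAccuracy:P2}");
```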

Is CV really necessary for image classification, especially when AutoML.Net only uses one trainer and hyper-parameter tuning for that trainer is disabled? To answer this, let's first look back at CV's pros and cons, and why AutoML.Net uses CV. The pros of using CV are 1) it reduces the variance of the validation score, so there is less bias in the evaluation result, and 2) it helps reduce overfitting. The con is that it's time-costly. In most cases, CV is necessary for AutoML.Net because AutoML's smart sweeper relies on the validation score of each trial to run its SMBO algorithm, and a less biased evaluation result helps it produce a better model. Moreover, the final evaluation metric summary is more objective because of CV. The time cost still exists, but it isn't a big problem in normal cases, because most AutoML.Net trials take only a rather short time to complete, and it's worthwhile to spend the extra time in exchange for a better result.

However, when it comes to image classification, the pros of running cross-validation disappear and the cons become even more serious. As I mentioned above, there is only one trainer and no hyper-parameter tuning is available for it, so a less biased validation score doesn't help the sweeper produce a better model. What makes things worse is that a single image classification trial takes much longer than a normal trial because of the ImageClassification trainer. The only remaining pro of using CV for image classification is a more objective evaluation score in the summary, but is that really worth a training time that is ten times longer?

My answer is no, at least not for Model Builder users, who have Azure training available at hand. They would accept a lousier but faster local training (actually not too lousy: the model is the same, only the evaluation score might be biased because of high variance) over a slightly better (not much better, see above) but much slower one.

What we need to change

We have two ways to give image classification a better experience in AutoML.Net, in both training quality and training time. The first is simple but brings less improvement: simply disable CV for image classification, or at least expose it as a flag in the MultiClassificationExperiment options so the user can disable it. The second is more of an AutoML-style solution: use DNN featurizers along with traditional multiclass trainers, as in the sketch below. The single-trial time would be shorter, and it gives the smart sweeper more room to tune hyper-parameters and find the best combination.
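A rough sketch of that second approach: featurize the images once with a pretrained DNN, then let the sweeper tune cheap classical trainers on the fixed-length feature vectors. The ResNet18 featurizer, column names, and trainer choice here are illustrative assumptions (this path needs the Microsoft.ML.OnnxTransformer and DNN image featurizer packages), not a committed design.

```csharp
// Sketch: one expensive DNN featurization pass, then fast classical training on top.
var featurizePipeline = mlContext.Transforms.LoadImages("ImageObject", imageFolder: "images", inputColumnName: "ImagePath")
    .Append(mlContext.Transforms.ResizeImages("ImageObject", imageWidth: 224, imageHeight: 224))
    .Append(mlContext.Transforms.ExtractPixels("Pixels", "ImageObject"))
    // Pretrained ResNet18 featurizer turns each image into a fixed-length vector.
    .Append(mlContext.Transforms.DnnFeaturizeImage("Features",
        m => m.ModelSelector.ResNet18(mlContext, m.OutputColumn, m.InputColumn), "Pixels"))
    .Append(mlContext.Transforms.Conversion.MapValueToKey("LabelKey", "Label"));

// Any classical multiclass trainer (or an AutoML sweep over several) can now run quickly,
// because the expensive DNN pass happens only once per image.
var trainer = mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(
    labelColumnName: "LabelKey", featureColumnName: "Features");

var model = featurizePipeline.Append(trainer).Fit(split.TrainSet);
```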

taha-ghadirian commented 3 years ago

I also have this problem in MultiClassClassification. See #5437.

Lynx1820 commented 3 years ago

Hi @LittleLittleCloud,

I believe you can already specify your own training/validation datasets instead of using CV (ExperimentBase). Are you referring to Model Builder?

Lynx1820 commented 3 years ago

As discussed offline, when there is only one trainer and no hyper-parameters to tune, we should not perform an unnecessary CV. That said, the user is not forced into CV today: as I pointed out above, there is an AutoML overload the user can use to train without CV.
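For reference, the overload in question looks roughly like the following sketch (variable names are illustrative). When an explicit validation set is supplied, the experiment uses it directly instead of cross-validating.

```csharp
// Sketch: supplying an explicit validation IDataView bypasses the automatic
// cross-validation that AutoML otherwise applies to small datasets.
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

var experiment = mlContext.Auto().CreateMulticlassClassificationExperiment(maxExperimentTimeInSeconds: 3600);
var result = experiment.Execute(split.TrainSet, split.TestSet, labelColumnName: "Label");
```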

@LittleLittleCloud also proposed an enhanced approach for sweeping in image classification.

Lynx1820 commented 3 years ago

From an offline discussion with @justinormont: CV may help ImageClassification users even if no hyper-parameters are swept, because cross-validation is useful for reducing the otherwise very large variance / confidence interval of the returned metrics.

@LittleLittleCloud provided some insights on the pros/cons of CV in the description and @justinormont provided the example below.

Variance: As an example, if there are only 100 images, in a TrainValid split the test dataset will be only 10 images, which gives the user a useless metric. It will return a metric, but its value is essentially a random number. This is bad for the user.

Discretization: In this example, the accuracy can only take one of {0, 10%, 20%, ..., 90%, 100%} (only 11 possible values) based on how many images the model gets right. This is also bad for the user.

CV helps with both: running cross-validation with 10 folds reduces the variance of the returned metrics substantially and, for this example, gives 100 discrete possible values for the accuracy.

In summary, CV is useful for two things: sweeping and giving the user a useful metric.

I'm closing this issue, as there is a reason for using CV in ImageClassification, and there is an option for the user to disable it. @LittleLittleCloud feel free to open an issue with your second suggestion to enhance the ImageClassification results.