dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

[Image Classification API] No evaluation when batchSize parameter > # of instances in dataset #4274

Closed luisquintanilla closed 5 years ago

luisquintanilla commented 5 years ago

System information

Issue

Tried to train an image classification model using the Image Classification API. The value set for the batchSize parameter is 300, while the number of data instances in the test set (passed as validationSet) is 182.

No evaluation takes place. 0 batches are processed.

Expected the model to train and then evaluate on the instances provided. In this case, since the number of data instances is less than the value set for the batchSize parameter, it should process 1 batch instead of 0.
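The difference between the observed and expected batch counts comes down to how the row count is divided by the batch size. The sketch below only illustrates that arithmetic; `FullBatchesOnly` and `BatchesIncludingPartial` are hypothetical helper names, not ML.NET APIs, and the actual implementation may compute this differently.

```csharp
using System;

// Hypothetical helpers (not part of ML.NET) that illustrate the arithmetic only.

// Integer division drops the remainder, so an incomplete batch is never counted.
static int FullBatchesOnly(int rowCount, int batchSize) => rowCount / batchSize;

// Ceiling division counts a trailing incomplete batch as one more batch.
static int BatchesIncludingPartial(int rowCount, int batchSize) =>
    (rowCount + batchSize - 1) / batchSize;

Console.WriteLine(FullBatchesOnly(182, 300));         // 0, matches the reported output
Console.WriteLine(BatchesIncludingPartial(182, 300)); // 1, the expected behavior
Console.WriteLine(FullBatchesOnly(182, 182));         // 1, batchSize equal to the row count
```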

Source code / logs

Pipeline:

var trainingPipeline =
    mapLabelTransform
        .Append(mlContext.Model.ImageClassification(
            featuresColumnName: "ImagePath",
            labelColumnName: "LabelAsKey",
            arch: ImageClassificationEstimator.Architecture.ResnetV2101,
            epoch: 100,
            batchSize: 300, // larger than the 182 rows in the validation set
            testOnTrainSet: false,
            metricsCallback: (metrics) => Console.WriteLine(metrics),
            validationSet: transformedTestData,
            reuseTrainSetBottleneckCachedValues: true,
            reuseValidationSetBottleneckCachedValues: true));

Output:

Number of rows 182
Phase: Training, Dataset used: Validation, Batch Processed Count:   0, Epoch:  93, Accuracy:        NaN
Phase: Training, Dataset used: Validation, Batch Processed Count:   0, Epoch:  94, Accuracy:        NaN
Phase: Training, Dataset used: Validation, Batch Processed Count:   0, Epoch:  95, Accuracy:        NaN
Phase: Training, Dataset used: Validation, Batch Processed Count:   0, Epoch:  96, Accuracy:        NaN
Phase: Training, Dataset used: Validation, Batch Processed Count:   0, Epoch:  97, Accuracy:        NaN
Phase: Training, Dataset used: Validation, Batch Processed Count:   0, Epoch:  98, Accuracy:        NaN
Phase: Training, Dataset used: Validation, Batch Processed Count:   0, Epoch:  99, Accuracy:        NaN

When the batchSize is set equal to the number of rows (in this case 182), this is the output:

Phase: Training, Dataset used: Validation, Batch Processed Count:   1, Epoch:  95, Accuracy:          1
Phase: Training, Dataset used: Validation, Batch Processed Count:   1, Epoch:  96, Accuracy:          1
Phase: Training, Dataset used: Validation, Batch Processed Count:   1, Epoch:  97, Accuracy:          1
Phase: Training, Dataset used: Validation, Batch Processed Count:   1, Epoch:  98, Accuracy:          1
Phase: Training, Dataset used: Validation, Batch Processed Count:   1, Epoch:  99, Accuracy:          1
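Since setting batchSize equal to the row count produces one processed batch, a possible stopgap is to cap the requested batch size at the size of the validation set before building the pipeline. This is a minimal sketch under assumptions: `EffectiveBatchSize` is a hypothetical helper, the caller is assumed to already know the row count, and the same batchSize presumably also applies to the training set, so the cap may change training behavior as well.

```csharp
using System;

// Hypothetical helper (not part of ML.NET): cap the requested batch size
// at the number of rows so that at least one full batch exists.
static int EffectiveBatchSize(int requestedBatchSize, int rowCount)
{
    if (requestedBatchSize <= 0) throw new ArgumentOutOfRangeException(nameof(requestedBatchSize));
    if (rowCount <= 0) throw new ArgumentOutOfRangeException(nameof(rowCount));
    return Math.Min(requestedBatchSize, rowCount);
}

Console.WriteLine(EffectiveBatchSize(300, 182)); // 182
Console.WriteLine(EffectiveBatchSize(100, 182)); // 100
```

The result would then be passed as the batchSize argument in place of the literal 300.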

ashbhandare commented 5 years ago

This happens because, currently, a batch is processed only if it contains batchSize samples; any incomplete batch is skipped. @codemzs is working on changing this.
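To make the skipped-batch behavior concrete, here is a minimal sketch of a batching loop with a switch for the trailing incomplete batch. It is not the ML.NET implementation; `Batch` and its `keepPartial` flag are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical batching helper (not the ML.NET code): groups rows into lists
// of batchSize items and optionally emits the final, incomplete batch.
static IEnumerable<List<T>> Batch<T>(IEnumerable<T> rows, int batchSize, bool keepPartial)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

    var current = new List<T>();
    foreach (var row in rows)
    {
        current.Add(row);
        if (current.Count == batchSize)
        {
            yield return current;       // a full batch is always emitted
            current = new List<T>();
        }
    }

    // With keepPartial == false the leftover rows are dropped, which is the
    // behavior described above; with true they form one last smaller batch.
    if (keepPartial && current.Count > 0)
        yield return current;
}

var rows = Enumerable.Range(0, 182).ToList();
Console.WriteLine(Batch(rows, 300, keepPartial: false).Count()); // 0
Console.WriteLine(Batch(rows, 300, keepPartial: true).Count());  // 1
```

With 182 rows and a batch size of 300, dropping the incomplete batch leaves nothing to evaluate, which would be consistent with the NaN accuracy in the log above.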