dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
LoadRawImageBytes does not load anything #6011

Closed atkinsonbg closed 2 years ago

atkinsonbg commented 2 years ago

Describe the bug: I am following the tutorial listed here (https://docs.microsoft.com/en-us/samples/dotnet/machinelearning-samples/mlnet-image-classification-transfer-learning/), and LoadRawImageBytes does not appear to load images from the directory.

To Reproduce Steps to reproduce the behavior:

Code used:

IEnumerable<ImageData> images = LoadImagesFromCsv("Data/training_data.csv");

IDataView imageData = mlContext.Data.LoadFromEnumerable(images);
IDataView shuffledData = mlContext.Data.ShuffleRows(imageData, shuffleSource: true);

var previewshuffledData = shuffledData.Preview();

var preprocessingPipeline = mlContext.Transforms.Conversion
    .MapValueToKey(
        inputColumnName: "Label",
        outputColumnName: "LabelAsKey")
    .Append(mlContext.Transforms.LoadRawImageBytes(
        outputColumnName: "Image",
        imageFolder: "Data/LocalImages",
        inputColumnName: "ImagePath"));

IDataView preProcessedData = preprocessingPipeline
    .Fit(shuffledData)
    .Transform(shuffledData);

var previewPreprocessedData = preProcessedData.Preview();

previewshuffledData:

Screen Shot 2021-11-29 at 2 21 21 PM Screen Shot 2021-11-29 at 2 21 41 PM

previewPreprocessedData:

Screen Shot 2021-11-29 at 2 21 50 PM

Expected behavior: After LoadRawImageBytes, I expect the preview's RowView to contain 63 rows, which matches the number of images in the dataset.

luisquintanilla commented 2 years ago

@atkinsonbg instead of using the Preview method, can you try converting the IDataView to an IEnumerable and iterate over that?

var previewPreprocessedData = mlContext.Data.CreateEnumerable(preProcessedData, reuseRowObject:false);

atkinsonbg commented 2 years ago

@luisquintanilla I changed it to: var previewPreprocessedData = mlContext.Data.CreateEnumerable<ImagePredictionInput>(preProcessedData, reuseRowObject: false);

Result:

Screen Shot 2021-11-29 at 3 20 13 PM

atkinsonbg commented 2 years ago

I pulled the sample code from here: https://github.com/dotnet/machinelearning-samples/tree/main/samples/csharp/getting-started/DeepLearning_ImageClassification_Binary

And that is finding the images just fine:

Screen Shot 2021-11-29 at 3 25 40 PM

But this code is so simple, I'm not sure where the breakdown is happening.

luisquintanilla commented 2 years ago

@atkinsonbg if you replace the images in the assets directory of the code sample with your own images, does it work?

atkinsonbg commented 2 years ago

@luisquintanilla I am so sorry, I have wasted your time today. I found the culprit after running your code: it was my use of the CsvHelper package, found here: https://www.nuget.org/packages/CsvHelper/

Stripping that out and manually creating some input classes:

public static IEnumerable<ImageData> LoadImagesFromCsv(string csvPath)
{
    //TextReader reader = new StreamReader(csvPath);
    //var csvReader = new CsvReader(reader, CultureInfo.CurrentCulture);
    //return csvReader.GetRecords<ImageData>();

    List<ImageData> list = new List<ImageData>();
    list.Add(new ImageData
    {
        ImagePath = "15970_2.jpg",
        Label = "Navy"
    });
    list.Add(new ImageData
    {
        ImagePath = "39386_2.jpg",
        Label = "Blue"
    });
    return list.AsEnumerable();
}

Results in the images being processed:

Screen Shot 2021-11-29 at 4 12 24 PM

That's what I get for trying to be fancy. Will parse the CSV another way. Thank you so much for jumping in and taking a look.

luisquintanilla commented 2 years ago

@atkinsonbg No problem. Thanks for closing the issue.

atkinsonbg commented 2 years ago

Just to follow up: I was still able to use the CsvHelper package, but I needed to change the implementation to the following:

public static IEnumerable<ImageData> LoadImagesFromCsv(string csvPath)
{
    var records = new List<ImageData>();
    using (var reader = new StreamReader(csvPath))
    using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
    {
        csv.Read();
        csv.ReadHeader();
        while (csv.Read())
        {
            var record = new ImageData
            {
                ImagePath = csv.GetField("ImagePath"),
                Label = csv.GetField("Label")
            };
            records.Add(record);
        }
    }

    return records.AsEnumerable();
}

Working great now. Thanks again!
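
A note on the likely cause, which this thread never confirms: CsvHelper's GetRecords<T> yields records lazily from the underlying reader, so the sequence it returns can only be consumed once. If the ML.NET pipeline enumerates the source more than once (for example for Preview, then Fit, then Transform), every pass after the first would see no rows. The sketch below reproduces that behavior with the standard library only. ReadRowsLazily is a hypothetical stand-in for a lazy CSV reader; it is not CsvHelper's or ML.NET's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical stand-in for a lazy CSV reader: rows are pulled from a
// forward-only TextReader only when the sequence is enumerated.
static IEnumerable<string> ReadRowsLazily(TextReader source)
{
    string? line;
    while ((line = source.ReadLine()) != null)
    {
        yield return line;
    }
}

const string csvBody = "15970_2.jpg,Navy\n39386_2.jpg,Blue";

// Lazy sequence: the first pass drains the reader, so a second pass over
// the same sequence starts at end-of-stream and finds nothing.
IEnumerable<string> lazyRows = ReadRowsLazily(new StringReader(csvBody));
int firstPass = lazyRows.Count();
int secondPass = lazyRows.Count();

// Materialized list: the rows are copied into memory once and can be
// enumerated any number of times.
List<string> materialized = ReadRowsLazily(new StringReader(csvBody)).ToList();
int repeatedPass = materialized.Count();

Console.WriteLine($"lazy, first pass:  {firstPass}");
Console.WriteLine($"lazy, second pass: {secondPass}");
Console.WriteLine($"list, any pass:    {repeatedPass}");
```

Under that assumption, both working versions above succeed for the same reason: they fill a List<ImageData> before returning it, so the data can be enumerated repeatedly. Calling ToList() on the result of GetRecords<ImageData>() while the reader is still open should have the same effect.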