dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Confusion when trying to use the EAST text detector model with ML.NET #5314

Closed sereal96 closed 4 years ago

sereal96 commented 4 years ago

System information

Issue

I am trying to use this frozen EAST text detection model:

https://www.kaggle.com/yelmurat/frozen-east-text-detection

However, I don't know if I am doing it the right way (I only began using ML.NET last month). First I tried it with OpenCV, using this example:

https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.cpp

It runs fine, but when I tried to do the same with ML.NET...

Source code / logs

First I define these fields:

` static readonly string _assetsPath = Path.Combine(Environment.CurrentDirectory, "assets");
static readonly string _imagesFolder = Path.Combine(_assetsPath, "imagesText");
static readonly string _predictSingleImage = Path.Combine(_imagesFolder, "page10.jpg");
static readonly string _inceptionTensorFlowModel = Path.Combine(_assetsPath, "models", "frozen_east_text_detection.pb");

    private const int imageHeight = 3104; // 576; must be a multiple of 32
    private const int imageWidth  = 2304; // 576; must be a multiple of 32
    private const int numChannels = 3;
    private const int inputSize = imageHeight * imageWidth * numChannels;`
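
The multiple-of-32 constraint noted in the comments above could be enforced with a small helper rather than by hand. This is only a minimal sketch: `EastInputSize` and `RoundUpToMultiple` are hypothetical names chosen for illustration, not part of ML.NET.

```csharp
using System;

public static class EastInputSize
{
    // Rounds a positive pixel dimension up to the nearest multiple of `multiple`.
    // Hypothetical helper for illustration; not an ML.NET API.
    public static int RoundUpToMultiple(int value, int multiple = 32)
    {
        if (value <= 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (multiple <= 0) throw new ArgumentOutOfRangeException(nameof(multiple));

        // Integer division truncates, so adding (multiple - 1) first rounds up.
        return (value + multiple - 1) / multiple * multiple;
    }
}
```

For example, `EastInputSize.RoundUpToMultiple(3000)` gives 3008, while 3104 and 2304 are already multiples of 32 and come back unchanged.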

Then I load the TensorFlow model and save it as an ML.NET model:

` using var modelX = mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel);
var schema = modelX.GetModelSchema();
var inputchema = modelX.GetInputSchema();
var pipelineX = modelX.ScoreTensorFlowModel(
    outputColumnNames: new[] { "feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3" }, // nameof(OutputScores.output) },
    inputColumnNames: new[] { "input_images" },
    addBatchDimensionInput: false); // }, addBatchDimensionInput: true);

// Placeholder row used only to fit the pipeline.
List<TensorData> list = new List<TensorData>();
list.Add(new TensorData() { input = null });
IEnumerable<TensorData> enumerableData = list;
var dv = mlContext.Data.LoadFromEnumerable<TensorData>(list); // TensorData
ITransformer model = pipelineX.Fit(dv);

Directory.CreateDirectory("Model");
mlContext.Model.Save(model, inputchema, "trainedModelEAST3.zip");`

At this point everything seems to work, but here is my problem with the outputs.

In OpenCV I load an image and call cv::dnn::blobFromImage(frame, blob, 1.0, cv::Size(inpWidth, inpHeight), cv::Scalar(123.68, 116.78, 103.94), true, false); after that I only need this:

` detector.setInput(blob);
tickMeter.start();
detector.forward(outs, outNames);
tickMeter.stop();

cv::Mat scores = outs[0];
cv::Mat geometry = outs[1];`
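
In the `blobFromImage` call above, the `cv::Scalar(123.68, 116.78, 103.94)` argument is a per-channel mean that OpenCV subtracts from every pixel, and the `true` flag swaps the B and R channels, so the network receives mean-subtracted RGB values. A rough C# counterpart might look like the sketch below. It assumes that the frozen graph does not subtract these means itself and that the ML.NET input is laid out as height x width x channel; neither is confirmed in this thread. `EastPreprocessing` and `ToMeanSubtractedRgb` are hypothetical names.

```csharp
using System;

public static class EastPreprocessing
{
    // Per-channel means in R, G, B order, copied from the cv::Scalar in the OpenCV call.
    private static readonly float[] MeanRgb = { 123.68f, 116.78f, 103.94f };

    // Converts interleaved 8-bit RGB pixels (row-major, 3 bytes per pixel) into a
    // float array in height x width x channel order with the channel means subtracted.
    // Hypothetical helper; assumes the graph does not subtract the means itself.
    public static float[] ToMeanSubtractedRgb(byte[] rgb, int width, int height)
    {
        if (rgb == null) throw new ArgumentNullException(nameof(rgb));
        if (width <= 0) throw new ArgumentOutOfRangeException(nameof(width));
        if (height <= 0) throw new ArgumentOutOfRangeException(nameof(height));
        if (rgb.Length != width * height * 3)
            throw new ArgumentException("Expected width * height * 3 bytes.", nameof(rgb));

        var result = new float[rgb.Length];
        for (int i = 0; i < rgb.Length; i++)
        {
            // i % 3 is the channel index: 0 = R, 1 = G, 2 = B.
            result[i] = rgb[i] - MeanRgb[i % 3];
        }
        return result;
    }
}
```

If the outputs differ from OpenCV's, comparing a run with and without this subtraction is one way to check whether the preprocessing is the cause.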

With that it is almost done: my inputs are clear, and so are my outputs. But in ML.NET you need to create a class to hold the sample tensor data, so I did that:

` public class TensorData
    {
        [VectorType(imageHeight, imageWidth, numChannels)]
        [ColumnName("input_images")]
        public float[] input { get; set; }

        [ColumnName("ImagePath")]
        public string imageP { get; set; }
        [ColumnName("Name")]
        public string imageN { get; set; }
    }`

This is where my confusion began, because I know that the input for this model should look like this:

[Image: inputs]

Using this seems to work:

[VectorType(imageHeight, imageWidth, numChannels)] [ColumnName("input_images")] public float[] input { get; set; }

But for my outputs, and for how to pass an image to the model, I am only guessing. So I used the information about the model's outputs that I found with Netron:

This is the "scores" output: [Image: outputs1]

And this is the "geometry" output (the box that shows where a word is in the image): [Image: outputs2]

I created the class:

` class OutputScores
    {
        // Geometry map
        [ColumnName("feature_fusion/concat_3")]
        public float[] output { get; set; }

        // Score map
        [ColumnName("feature_fusion/Conv_7/Sigmoid")]
        public float[] output2 { get; set; }

    }`
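
Because ML.NET hands these two outputs back as flat `float[]` arrays, one quick sanity check is to compare their lengths with what the model should produce. The sketch below assumes the usual EAST layout: both maps at one quarter of the input resolution, with 1 channel for the scores and 5 for the geometry. `EastOutputShape` and its members are hypothetical names for illustration.

```csharp
using System;

public static class EastOutputShape
{
    // Assumption: EAST predicts one output cell per 4 x 4 block of input pixels.
    public const int Stride = 4;

    // Assumption: four edge distances plus one rotation angle per cell.
    public const int GeometryChannels = 5;

    // Expected number of floats in the score map for a given input size.
    public static int ScoreLength(int imageHeight, int imageWidth)
        => (imageHeight / Stride) * (imageWidth / Stride);

    // Expected number of floats in the geometry map for a given input size.
    public static int GeometryLength(int imageHeight, int imageWidth)
        => ScoreLength(imageHeight, imageWidth) * GeometryChannels;
}
```

For the 3104 x 2304 input used here, that works out to 446976 score values and 2234880 geometry values; different lengths in `output2` and `output` would suggest that the input shape was not interpreted as intended.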

With all that, I tried to use the prediction engine on a JPEG image like this:

` Bitmap bitmapImage = (Bitmap)Image.FromFile(_predictSingleImage);

        float[] a = new float[(bitmapImage.Height * bitmapImage.Width) * 3];
        Color[] c = new Color[bitmapImage.Height * bitmapImage.Width];
        for (int i = 0; i < bitmapImage.Height * bitmapImage.Width; i++)
        {
            int row = i / bitmapImage.Width;
            int col = i % bitmapImage.Width;
            var pixel = bitmapImage.GetPixel(col, row);

            c[i] = pixel;
            //a[i + 0] = pixel.ToArgb();
            a[i * 3 + 0] = pixel.R;
            a[i * 3 + 1] = pixel.G;
            a[i * 3 + 2] = pixel.B;
        }
        var aux = c.ToArray();

        TensorData imageTensorData = new TensorData()
        {
            input = a.ToArray()
        };

        PredictionEngine<TensorData, OutputScores> _predictionEngineX;
        var loadedModelX = mlContex.Model.Load("trainedModelEAST3.zip", out _);
        _predictionEngineX = mlContex.Model.CreatePredictionEngine<TensorData, OutputScores>(loadedModelX);
        var predictionX = _predictionEngineX.Predict(imageTensorData);
        `

That gave these results:

For the "geometry"

For the scores:

Well, that is as far as I got. Could someone tell me whether I implemented the image loading correctly? My end goal is to get the same or a similar result as in OpenCV.

These are the packages I am using:

[Image: Packages]

And yes, I also tried creating a pipeline like this:

`var imagesDataFile = @"....\DNN_ML_CUDA_01\assets\imagesText\";

        var data = mlContext.Data.CreateTextLoader(new TextLoader.Options()
        {
            Columns = new[]
            {
                    new TextLoader.Column("ImagePath", DataKind.String, 0),
                    new TextLoader.Column("Name", DataKind.String, 1),
                    new TextLoader.Column("input_images", DataKind.Single , 2),
            }
        }).Load(imagesDataFile);

        var imagesFolder = Path.GetDirectoryName(imagesDataFile);
        // Image loading pipeline. 
        var pipelineI = mlContext.Transforms.LoadImages("ImageObject",
            imagesFolder, "ImagePath")
            .Append(mlContext.Transforms.ResizeImages("ImageObjectResized",
                inputColumnName: "ImageObject", imageWidth: imageWidth, imageHeight: imageHeight))
            .Append(mlContext.Transforms.ExtractPixels("Pixels",
                "ImageObjectResized"))
            .Append(mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel)
                          .ScoreTensorFlowModel(
                                 outputColumnNames: new[] { "feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3" },
                                 inputColumnNames: new[] { "input_images" },
                                 addBatchDimensionInput: false))
            ;

        List<TensorData> list = new List<TensorData>();
        list.Add(new TensorData() { input = null });
        IEnumerable<TensorData> enumerableData = list;
        var dvv = mlContext.Data.LoadFromEnumerable<TensorData>(list);//TensorData

        var model = pipelineI.Fit(dvv);

        using var modelX = mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel);
        var testeschema1 = modelX.GetInputSchema();

        Directory.CreateDirectory("Model");
        mlContext.Model.Save(model, testeschema1, "trainedModelEAST3.zip");

`

It gave me the same results.

For reference, these are the websites that I used for this project:

https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.imageestimatorscatalog.loadimages?view=ml-dotnet

https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetCookBook.md#how-do-i-train-my-model-on-categorical-data

https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.imageestimatorscatalog.extractpixels?view=ml-dotnet

https://devblogs.microsoft.com/cesardelatorre/run-with-ml-net-c-code-a-tensorflow-model-exported-from-azure-cognitive-services-custom-vision/

https://www.pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/

https://devblogs.microsoft.com/cesardelatorre/training-image-classification-recognition-models-based-on-deep-learning-transfer-learning-with-ml-net/

https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.transforms.tensorflowmodel.scoretensorflowmodel?view=ml-dotnet

https://github.com/dotnet/machinelearning/issues/5286

https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/DeepLearning_ImageClassification_TensorFlow

If somebody could show me an example, guide me, or anything, that would be great.

mstfbl commented 4 years ago

Hi @sereal96 , can you please zip your entire VS project and share your whole code and data with us here? It is hard to follow the code snippets you've provided in your issue. Thanks!

sereal96 commented 4 years ago

Hi Mustafa, and thanks for answering so soon. Here is my test project:

https://drive.google.com/file/d/14Uou9e3PBbCv8Z5kIm8ftqHs4MvtHTH6/view?usp=sharing

I hope it helps you.

Lynx1820 commented 4 years ago

Hi @sereal96,

You loaded the images correctly. A good way to check whether your TensorFlow model is correctly implemented in ML.NET is to compare the results, which I believe you have done. The question marks mean that the model does not specify the first three dimensions of the input, which are commonly batch size, height, and width, presumably so that you can choose them.

Here is another file with image loading examples: ImagesTests

As for the output, it seems that at the end you get a feature vector. How to score it is not part of the TensorFlow model, so that is up to you to decide within ML.NET.
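
One possible way to score these outputs is to port the `decode` step from the OpenCV `text_detection.cpp` sample linked in the issue. The sketch below is such a port under assumptions not confirmed in this thread: `scores` is the flattened `feature_fusion/Conv_7/Sigmoid` map with `rows * cols` values, and `geometry` is the flattened `feature_fusion/concat_3` map stored cell by cell with 5 values per cell. `EastBox` and `EastDecoder` are hypothetical names, and the non-maximum suppression that the OpenCV sample applies afterwards is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type for illustration; not part of ML.NET.
public sealed class EastBox
{
    public float CenterX, CenterY, Width, Height, AngleDegrees, Score;
}

public static class EastDecoder
{
    private const int Stride = 4;   // assumption: one output cell per 4 x 4 input pixels
    private const int Channels = 5; // assumption: four edge distances plus an angle

    // Turns the raw score and geometry maps into rotated boxes in the
    // coordinate system of the resized network input.
    public static List<EastBox> Decode(
        float[] scores, float[] geometry, int rows, int cols, float scoreThreshold = 0.5f)
    {
        if (scores == null) throw new ArgumentNullException(nameof(scores));
        if (geometry == null) throw new ArgumentNullException(nameof(geometry));
        if (scores.Length != rows * cols)
            throw new ArgumentException("Expected rows * cols scores.", nameof(scores));
        if (geometry.Length != rows * cols * Channels)
            throw new ArgumentException("Expected rows * cols * 5 values.", nameof(geometry));

        var boxes = new List<EastBox>();
        for (int y = 0; y < rows; y++)
        {
            for (int x = 0; x < cols; x++)
            {
                int cell = y * cols + x;
                float score = scores[cell];
                if (score < scoreThreshold) continue;

                // Distances from this cell to the box edges, then the rotation angle.
                int g = cell * Channels;
                float d0 = geometry[g], d1 = geometry[g + 1];
                float d2 = geometry[g + 2], d3 = geometry[g + 3];
                float angle = geometry[g + 4];

                float cosA = (float)Math.Cos(angle), sinA = (float)Math.Sin(angle);
                float h = d0 + d2, w = d1 + d3;

                // Same arithmetic as the OpenCV sample's decode function.
                float offsetX = x * Stride + cosA * d1 + sinA * d2;
                float offsetY = y * Stride - sinA * d1 + cosA * d2;
                float p1X = -sinA * h + offsetX, p1Y = -cosA * h + offsetY;
                float p3X = -cosA * w + offsetX, p3Y = sinA * w + offsetY;

                boxes.Add(new EastBox
                {
                    CenterX = 0.5f * (p1X + p3X),
                    CenterY = 0.5f * (p1Y + p3Y),
                    Width = w,
                    Height = h,
                    AngleDegrees = -angle * 180f / (float)Math.PI,
                    Score = score
                });
            }
        }
        return boxes;
    }
}
```

With the 3104 x 2304 input from the issue, `rows` and `cols` would be 776 and 576 under the quarter-resolution assumption. The returned boxes overlap heavily, so some form of non-maximum suppression is still needed before drawing them.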

mstfbl commented 4 years ago

Hi @sereal96,

I'm closing this issue as @Lynx1820 has answered your inquiry. Please feel free to comment if you have additional questions. Thanks!