Confusion when trying to use the EAST text detector model with ML.NET

sereal96 commented 4 years ago

System information

OS version/distro: Windows 10

Issue

What did you do? Hi, I am trying to use the EAST text detector model with ML.NET. The model is from here:

https://www.kaggle.com/yelmurat/frozen-east-text-detection

However, I don't know if I am doing it the right way (I only began using ML.NET last month). First I tried using OpenCV with this example:

https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.cpp

It runs fine and everything is OK, but when I tried to do the same with ML.NET...

What happened? The problem is that I don't understand how ML.NET handles the input data and the output data. I had an idea and ran other examples, but I couldn't find anything similar.
What did you expect? I was expecting the same, or at least similar, results as those from the OpenCV example.

Source code / logs

First I define this:

` static readonly string _assetsPath = Path.Combine(Environment.CurrentDirectory, "assets");
    static readonly string _imagesFolder = Path.Combine(_assetsPath, "imagesText");
    static readonly string _predictSingleImage = Path.Combine(_imagesFolder, "page10.jpg");
    static readonly string _inceptionTensorFlowModel = Path.Combine(_assetsPath, "models", "frozen_east_text_detection.pb");

    private const int imageHeight = 3104; // 576; it should be a multiple of 32
    private const int imageWidth  = 2304; // 576; it should be a multiple of 32
    private const int numChannels = 3;
    private const int inputSize = imageHeight * imageWidth * numChannels;`
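The multiple-of-32 constraint mentioned in the comments above can be checked before the constants are fixed. A minimal sketch in C++ (the language of the OpenCV sample; the same integer arithmetic works unchanged in C#). The helper names `round_up_to_multiple` and `flat_input_length` are hypothetical and are not part of ML.NET or OpenCV.

```cpp
#include <cstddef>

// Hypothetical helper: round `value` up to the next multiple of `multiple`.
// The snippet above notes that both image sides should be multiples of 32.
constexpr int round_up_to_multiple(int value, int multiple = 32) {
    return ((value + multiple - 1) / multiple) * multiple;
}

// Number of floats in one height x width x channels input tensor.
constexpr std::size_t flat_input_length(int height, int width, int channels = 3) {
    return static_cast<std::size_t>(height) * width * channels;
}

// The sizes used in this issue already satisfy the constraint.
static_assert(round_up_to_multiple(3104) == 3104, "3104 is a multiple of 32");
static_assert(round_up_to_multiple(2304) == 2304, "2304 is a multiple of 32");
```

For example, a side of 600 pixels would be rounded up to 608 before resizing.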

Then I load the TensorFlow model and save it as an ML.NET model:

    using var modelX = mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel);
    var schema = modelX.GetModelSchema();
    var inputchema = modelX.GetInputSchema();

    var pipelineX = modelX.ScoreTensorFlowModel(
        outputColumnNames: new[] { "feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3" }, // nameof(OutputScores.output) },
        inputColumnNames: new[] { "input_images" },
        addBatchDimensionInput: false); // }, addBatchDimensionInput: true);

    // Fit the pipeline on a single placeholder row.
    List<TensorData> list = new List<TensorData>();
    list.Add(new TensorData() { input = null });
    IEnumerable<TensorData> enumerableData = list;
    var dv = mlContext.Data.LoadFromEnumerable<TensorData>(list); // TensorData
    ITransformer model = pipelineX.Fit(dv);

    Directory.CreateDirectory("Model");
    mlContext.Model.Save(model, inputchema, "trainedModelEAST3.zip");

At this point everything seems to work, but here is my problem with the outputs.

In OpenCV I load an image and use `cv::dnn::blobFromImage(frame, blob, 1.0, cv::Size(inpWidth, inpHeight), cv::Scalar(123.68, 116.78, 103.94), true, false);` and then only this:

` detector.setInput(blob);
tickMeter.start();
detector.forward(outs, outNames);
tickMeter.stop();

cv::Mat scores = outs[0];
cv::Mat geometry = outs[1];`

With that it's almost done: my inputs are clear, and my outputs too. But in ML.NET you need to create a class to hold the sample tensor data, so I did that:

` public class TensorData
    {
        [VectorType(imageHeight, imageWidth, numChannels)]
        [ColumnName("input_images")]
        public float[] input { get; set; }

        [ColumnName("ImagePath")]
        public string imageP { get; set; }
        [ColumnName("Name")]
        public string imageN { get; set; }
    }`

This is where my confusion began, because I know that my input for this model should look like this:

[Image: inputs]

Using this seems to work:

`[VectorType(imageHeight, imageWidth, numChannels)] [ColumnName("input_images")] public float[] input { get; set; }`

But for my outputs, and for how to pass an image to the model, I am only guessing. So, using the information about the model's outputs that I found with Netron:

This is the "scores" output: [Image: outputs1]

and this is the "geometry" output (the boxes that show where a word is in the image): [Image: outputs2]

I created the class:

` class OutputScores
    {
        [ColumnName("feature_fusion/concat_3")]
        public float[] output { get; set; }

        [ColumnName("feature_fusion/Conv_7/Sigmoid")]
        public float[] output2 { get; set; }
    }`

With all that, I tried to use the prediction engine like this, using an image ("jpg"):

` Bitmap bitmapImage = (Bitmap)Image.FromFile(_predictSingleImage);

        float[] a = new float[(bitmapImage.Height * bitmapImage.Width) * 3];
        Color[] c = new Color[bitmapImage.Height * bitmapImage.Width];
        for (int i = 0; i < bitmapImage.Height * bitmapImage.Width; i++)
        {
            int row = i / bitmapImage.Width;
            int col = i % bitmapImage.Width;
            var pixel = bitmapImage.GetPixel(col, row);

            c[i] = pixel;
            //a[i + 0] = pixel.ToArgb();
            a[i * 3 + 0] = pixel.R;
            a[i * 3 + 1] = pixel.G;
            a[i * 3 + 2] = pixel.B;
        }
        var aux = c.ToArray();

        TensorData imageTensorData = new TensorData()
        {
            input = a.ToArray()
        };

        PredictionEngine<TensorData, OutputScores> _predictionEngineX;
        var loadedModelX = mlContext.Model.Load("trainedModelEAST3.zip", out _);
        _predictionEngineX = mlContext.Model.CreatePredictionEngine<TensorData, OutputScores>(loadedModelX);
        var predictionX = _predictionEngineX.Predict(imageTensorData);
        `

That gave these results:

For the "geometry"

output {float[2234880]} float[] [0] 164.553131 float [1] 108.803284 float [2] 88.53912 float [3] 157.4754 float [4] -0.00642232737 float [5] 121.783844 float [6] 93.6575 float [7] 89.14729 float [8] 149.1378 float [9] 0.003307178 float [10] 143.044312 float [11] 92.95393 float [12] 93.75145 float [13] 136.486084 float [14] -0.00365050742 float [15] 150.783173 float [16] 105.081482 float [17] 104.515717 float [18] 138.529785 float [19] 0.00163079088 float [20] 155.030853 float

For the scores:

output2 {float[446976]} float[] [0] 5.96046448E-08 float [1] 2.38418579E-07 float [2] 2.38418579E-07 float [3] 4.76837158E-07 float [4] 2.682209E-07 float [5] 1.49011612E-07 float [6] 3.27825546E-07 float [7] 5.662441E-07 float [8] 3.27825546E-07 float [9] 5.066395E-07 float [10] 1.10268593E-06 float [11] 1.10268593E-06 float [12] 1.22189522E-06 float [13] 1.10268593E-06 float [14] 6.854534E-07 float [15] 4.76837158E-07 float [16] 2.682209E-07 float [17] 2.682209E-07 float [18] 1.49011612E-07 float [19] 2.38418579E-07 float [20] 1.49011612E-07 float
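The two array lengths are consistent with output maps at one quarter of the input resolution: 3104 / 4 = 776 rows and 2304 / 4 = 576 columns give 776 × 576 = 446976 scores and 776 × 576 × 5 = 2234880 geometry values. The listed geometry values also suggest that the five channels are interleaved per cell, since every fifth value is close to zero, which would fit four distances followed by an angle. A minimal C++ sketch of the index arithmetic under that assumed layout follows; the names are hypothetical, and the same expressions can be applied to the C# `float[]` arrays.

```cpp
#include <cstddef>

// Assumptions: the output maps are 1/4 of the input size, and the geometry
// array stores 5 interleaved values per cell (rows x cols x 5).
constexpr int kOutputStride = 4;
constexpr int kGeometryChannels = 5;

// Index of the score for cell (row, col) in the flat scores array.
constexpr std::size_t score_index(int row, int col, int cols) {
    return static_cast<std::size_t>(row) * cols + col;
}

// Index of geometry channel `ch` (0..3 distances, 4 angle) for cell (row, col).
constexpr std::size_t geometry_index(int row, int col, int cols, int ch) {
    return score_index(row, col, cols) * kGeometryChannels + ch;
}

constexpr int kMapRows = 3104 / kOutputStride;  // 776
constexpr int kMapCols = 2304 / kOutputStride;  // 576
static_assert(static_cast<std::size_t>(kMapRows) * kMapCols == 446976,
              "matches the length of output2 (scores)");
static_assert(static_cast<std::size_t>(kMapRows) * kMapCols * kGeometryChannels == 2234880,
              "matches the length of output (geometry)");
```

Under this reading, entries [0] to [4] of the geometry dump belong to the top-left cell and entries [5] to [9] to the cell to its right.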

Well, that is how far I got. Could someone tell me whether I implemented the image loading correctly or not? My end goal is to get the same or similar results as in OpenCV.

These are the packages I am using:

[Image: Packages]

And yes, I also tried this to create a pipeline:

`var imagesDataFile = @"....\DNN_ML_CUDA_01\assets\imagesText\";

        var data = mlContext.Data.CreateTextLoader(new TextLoader.Options()
        {
            Columns = new[]
            {
                    new TextLoader.Column("ImagePath", DataKind.String, 0),
                    new TextLoader.Column("Name", DataKind.String, 1),
                    new TextLoader.Column("input_images", DataKind.Single , 2),
            }
        }).Load(imagesDataFile);

        var imagesFolder = Path.GetDirectoryName(imagesDataFile);
        // Image loading pipeline. 
        var pipelineI = mlContext.Transforms.LoadImages("ImageObject",
            imagesFolder, "ImagePath")
            .Append(mlContext.Transforms.ResizeImages("ImageObjectResized",
                inputColumnName: "ImageObject", imageWidth: imageWidth, imageHeight: imageHeight))
            .Append(mlContext.Transforms.ExtractPixels("Pixels",
                "ImageObjectResized"))
            .Append(mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel)
                          .ScoreTensorFlowModel(
                                 outputColumnNames: new[] { "feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3" },
                                 inputColumnNames: new[] { "input_images" },
                                 addBatchDimensionInput: false))
            ;

        List<TensorData> list = new List<TensorData>();
        list.Add(new TensorData() { input = null });
        IEnumerable<TensorData> enumerableData = list;
        var dvv = mlContext.Data.LoadFromEnumerable<TensorData>(list);//TensorData

        var model = pipelineI.Fit(dvv);

        using var modelX = mlContext.Model.LoadTensorFlowModel(_inceptionTensorFlowModel);
        var testeschema1 = modelX.GetInputSchema();

        Directory.CreateDirectory("Model");
        mlContext.Model.Save(model, testeschema1, "trainedModelEAST3.zip");

`

It gave me the same results.

For reference, these are the websites that I used for this project:

https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.imageestimatorscatalog.loadimages?view=ml-dotnet

https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetCookBook.md#how-do-i-train-my-model-on-categorical-data

https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.imageestimatorscatalog.extractpixels?view=ml-dotnet

https://devblogs.microsoft.com/cesardelatorre/run-with-ml-net-c-code-a-tensorflow-model-exported-from-azure-cognitive-services-custom-vision/

https://www.pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/

https://devblogs.microsoft.com/cesardelatorre/training-image-classification-recognition-models-based-on-deep-learning-transfer-learning-with-ml-net/

https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.transforms.tensorflowmodel.scoretensorflowmodel?view=ml-dotnet

https://github.com/dotnet/machinelearning/issues/5286

https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/DeepLearning_ImageClassification_TensorFlow

If somebody could show me an example, guide me, or anything, that would be great.

mstfbl commented 4 years ago

Hi @sereal96 , can you please zip your entire VS project and share your whole code and data with us here? It is hard to follow the code snippets you've provided in your issue. Thanks!

sereal96 commented 4 years ago

Hi Mustafa, and thanks for answering so soon. Here is my test project:

https://drive.google.com/file/d/14Uou9e3PBbCv8Z5kIm8ftqHs4MvtHTH6/view?usp=sharing

I hope it helps you.

Lynx1820 commented 4 years ago

Hi @sereal96,

You loaded the images correctly. A good way to check whether your TensorFlow model is correctly implemented in ML.NET is to compare the results, which I believe you have done. The question marks mean that the model did not specify the images' first three dimensions, which are commonly batch size, height, and width, presumably so that you can decide those.

Here is another file with image loading examples: ImagesTests

As for the output, it seems that at the end you get a feature vector. The way to score it is not provided in the TensorFlow model, so that's up to you to decide within ML.NET.
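One way to turn those two vectors into boxes is to mirror the decoding step of the OpenCV sample linked in the question: keep the cells whose score passes a threshold and rebuild a rotated rectangle from the four distances and the angle. The following is a minimal C++ sketch, assuming an interleaved rows × cols × 5 geometry layout and a stride of 4 between output cells and input pixels. The struct, the function name, and the 0.5 default threshold are illustrative choices, and non-maximum suppression is left out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative result type: a rotated rectangle in input-image pixels.
struct TextBox {
    float center_x, center_y;
    float width, height;
    float angle_degrees;
    float score;
};

// Sketch of EAST decoding on flat arrays.
// Assumptions: `scores` has rows*cols entries, `geometry` has rows*cols*5
// entries with the 5 values of a cell stored together (d0, d1, d2, d3, angle),
// and one output cell corresponds to 4 input pixels.
std::vector<TextBox> decode_east(const std::vector<float>& scores,
                                 const std::vector<float>& geometry,
                                 int rows, int cols,
                                 float score_threshold = 0.5f) {
    const float kPi = 3.14159265358979f;
    std::vector<TextBox> boxes;
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            const std::size_t cell = static_cast<std::size_t>(y) * cols + x;
            const float score = scores[cell];
            if (score < score_threshold) continue;

            const float* g = &geometry[cell * 5];
            const float angle = g[4];
            const float cos_a = std::cos(angle);
            const float sin_a = std::sin(angle);
            const float h = g[0] + g[2];
            const float w = g[1] + g[3];

            // Same corner arithmetic as the OpenCV text_detection sample.
            const float off_x = x * 4.0f + cos_a * g[1] + sin_a * g[2];
            const float off_y = y * 4.0f - sin_a * g[1] + cos_a * g[2];
            const float p1_x = -sin_a * h + off_x, p1_y = -cos_a * h + off_y;
            const float p3_x = -cos_a * w + off_x, p3_y = sin_a * w + off_y;

            boxes.push_back({0.5f * (p1_x + p3_x), 0.5f * (p1_y + p3_y),
                             w, h, -angle * 180.0f / kPi, score});
        }
    }
    return boxes;
}
```

With the sizes from this issue, `rows` would be 776 and `cols` 576. Because the loops only index flat arrays, they can be ported to C# and run directly on the two `float[]` outputs of the prediction engine.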

mstfbl commented 4 years ago

Hi @sereal96,

I'm closing this issue as @Lynx1820 has answered your inquiry. Please feel free to comment if you have additional questions. Thanks!
