dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
9.05k stars 1.89k forks source link

Using ScoreTensorFlow model is a bit confusing #2165

Closed JakeRadMSFT closed 4 years ago

JakeRadMSFT commented 5 years ago

System information

Issue

Source code / logs

            var loader = new TextLoader(mlContext,
                new TextLoader.Arguments
                {
                    Column = new[] {
                        new TextLoader.Column("ImagePath", DataKind.Text, 0),
                    }
                });

            // Why is this needed? It works fine with the MultiFileSource being null. There shouldn't need to be training data when loading a pre-trained model.
            var data = loader.Read(new MultiFileSource(null));

            var pipeline = mlContext.Transforms.LoadImages(imageFolder: imageFolderPath, columns: ("ImagePath", "ImageReal"))
                            .Append(mlContext.Transforms.Resize("ImageReal", "ImageReal", ImagePreprocessSettings.imageHeight, ImagePreprocessSettings.imageWidth))
                            .Append(mlContext.Transforms.ExtractPixels(new[] { new ImagePixelExtractorTransform.ColumnInfo("ImageReal", TensorFlowModelSettings.InputTensorName, interleave: ImagePreprocessSettings.channelsLast, offset: ImagePreprocessSettings.mean) }))
                            .Append(mlContext.Transforms.ScoreTensorFlowModel(modelFilePath, new[] { TensorFlowModelSettings.InputTensorName }, new[] { TensorFlowModelSettings.OuputTensorName }));

            // What am I "fitting" and why am I passing "data"?
            var modeld = pipeline.Fit(data);

            var predictionFunction = modeld.MakePredictionFunction<TrainTestData, PredictionProbability>(mlContext);

            return predictionFunction;
Ivanidzo4ka commented 5 years ago

So we have concept of IEstimator and ITransformer. see https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetHighLevelConcepts.md In general case you create pipeline out of chaining IEstimators together, then you fit them, and that produce TransformerChain. Since all your estimators are trivial (doesn't require pass through data to create internal state) you can chain them together without any fitting.

 var model = new ImageLoaderTransformer(mlContext, imageFolder: imageFolderPath, columns: ("ImagePath", "ImageReal"))
            .Append(new ImageResizerTransformer(mlContext, "ImageReal", "ImageReal", ImagePreprocessSettings.imageHeight, ImagePreprocessSettings.imageWidth))
            .Append(new ImagePixelExtractorTransformer(mlContext, new[] { new ImagePixelExtractorTransformer.ColumnInfo("ImageReal", TensorFlowModelSettings.InputTensorName, interleave: ImagePreprocessSettings.channelsLast, offset: ImagePreprocessSettings.mean) }))
            .Append(new TensorFlowTransformer(mlContext, modelFilePath, new[] { TensorFlowModelSettings.InputTensorName }, new[] { TensorFlowModelSettings.OuputTensorName }));

But you can do that trick only if all your pipeline estimators independent from data. (One of the easiest ways to tell is to check your estimator and what class it will return in case of fit and if you can see public constructor for that class - it doesn't required training).

Ivanidzo4ka commented 5 years ago

Sorry for delay in response, hope I answer your question.

JakeRadMSFT commented 5 years ago

Yep! I couldn't get my code to format correctly to post the final solution here but it's essentially what you have above but switching input/output column locations. I also had to update to 0.10.0.

I also had to call model.CreatePredictionEngine<TrainTestData, PredictionProbability>(mlContext); to get the equivalent predict function.

Thanks!

Ivanidzo4ka commented 5 years ago

One thing worth to mention, we currently in process of hiding as much stuff as possible and in 0.11 you wont be able to do this trick since Transformer constructor will stop being public. We aware what it's weird to call estimators on top of empty file, and we will address it, there is reference to proposal how to do that above. We just in a state if we expose too much, it can be nightmare to support later, so we trying to hide pretty much everything from user.

JakeRadMSFT commented 5 years ago

I'm not fully following. So what will it become? I find this to be fairly intuitive.

var model = new ImageLoaderTransformer(mlContext, imageFolder: imageFolderPath, columns: ("ImageReal", "ImagePath"))
            .Append(new ImageResizerTransformer(mlContext, "ImageReal", ImagePreprocessSettings.imageWidth, ImagePreprocessSettings.imageHeight, "ImageReal"))
            .Append(new ImagePixelExtractorTransformer(mlContext, new[] { new ImagePixelExtractorTransformer.ColumnInfo(TensorFlowModelSettings.InputTensorName, "ImageReal", interleave: ImagePreprocessSettings.channelsLast, offset: ImagePreprocessSettings.mean) }))
            .Append(new TensorFlowTransformer(mlContext, modelFilePath, new[] { TensorFlowModelSettings.OuputTensorName }, new[] { TensorFlowModelSettings.InputTensorName }));   

            return model.CreatePredictionEngine<TrainTestData, PredictionProbability>(mlContext);
JakeRadMSFT commented 5 years ago

Re-opening to continue the discussion :)

Ivanidzo4ka commented 5 years ago

https://github.com/dotnet/machinelearning/issues/1798#issuecomment-458845661 So we have this issue/discussion where we decided to hide all ctors for transformers and estimators.

From what I understand (I wasn't part of discussion) we have mlContext object and we want user instead of going through namespaces and documentation and search engine just call for mlContext.Transforms and show all available estimators for him.

Somehow we also decided what any other way to construct estimators is should be hidden, and same goes to trivial (not required training) transformers. I wasn't part of discussion, so I don't know reasoning behind it, but I'm sure it exist.

We have this issue: https://github.com/dotnet/machinelearning/issues/2354, but I'm not sure it will save you from fake Fit call. At least I don't see that in issue.

JakeRadMSFT commented 5 years ago

Any updates on this?

@Ivanidzo4ka, have you heard/seen anything?

nfnpmc commented 5 years ago

What is it supposed to be like in 1.1.0?

harishsk commented 4 years ago

@JakeRadMSFT and @nfnpmc I am not not sure I follow your latest questions but it appears the previous questions have been answered. I am closing the issue for now. Please reopen with details if you need more info.