dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Query about outdated Unity Sample #6524

Closed: Vivraan closed this issue 1 year ago

Vivraan commented 1 year ago

Copy of https://github.com/dotnet/machinelearning-samples/issues/981

System information

Issue

I am trying to adapt code from the Unity sample in this project for my own project, using my own models. The sample code is here: https://github.com/dotnet/machinelearning-samples/blob/main/samples/csharp/end-to-end-apps/Unity-HelloMLNET/HelloMLNET/Assets/Scenes/HelloML.cs

However, the ArrayDataViewBuilder class the sample uses is now an internal sealed class, so it can't be accessed from outside ML.NET. The sample documents it as a workaround for PredictionEngine, which uses Reflection.Emit and apparently "throws up" in Unity: https://github.com/dotnet/machinelearning-samples/tree/main/samples/csharp/end-to-end-apps/Unity-HelloMLNET#known-workarounds

What is a solution for this? I am in a position where I absolutely need a solution to this in the next 72 hours.

Source code / logs

Will attach if necessary.

luisquintanilla commented 1 year ago

@Vivraan it looks like that code uses the Transform method, which takes an IDataView, instead of PredictionEngine's Predict method, which takes a single object instance.

This might be helpful - https://learn.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/machine-learning-model-predictions-ml-net#multiple-predictions-idataview

More or less this is what you'd need to do:

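using System.Linq;
using Microsoft.ML;

// Initialize MLContext
var ctx = new MLContext();

// Load your trained model ("model.zip" is a placeholder path)
ITransformer model = ctx.Model.Load("model.zip", out DataViewSchema inputSchema);

// Create an IDataView with a single instance
var dv = ctx.Data.LoadFromEnumerable(new[] { new { Text = "YOUR TEXT" } });

// Call Transform using your model to make predictions
var predictions = model.Transform(dv);

// Get the first (and only) prediction; First() already returns the string
var predictedValue = predictions.GetColumn<string>("PredictedLabel").First();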
Vivraan commented 1 year ago

Can I replace the auto-generated input and output model classes? I'll have to use one of them anyway for passing into the load method.

luisquintanilla commented 1 year ago

@Vivraan

Can I replace the auto-generated input and output model classes?

I'm not sure which auto-generated input / output model classes you're referring to. Could you please clarify?

I'll have to use one of them anyway for passing into the load method.

If you're referring to loading the model, I don't think you need them. If you're referring to LoadFromEnumerable, I think you do. My sample assumes you're using a newer version of .NET with anonymous type support, in which case LoadFromEnumerable infers the schema from the anonymous type automatically. However, since you're using .NET Framework, my understanding is that this won't work and you would need to provide an input class.

So, assuming you have an input class:

public class ModelInput
{
    public string Text {get;set;}
}

Loading the data would look like the following:

var dv = ctx.Data.LoadFromEnumerable<ModelInput>(new ModelInput[] { new ModelInput{ Text="YOUR TEXT" }});
Vivraan commented 1 year ago

I'm not sure which auto-generated input / output model classes you're referring to. Could you please clarify.

I used AutoML to generate classes for training and consumption, and the generated model input and output classes have 882 separate float fields, one for each float column of the original dataset (plus the string label, col0).

#region model input class
public class ModelInput
{
    [ColumnName(@"col0")]
    public string Col0 { get; set; }

    [ColumnName(@"col1")]
    public float Col1 { get; set; }

    [ColumnName(@"col2")]
    public float Col2 { get; set; }

    [ColumnName(@"col3")]
    public float Col3 { get; set; }

    [ColumnName(@"col4")]
    public float Col4 { get; set; }
    ...

    [ColumnName(@"col880")]
    public float Col880 { get; set; }

    [ColumnName(@"col881")]
    public float Col881 { get; set; }

    [ColumnName(@"col882")]
    public float Col882 { get; set; }
}
#endregion

From my understanding, I can simply use [VectorType(N)] and specify an array field, which will correspond to those values, as follows:

using Microsoft.ML.Data;

public class ModelInput
{
    public string KnotType { get; set; } // can use my own name here

    [VectorType(882)] // one slot per float column, col1 to col882
    public float[] data;
}

However, it seems Unity's version of .NET Framework does support anonymous types, so maybe I could pass my float[] from my program directly into the model?

luisquintanilla commented 1 year ago

From my understanding, I can simply use [VectorType(N)] and specify an array field, which will correspond to those values, as follows:

Yup. That's right.

However, you might need to retrain, since the original model was trained on the 882 individual columns.

You can't pass the float[] directly into the model, because it expects either the individual ColN columns or a single vector column such as data.

Vivraan commented 1 year ago

How will I make the necessary pipeline for that? The auto-generated pipeline maps columns by name:

/// <summary>
/// build the pipeline that is used from model builder. Use this function to retrain model.
/// </summary>
/// <param name="mlContext"></param>
/// <returns></returns>
public static IEstimator<ITransformer> BuildPipeline(MLContext mlContext)
{
    // Data process configuration with pipeline data transformations
    var pipeline = mlContext.Transforms.ReplaceMissingValues(new[]
        {
            new InputOutputColumnPair(@"col1", @"col1"),
            new InputOutputColumnPair(@"col2", @"col2"),
            new InputOutputColumnPair(@"col3", @"col3"),
            ...
            new InputOutputColumnPair(@"col880", @"col880"),
            new InputOutputColumnPair(@"col881", @"col881"),
            new InputOutputColumnPair(@"col882", @"col882")
        })
        .Append(mlContext.Transforms.Concatenate(@"Features", new[] { @"col1", @"col2", @"col3", ... @"col880", @"col881", @"col882" }))
        .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: @"col0", inputColumnName: @"col0"))
        .Append(mlContext.Transforms.NormalizeMinMax(@"Features", @"Features"))
        .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(new SdcaMaximumEntropyMulticlassTrainer.Options() { L1Regularization = 1F, L2Regularization = 0.1F, LabelColumnName = @"col0", FeatureColumnName = @"Features" }))
        .Append(mlContext.Transforms.Conversion.MapKeyToValue(outputColumnName: @"PredictedLabel", inputColumnName: @"PredictedLabel"));

    return pipeline;
}

Will it be possible to reuse the auto-generated classes or would I need to roll my own script to train the model? Clearly the output model doesn't need to have the extra fields, since all of those have been concatenated into the Features column.

EDIT: I am currently using my own custom script with AutoML's DLL for training a model.

EDIT 2: I am considering training a neural network with ML.NET outside Unity, since ML.NET does not support ARM64 on Unity, which is necessary for Android builds. I'm switching to exporting ONNX through ML.NET and running it with the Barracuda framework instead.
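A minimal sketch of the ML.NET side of that plan, assuming the Microsoft.ML.OnnxConverter NuGet package (which adds `ConvertToOnnx` to `MLContext.Model`) is referenced. The `OnnxExport` class and its `Save` helper are hypothetical names, and whether Barracuda supports every operator in the exported graph is not covered here.

```csharp
using System.IO;
using Microsoft.ML;

public static class OnnxExport
{
    // Hypothetical helper; ConvertToOnnx is provided by the Microsoft.ML.OnnxConverter package.
    public static void Save(MLContext ctx, ITransformer model, IDataView sampleInput, string path)
    {
        using (FileStream stream = File.Create(path))
        {
            // sampleInput describes the model's input columns (names and types).
            ctx.Model.ConvertToOnnx(model, sampleInput, stream);
        }
    }
}
```

`sampleInput` can simply be the `IDataView` the model was trained on.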

luisquintanilla commented 1 year ago

You can use the generated code, but you'll have to modify it to fit your class which loads the input vectors directly.

Assuming your class looks like:

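using Microsoft.ML.Data;

public class ModelInput
{
    // col1 to col882 loaded as a single 882-element vector
    [VectorType(882)]
    [LoadColumn(1, 882)]
    public float[] Features { get; set; }

    // col0 holds the (string) label
    [LoadColumn(0)]
    public string Label { get; set; }
}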

The pipeline would look something like:

public static IEstimator<ITransformer> BuildPipeline(MLContext mlContext)
{
    // Data process configuration with pipeline data transformations.
    // No Concatenate step is needed because Features is already a vector column.
    var pipeline = mlContext.Transforms.ReplaceMissingValues(@"Features")
        .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: @"Label", inputColumnName: @"Label"))
        .Append(mlContext.Transforms.NormalizeMinMax(@"Features", @"Features"))
        .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(new SdcaMaximumEntropyMulticlassTrainer.Options() { L1Regularization = 1F, L2Regularization = 0.1F, LabelColumnName = @"Label", FeatureColumnName = @"Features" }))
        .Append(mlContext.Transforms.Conversion.MapKeyToValue(outputColumnName: @"PredictedLabel", inputColumnName: @"PredictedLabel"));

    return pipeline;
}
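A minimal end-to-end sketch that trains this vector-based pipeline and then predicts from a raw float[] with `Transform` (rather than PredictionEngine), as suggested earlier in the thread. It assumes string labels (like the generated col0) and 882 float features, and sets `LabelColumnName` to `"Label"` to match the renamed column. `KnotRow`, `KnotModel` and `Predict` are hypothetical names.

```csharp
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

// Hypothetical row type: string label in column 0, 882 float features in columns 1-882.
public class KnotRow
{
    [LoadColumn(0)]
    public string Label { get; set; }

    [LoadColumn(1, 882)]
    [VectorType(882)]
    public float[] Features { get; set; }
}

public static class KnotModel
{
    // Same steps as the vector-based pipeline, with the trainer reading "Label".
    public static IEstimator<ITransformer> BuildPipeline(MLContext ctx) =>
        ctx.Transforms.ReplaceMissingValues("Features")
            .Append(ctx.Transforms.Conversion.MapValueToKey("Label"))
            .Append(ctx.Transforms.NormalizeMinMax("Features"))
            .Append(ctx.MulticlassClassification.Trainers.SdcaMaximumEntropy(
                new SdcaMaximumEntropyMulticlassTrainer.Options
                {
                    L1Regularization = 1F,
                    L2Regularization = 0.1F,
                    LabelColumnName = "Label",
                    FeatureColumnName = "Features"
                }))
            .Append(ctx.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

    // Scores one raw feature vector with Transform instead of PredictionEngine.
    public static string Predict(MLContext ctx, ITransformer model, float[] features)
    {
        // The fitted model still contains the Label -> key mapping, so a placeholder
        // label is supplied; the trainer's scoring step only reads Features.
        var row = new KnotRow { Label = string.Empty, Features = features };
        IDataView scored = model.Transform(ctx.Data.LoadFromEnumerable(new[] { row }));
        return scored.GetColumn<string>("PredictedLabel").First();
    }
}
```

For real training data, the `LoadColumn` attributes let the same class be used with `ctx.Data.LoadFromTextFile<KnotRow>(path, separatorChar: ',', hasHeader: true)`, assuming a comma-separated file with a header row.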
Vivraan commented 1 year ago

Is it possible to train a neural network using ML.NET? The examples I am seeing from https://learn.microsoft.com/en-us/dotnet/machine-learning/deep-learning-overview#train-custom-models don't involve creating my own layers or anything of the sort.

I am short on time, so maybe I can convert my vectors into 1-pixel-wide greyscale images, with each float value mapped to a pixel's grey level?
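A minimal sketch of the conversion being considered, assuming each vector is min-max scaled on its own into 0-255 grey levels (one byte per value, so one pixel per value in an N x 1 image). `VectorToGreyscale` and `ToGreyLevels` are hypothetical names, and writing the bytes out as an actual image file would need an imaging library that is not shown.

```csharp
using System;

public static class VectorToGreyscale
{
    // Min-max scales one feature vector into 0-255 grey levels, one byte per value.
    // Assumes the vector contains no NaN (missing) values.
    public static byte[] ToGreyLevels(float[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("Need at least one value", nameof(values));

        float min = float.MaxValue, max = float.MinValue;
        foreach (float v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }

        float range = max - min;
        var pixels = new byte[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            // A constant vector has zero range: map it to black instead of dividing by zero.
            float scaled = range > 0 ? (values[i] - min) / range : 0f;
            pixels[i] = (byte)Math.Round(scaled * 255f);
        }
        return pixels;
    }
}
```

Scaling each vector independently discards absolute magnitudes; using a minimum and maximum computed over the whole training set instead would keep the images comparable with each other.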

luisquintanilla commented 1 year ago

Is it possible to train a neural network using ML.NET?

It depends on what you're trying to do. Here are the custom training scenarios it supports.

https://learn.microsoft.com/en-us/dotnet/machine-learning/deep-learning-overview#train-custom-models

These use transfer-learning / fine-tuning.

If you want to train a custom neural net from scratch, TorchSharp, TensorFlow.NET and DiffSharp are good options.

Vivraan commented 1 year ago

Just built for the Windows platform and got this:

PlatformNotSupportedException: Operation is not supported on this platform.
  at System.Reflection.Emit.DynamicMethod..ctor (System.String name, System.Type returnType, System.Type[] parameterTypes, System.Type owner, System.Boolean skipVisibility) [0x00000] in <00000000000000000000000000000000>:0 
  at Microsoft.ML.ApiUtils.GeneratePeek[TOwn,TRow,TValue] (System.Reflection.PropertyInfo propertyInfo, System.Reflection.Emit.OpCode assignmentOpCode) [0x00000] in <00000000000000000000000000000000>:0 
  at System.Reflection.RuntimeMethodInfo.Invoke (System.Object obj, System.Reflection.BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) [0x00000] in <00000000000000000000000000000000>:0 
  at System.Reflection.MethodBase.Invoke (System.Object obj, System.Object[] parameters) [0x00000] in <00000000000000000000000000000000>:0 
  at Microsoft.ML.Internal.Utilities.Utils.MarshalInvoke[TArg1,TArg2,TResult] (Microsoft.ML.Internal.Utilities.FuncStaticMethodInfo3`3[T1,T2,TResult] func, System.Type genArg1, System.Type genArg2, System.Type genArg3, TArg1 arg1, TArg2 arg2) [0x00000] in <00000000000000000000000000000000>:0 
  at Microsoft.ML.ApiUtils.GeneratePeek[TOwn,TRow] (Microsoft.ML.Data.InternalSchemaDefinition+Column column) [0x00000] in <00000000000000000000000000000000>:0 
  at Microsoft.ML.Data.DataViewConstructionUtils+DataViewBase`1[TRow]..ctor (Microsoft.ML.Runtime.IHostEnvironment env, System.String name, Microsoft.ML.Data.InternalSchemaDefinition schemaDefn) [0x00000] in <00000000000000000000000000000000>:0 
  at Microsoft.ML.Data.DataViewConstructionUtils+StreamingDataView`1[TRow]..ctor (Microsoft.ML.Runtime.IHostEnvironment env, System.Collections.Generic.IEnumerable`1[T] data, Microsoft.ML.Data.InternalSchemaDefinition schemaDefn) [0x00000] in <00000000000000000000000000000000>:0 
  at Microsoft.ML.Data.DataViewConstructionUtils.CreateFromEnumerable[TRow] (Microsoft.ML.Runtime.IHostEnvironment env, System.Collections.Generic.IEnumerable`1[T] data, Microsoft.ML.Data.SchemaDefinition schemaDefinition) [0x00000] in <00000000000000000000000000000000>:0 
  at Microsoft.ML.DataOperationsCatalog.LoadFromEnumerable[TRow] (System.Collections.Generic.IEnumerable`1[T] data, Microsoft.ML.Data.SchemaDefinition schemaDefinition) [0x00000] in <00000000000000000000000000000000>:0 
  at RopeKnotsProject.ML.ModelConsumer.Update () [0x0002d] in C:\x\KU\RopeKnotsProject\Assets\RopeKnotsProject\Scripts\ML\ModelConsumer.cs:63 
Rethrow as TargetInvocationException: Exception has been thrown by the target of an invocation.

It was working perfectly in the Unity Editor, so I'm unsure what's different in the build.

EDIT: using the Mono scripting backend instead of IL2CPP solves this.

michaelgsharp commented 1 year ago

Gonna close this for now since it looks like it's resolved. Please feel free to open another issue if something else comes up.