dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
9.02k stars 1.88k forks source link

FastForestBinaryClassifier always return same prediction #589

Closed bzn7 closed 6 years ago

bzn7 commented 6 years ago

System information

Issue

Source code / logs

class MLData
    {
        public class IrisPrediction
        {
            [ColumnName("PredictedLabel")]
            public bool PredictedLabels;
        }

        public class IrisData
        {
            [Column("0", name: "Label")] public bool Label;
            [Column("1")] [VectorType(1000)] public float[] param1;
            [Column("2")] [VectorType(1000)] public float[] param2;
            [Column("3")] [VectorType(1000)] public float[] param3;
            [Column("4")] [VectorType(1000)] public float[] param4;
            [Column("5")] [VectorType(1000)] public float[] param5;
            [Column("6")] [VectorType(1000)] public float[] param6;
        }

        public static List<IrisData> History = new List<IrisData>() { };
    }
class MLCore
    {
        private static string AppPath => Path.GetDirectoryName(Environment.GetCommandLineArgs()[0]);
        private static string ModelPath => Path.Combine(AppPath, "IrisModel.zip");
        private static PredictionModel<MLData.IrisData, MLData.IrisPrediction> readyModel;

        internal static async Task<PredictionModel<MLData.IrisData, MLData.IrisPrediction>> TrainAsync()
        {
            var data = MLData.History;
            var collection = CollectionDataSource.Create(data);

            var pipeline = new LearningPipeline()
            {
                collection,
                new ColumnConcatenator("Features", "param1","param2", "param3","param4", "param5", "param6"),

                new FastForestBinaryClassifier(),

                new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" }
            };

            PredictionModel<MLData.IrisData, MLData.IrisPrediction> model;

            try
            {
                model = pipeline.Train<MLData.IrisData, MLData.IrisPrediction>();
                await model.WriteAsync(ModelPath);
                PGlobals.learnSuccesfull = true;
            }
            catch (Exception e)
            {
                model = null;
                PGlobals.learnSuccesfull = false;
            }

            return model;
        }

        public static async void Learn()
        {
            readyModel = await TrainAsync();
        }

        public static void Think()
        {
            if (readyModel != null)
            {
                try
                {
                    var prediction = readyModel.Predict(new MLData.IrisData()
                    {
                        param1 = PGlobals.param1,
                        param2 = PGlobals.param2,
                        param3 = PGlobals.param3,
                        param4 = PGlobals.param4,
                        param5 = PGlobals.param5,
                        param6 = PGlobals.param6
                    });

                    PGlobals.predictedResult = prediction.PredictedLabels;
                }

                catch
                {
                    //Nothing
                }
            }
        }
    }
Zruty0 commented 6 years ago

Do you actually have 6000 features? If yes, your [Column] should look like this:

            [Column("1-1000")] [VectorType(1000)] public float[] param1;
            [Column("1001-2000")] [VectorType(1000)] public float[] param2;
            // etc.
WladdGorshenin commented 6 years ago

Have the same issue (ML.NET v0.4). The classifier returns the same prediction (false).

Regarding the post by @bzn7 I consider it makes no difference how many features do a training set have - the answers must be different.

Zruty0 commented 6 years ago

Well, if you have 6000 features, but you read them the way @bzn7 does (994 features appear 6 times each), the learner is going to be severely hampered. My guess was that the model that was learned was trivial, and therefore gave the same prediction all the time.

I think you are incorrect about this one:

I consider it makes no difference how many features do a training set have - the answers must be different.

I would say that if the answers are 'the same all the time', it is unfortunate, but far from uncommon. Here are some factors that can potentially cause this:

Ivanidzo4ka commented 6 years ago

DRI RESPONSE: I'm considering this question as answered and intent to close issue within next few days, unless someone have objection.

bzn7 commented 6 years ago

Do you actually have 6000 features? If yes, your [Column] should look like this:

            [Column("1-1000")] [VectorType(1000)] public float[] param1;
            [Column("1001-2000")] [VectorType(1000)] public float[] param2;
            // etc.

I tried to simplify my features and redefined columns as shown. It is working, thank you @Zruty0.