dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

Provided label column 'Sentiment' was of type String, but only type Single is allowed. #804

Closed maxsklad closed 3 years ago

maxsklad commented 4 years ago

Problem encountered on https://dotnet.microsoft.com/learn/ml-dotnet/get-started-tutorial/train Operating System: windows

I'm trying to master machine learning. When I start training the model, the following error occurs: Provided label column 'Sentiment' was of type String, but only type Single is allowed. I ask for your help. I did everything according to the tutorial, but I got an error as a result.

JakeRadMSFT commented 4 years ago

@maxsklad Sorry you're hitting this issue! Can you share what dataset you were using and what scenario type you chose?

linhnle commented 4 years ago

I got the same problem. Here is my data:

    Date,SMA,EMA,MACD,ClosePrice,Label
    20070906,114.25,114.866667,-22.22151,111,1
    20070907,113.25,114.498413,-19.88974,116,1
    20070910,112.2,114.641421,-17.437329,120,1
    20070911,111.25,115.151762,-14.998121,115,-1
    20070912,109.95,115.137308,-13.315002,112,-1
    20070913,108.5,114.838517,-12.083897,113,1
    20070914,107.35,114.66342,-10.901876,114,1
    20070917,106.4,114.600237,-9.771782,113,-1
    20070918,106.45,114.447834,-8.854793,112,-1
    20070919,106.45,114.214707,-8.115217,112,0
    20070920,106.35,114.003782,-7.443296,115,1
    20070921,106.45,114.09866,-6.592722,113,-1
    20070924,106.45,113.994026,-6.010731,114,1
    20070925,106.55,113.994595,-5.406485,114,0
    20070926,106.75,113.99511,-4.87146,116,1
    20070927,107.1,114.186052,-4.237222,114,-1
    20070928,107.35,114.168332,-3.851568,118,1
    20071001,107.8,114.533253,-3.186438,123,1
    20071002,108.55,115.33961,-2.23015,127,1
    20071003,109.6,116.450123,-1.13642,126,-1
    20071004,110.35,117.359635,-0.346329,122,-1
    20071005,110.65,117.801575,-0.042454,125,1
    20071008,110.9,118.487139,0.435424,123,-1
    20071009,111.3,118.916935,0.645324,121,-1
    20071010,111.75,119.115323,0.642878,122,1
    20071011,112.2,119.390054,0.713407,121,-1
    ...

JakeRadMSFT commented 4 years ago

@linhnle thanks for sharing your dataset! Were you trying to predict "Label" using all of the other values? Was the scenario type Value Prediction or Classification?

linhnle commented 4 years ago

@JakeRadMSFT Yes, I was trying to predict "Label" using all of the other values, and the scenario was Value Prediction.

linhnle commented 4 years ago

Here is the log:

    2020-06-12 00:52:17.6673 DEBUG Set log file path to C:\Users\USER\AppData\Local\Temp\MLVSTools\logs\b34c96f7-ac96-423a-9e08-46b266b504c2.txt (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 00:54:09.0159 DEBUG C:\ProgramData\Ck\R\RIC\training.csv (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 00:54:46.6483 DEBUG Disposing TrainSession (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 00:54:49.0788 WARN GPU Service not found. Falling back to CPU AutoML Service. (Microsoft.ML.ModelBuilder.Utils.Logger.Warn)
    2020-06-12 00:54:51.2540 INFO | Trainer RSquared Absolute-loss Squared-loss RMS-loss Duration #Iteration | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
    2020-06-12 00:54:51.6111 DEBUG Provided label column 'Label' was of type String, but only type Single is allowed.
       at Microsoft.ML.AutoML.UserInputValidationUtil.ValidateTrainDataColumn(IDataView trainData, String columnName, String columnPurpose, IEnumerable`1 allowedTypes)
       at Microsoft.ML.AutoML.UserInputValidationUtil.ValidateColumnInformation(IDataView trainData, ColumnInformation columnInformation, TaskKind task)
       at Microsoft.ML.AutoML.ExperimentBase`2.ExecuteCrossValSummary(IDataView[] trainDatasets, ColumnInformation columnInfo, IDataView[] validationDatasets, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
       at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
       at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment`3.<>c__DisplayClass21_0.b__5() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 81
       at System.Threading.Tasks.Task`1.InnerInvoke()
       at System.Threading.Tasks.Task.Execute()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment`3.d__21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 108
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.ML.ModelBuilder.AutoMLEngine.d__30.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 147 (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 01:05:00.0136 DEBUG Disposing TrainSession (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 01:05:00.0136 DEBUG Disposing AutoMLService Client (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 01:05:00.0136 DEBUG Disposing TrainSession (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 01:05:00.1652 INFO | Trainer RSquared Absolute-loss Squared-loss RMS-loss Duration #Iteration | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
    2020-06-12 01:05:00.1872 DEBUG Provided label column 'Label' was of type String, but only type Single is allowed.
       at Microsoft.ML.AutoML.UserInputValidationUtil.ValidateTrainDataColumn(IDataView trainData, String columnName, String columnPurpose, IEnumerable`1 allowedTypes)
       at Microsoft.ML.AutoML.UserInputValidationUtil.ValidateColumnInformation(IDataView trainData, ColumnInformation columnInformation, TaskKind task)
       at Microsoft.ML.AutoML.ExperimentBase`2.ExecuteCrossValSummary(IDataView[] trainDatasets, ColumnInformation columnInfo, IDataView[] validationDatasets, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
       at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
       at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment`3.<>c__DisplayClass21_0.b__5() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 81
       at System.Threading.Tasks.Task`1.InnerInvoke()
       at System.Threading.Tasks.Task.Execute()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment`3.d__21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 108
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.ML.ModelBuilder.AutoMLEngine.d__30.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 147 (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 01:09:03.6671 DEBUG Disposing TrainSession (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 01:09:03.6671 DEBUG Disposing AutoMLService Client (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 01:09:03.6671 DEBUG Disposing TrainSession (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 01:09:03.6951 INFO | Trainer RSquared Absolute-loss Squared-loss RMS-loss Duration #Iteration | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
    2020-06-12 01:09:03.6951 DEBUG Provided label column 'Label' was of type String, but only type Single is allowed.
       at Microsoft.ML.AutoML.UserInputValidationUtil.ValidateTrainDataColumn(IDataView trainData, String columnName, String columnPurpose, IEnumerable`1 allowedTypes)
       at Microsoft.ML.AutoML.UserInputValidationUtil.ValidateColumnInformation(IDataView trainData, ColumnInformation columnInformation, TaskKind task)
       at Microsoft.ML.AutoML.ExperimentBase`2.ExecuteCrossValSummary(IDataView[] trainDatasets, ColumnInformation columnInfo, IDataView[] validationDatasets, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
       at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
       at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment`3.<>c__DisplayClass21_0.b__5() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 81
       at System.Threading.Tasks.Task`1.InnerInvoke()
       at System.Threading.Tasks.Task.Execute()
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment`3.d__21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 108
    --- End of stack trace from previous location where exception was thrown ---
       at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
       at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
       at Microsoft.ML.ModelBuilder.AutoMLEngine.d__30.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 147 (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 01:09:10.2725 DEBUG Disposing AutoMLService Client (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
    2020-06-12 01:09:21.7628 DEBUG Open Log FileC:\Users\USER\AppData\Local\Temp\MLVSTools\logs\b34c96f7-ac96-423a-9e08-46b266b504c2.txt (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)

JakeRadMSFT commented 4 years ago

Thanks @linhnle .. I'm able to reproduce the issue and I don't see anything odd in the data. @LittleLittleCloud Can you help investigate?

nlz242 commented 4 years ago

Same issue following the tutorial, using the dataset linked at https://dotnet.microsoft.com/learn/ml-dotnet/get-started-tutorial/data. I had to open the file in Excel, format the sentiment column as Number with decimals, and save it; then I was able to train. Not sure why ML.NET thinks integers are strings...

sphengle commented 3 years ago

I can't believe I've fallen at the first fence. My initial excitement at giving this a go has been dampened. I'm getting the same problem. I made sure my data (coming from a flat file) had decimals, not just integers, and it still says "Provided label column 'blah' was of type String". There's not a single value that looks like a string - they're all numbers!!??

LittleLittleCloud commented 3 years ago

> I can't believe I've fallen at the first fence. My initial excitement at giving this a go has been dampened. I'm getting the same problem. I made sure my data (coming from a flat file) had decimals, not just integers, and it still says "Provided label column 'blah' was of type String". There's not a single value that looks like a string - they're all numbers!!??

Hi @sphengle, sorry for the trouble. Could we take a look at your dataset and error log so we can investigate? Thanks.

mahdisml commented 3 years ago

I had the same problem.

TitusRie commented 3 years ago

Hi @LittleLittleCloud, here are my logs, on my laptop everything went very smoothly, then doing the same on my desktop I encountered the error mentioned (Provided label column 'col1' was of type String, but only type Single is allowed.) I was following the tutorial, dataset was the yelp_labelled.txt, see attachment (downloaded as mentioned in https://dotnet.microsoft.com/learn/ml-dotnet/get-started-tutorial/data). I compared regional settings, there was a difference (Working setting on Laptop was Dutch (Netherlands)), changing the desktop from UK to Dutch unfortunately didn't help... yelp_labelled.txt

beccamc commented 3 years ago

Hi all. Sorry for the issue here and thank you for reporting. Can you share what version of Model Builder you are on?

  1. If you are not on 16.3.0.2056001 can you update here and try again?
  2. If you are on 16.3.0.2, does your UI look like this? image.png

Ansjosen commented 3 years ago

@beccamc Sorry for my ignorance, but where does one locate the Model Builder version ID?

beccamc commented 3 years ago

@Ansjosen No worries! You go to Extensions -> Manage Extensions

image

Ansjosen commented 3 years ago

Thank you for your response, I'm very grateful. 16.3.0.2056001 is exactly what I've got. When downloading from your link, I get a message stating I already have this module. Checking the ID, it is still the same. Should the download link match your version number 16.5.xxx or what? Nevertheless... I tried to run the Value predictor on my file (just 2 numeric columns) and got the same message as before: >>Provided label column 'col1' was of type String, but only type Single is allowed.<< I tried the trick with decimal values, but that doesn't work for me. Same message.

beccamc commented 3 years ago

@Ansjosen sorry for the confusion, I used a screenshot of a newer internal build - 16.3 is the latest public version :)

Can you confirm that the first page of your UI looks like this? This is the 16.3 UI: image.png

We've had an issue where despite saying it's version 16.3, it actually is not updated. We've seen this problem in other customer issues where we can't get a repro. If your UI looks like this, then you're on an older version:

image.png

Ansjosen commented 3 years ago

@beccamc Thanks for your reply. I got the first one (via right-click, Add, Machine Learning), NOT the failure machine.

image

Ansjosen commented 3 years ago

@beccamc 11 first rows of the 2-column file

    0 0
    1 1
    2 0
    3 1
    4 0
    5 1
    6 0
    7 1
    8 0
    9 1
    10 0

beccamc commented 3 years ago

@Ansjosen To confirm, you're using your own data, is that correct? If you copied pasted the above data exactly from the file, I'm guessing the problem is the delimiter. If you separate your data with a comma it should read those as two separate columns.

@ Others using the tutorial sentiment data: can you check if you are on version 16.3 with the updated UI? (instructions in this comment)
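The delimiter guess above can be checked outside Model Builder. The following is a minimal sketch using only the .NET base class library; `DelimiterCheck` and `Report` are hypothetical names, and this is not how Model Builder itself detects separators. It splits one row by each candidate delimiter and reports how many columns result and whether every field parses as a number.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class DelimiterCheck
{
    // Hypothetical helper: for each candidate delimiter, report how many
    // columns the row splits into and whether every field parses as a float.
    public static void Report(string line)
    {
        foreach (char delimiter in new[] { ',', '\t', ';', ' ' })
        {
            string[] fields = line.Split(delimiter);

            // InvariantCulture means '.' is the decimal separator here,
            // regardless of the machine's regional settings.
            bool allNumeric = fields.All(f => float.TryParse(
                f, NumberStyles.Float, CultureInfo.InvariantCulture, out _));

            Console.WriteLine(
                $"delimiter code {(int)delimiter,3}: {fields.Length} column(s), all numeric: {allNumeric}");
        }
    }

    static void Main()
    {
        Report("3,1");   // a comma-separated row
        Report("3\t1");  // a tab-separated row
    }
}
```

A row that yields a single non-numeric column for the delimiter you expected is a hint that the whole line is being read as one string.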

Ansjosen commented 3 years ago

16.3 it is. It's my own tiny file, now reduced to 10 rows, 0-9: 0,0 - 0,0 - 00 - 00 - 0;0 - 0;0 formats have been tried, all resulting in the infamous message:

>>Provided label column 'col1' was of type String, but only type Single is allowed.<< One thing though: no matter which of the formats above I use, the column display looks accurate. When selecting the column to predict, the order of the columns is swapped. Maybe it serves a higher purpose with multiple columns? I don't know.

Ansjosen commented 3 years ago

Sorry, the formats have been git-formatted. I tried this: 0space0, 0comma0, 0commaSpace0, 0tab0, 0semicolon0, 0semicolonSpace0

beccamc commented 3 years ago

@Ansjosen I tried with tabs, and it worked for me. Dataset I used - smallTestDataset.zip

image image

Can you upload a copy of your test file?

Ansjosen commented 3 years ago

@beccamc I get the same error using your file. As I mentioned earlier, my columns in the data preview are swapped compared to yours. Should they be? Please elaborate. The swapping is the only difference I can spot. image

As said, the preview does parse the different formats nicely. (Have you tried other separators yourself? They would probably work on your machine as well, I guess.)

So I don't think it will make much difference to upload a file, but I'll give you the benefit of the doubt :-) EvenOdd.zip

Ansjosen commented 3 years ago

And I'm using Value Prediction Local ML

beccamc commented 3 years ago

@Ansjosen Can you try "Text Classification" instead?

Ansjosen commented 3 years ago

@beccamc That was the first one I tried, and it worked like a charm; still does, I reckon?

Ansjosen commented 3 years ago

Wait, did you mean go ahead with value prediction and smallTestDataset -BUT use Text classification?

beccamc commented 3 years ago

@Ansjosen I'm confused. Were you able to train successfully using "Text classification"?

The problem you are trying to solve isn't really value prediction. Value prediction is more appropriate for something like predicting a price. In your dataset you are trying to predict if something is even or odd. I can see why it is confusing, you are predicting a value. But even though it's a number, the real goal is to classify the data into two groups.

It is not expected that this dataset can be trained using value prediction. I'd advise you to use text classification. Does training complete for the "Text classification" scenario?
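The distinction between value prediction and classification can be illustrated with a rough rule of thumb. The sketch below is only a hypothetical heuristic (the `ScenarioHint` class, `Suggest` method, and the `maxClasses` threshold are invented for illustration, and Model Builder does not necessarily do anything like this): a label with only a handful of distinct whole-number values usually names groups rather than quantities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ScenarioHint
{
    // Hypothetical heuristic: few distinct, whole-number label values
    // suggest categories; anything else suggests a continuous quantity.
    public static string Suggest(IEnumerable<float> labels, int maxClasses = 10)
    {
        List<float> distinct = labels.Distinct().ToList();
        bool wholeNumbers = distinct.All(v => v == Math.Floor(v));

        return wholeNumbers && distinct.Count <= maxClasses
            ? "classification"
            : "value prediction (regression)";
    }

    static void Main()
    {
        // Even/odd style labels: only the values 0 and 1 occur.
        Console.WriteLine(Suggest(new float[] { 0, 1, 0, 1, 0, 1 }));

        // Price-like labels: continuous values.
        Console.WriteLine(Suggest(new float[] { 10f, 22.5f, 73.25f }));
    }
}
```

The heuristic can be wrong in both directions (for example, a small dataset of whole-dollar prices), so treat it as a prompt to think about the goal of the prediction, not as a decision procedure.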

Ansjosen commented 3 years ago

@beccamc Yes, a little confusing; maybe if I knew more and had been more clear on "I tried to run the Value predictor?" But then I would have missed our conversation, and you wouldn't have wasted your time. Sorry for that, but thanks for all your help and for not quitting on me.

ITS WORKING :-)

I tried the text classification and it works on the odd/even-data. I'm feeding one number after another and it seems like the prediction probability increases the more digits present in the number. (is that to be expected?)

I'd like to use it in a program. I tried this in Program.cs in the current solution (another Program.Main() is present in *ML.Model):

    static void Main()
    {
        Console.WriteLine("Hello World!");
        ModelInput mi = new ModelInput();
        mi.Col0 = 999;
        Console.WriteLine("Out:");
        Console.WriteLine(mi.Col1); //returns nothing
        Console.ReadLine();
    }

Obviously -I'm not a shark. Can you point me to some direction/documentation for trying to consume/work with the model in a console app?

beccamc commented 3 years ago

@Ansjosen, no worries! I'm happy to help.

What's missing from the code is calling ConsumeModel.Predict(mi);

Can you go to the "Consume" tab and click "Add to solution." That will add a sample console project to your app. You should be able to see a sample in the ConsoleApp's Program.cs.

Let me know if you have questions.
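For readers without the generated sample at hand, here is a hedged sketch of roughly what a wrapper like `ConsumeModel.Predict` does using the public ML.NET API: load the saved model, create a `PredictionEngine`, and call `Predict`. The `ModelInput` and `ModelOutput` classes below are hypothetical stand-ins for the generated ones (column names and types are assumptions), and `MLModel.zip` is an assumed path to the trained model; the `Microsoft.ML` NuGet package is required.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical stand-ins for the classes Model Builder generates;
// the real column names and types come from your own dataset.
public class ModelInput
{
    [ColumnName("col0"), LoadColumn(0)]
    public float Col0 { get; set; }

    [ColumnName("col1"), LoadColumn(1)]
    public string Col1 { get; set; }
}

public class ModelOutput
{
    [ColumnName("PredictedLabel")]
    public string Prediction { get; set; }

    public float[] Score { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // "MLModel.zip" is an assumed path to the model saved after training.
        ITransformer model = mlContext.Model.Load("MLModel.zip", out DataViewSchema inputSchema);

        // The engine maps one ModelInput to one ModelOutput.
        var engine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(model);

        // Setting fields on ModelInput alone predicts nothing; Predict must be called.
        ModelOutput result = engine.Predict(new ModelInput { Col0 = 999 });
        Console.WriteLine($"Prediction: {result.Prediction}");
    }
}
```

This also shows why reading `mi.Col1` printed nothing: the input object is only a container, and the prediction lives on the output object returned by `Predict`.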

Ansjosen commented 3 years ago

Thank you! :-) I'm not sure of the "Consume" - "Add to solution" -so I'm missing that. I added the code above and added the prediction

    static void Main()
    {
        Console.WriteLine("Hello World!");
        ModelInput mi = new ModelInput();
        mi.Col0 = 3;
        ConsumeModel.Predict(mi);    //@beccam
        Console.WriteLine(ConsumeModel.Predict(mi).Prediction);    //returns 0
        Console.ReadLine();
    }

I've attached an image of the Solution Explorer, in case that's the source (I'm not sure whether it is the source of the add-to-solution).

image

beccamc commented 3 years ago

@Ansjosen take a look at the myMLAppValuePredML.ConsoleApp Program.cs file. It gives a full sample of predicting from sample data.

Ansjosen commented 3 years ago

@beccamc I don't think text classification predicts the value 0/1 that well, not that it was expected given the image: image But perhaps that's what's to be expected? What would be a suitable dataset (minimum requirement of columns/fields) for predicting a price?

I adjusted the code you kindly pointed me to as shown below, but it predicts 0 consistently, with the probability decreasing smoothly from 71% down to 56%. Please note that the numbers [0; 1000] the predictions are based on are all within the training-data interval [0; 2021].

    static void Main()
    {
        ModelInput sampleData = new ModelInput();
        for (int i = 0; i < 1001; i++)
        {
            sampleData.Col0 = i;
            var predictionResult = ConsumeModel.Predict(sampleData);
            Console.WriteLine($"{i}: {predictionResult.Prediction} Score: [{String.Join(",", predictionResult.Score)}]");
        }
        Console.WriteLine("=============== End of process, hit any key to finish ===============");
        Console.ReadKey();
    }

What am I doing wrong?

beccamc commented 3 years ago

@Ansjosen I think we should take this problem into a new issue. Please take a look at https://github.com/dotnet/machinelearning-modelbuilder/issues/1461 for my response.

Ansjosen commented 3 years ago

Thank you, #1461 it is

beccamc commented 3 years ago

For others in this issue, please check the following to troubleshoot:

  1. Are you actually trying to do value prediction on a string? If you chose "Value Prediction" (regression) as the scenario, the Label to predict must be of type Single. Value prediction should be used for "linear" predictions, like price, temperature or age. It shouldn't be used for classification problems (e.g. sort into group 1, 2 or 3).
  2. If you should be doing value prediction, and the label to predict is a number, it may be classified wrong by our system. Check its Data Type on the Data Tab -> Advanced Data Options window. If you can share your data with me I'd be happy to look at it.
  3. We have an older bug in this space. If you are trying to solve a regression problem, and the data looks correct, updating to the latest version has fixed it for many customers.

beccamc commented 3 years ago

Hi everyone. Brand new model builder release out. Please update and let me know if you still see this issue. Thanks! https://marketplace.visualstudio.com/items?itemName=MLNET.07

beccamc commented 3 years ago

Closing this issue as it should be fixed in the newest build.

Troubleshooting...

  1. Please update to 16.6.0. https://marketplace.visualstudio.com/items?itemName=MLNET.07
  2. Ensure you are on the new UI (we had a previous bug where the version didn't match the installed bits). The scenario page should look like this: image.png
  3. Make sure you are predicting a number. Value prediction should be used for predicting a range of values (e.g. taxi fare of $10, $22.50 or $73.25). It shouldn't be used for bucketing or classifying (e.g. sort into group 1, 2, or 3). If you are sorting into groups you should use Text Classification.
  4. Ensure Label column is set to type "single". This can be changed in Data -> Advanced Data Options.

If none of the above work, please ping me or open a new issue. Thank you everyone for your engagement on this issue!
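For anyone loading the data in code rather than through the Model Builder UI, the label type can be declared explicitly instead of inferred. This is a minimal sketch, assuming the `Microsoft.ML` NuGet package, a hypothetical `training.csv` with a header row, five numeric feature columns, and the label in the sixth column; adjust the indices and path to your own file.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // Assumed layout: columns 0-4 are numeric features, column 5 is the label.
        // Declaring DataKind.Single avoids relying on type inference.
        var columns = new[]
        {
            new TextLoader.Column("Features", DataKind.Single, 0, 4),
            new TextLoader.Column("Label", DataKind.Single, 5),
        };

        // "training.csv" is a hypothetical path to a comma-separated file.
        IDataView data = mlContext.Data.LoadFromTextFile(
            "training.csv", columns, separatorChar: ',', hasHeader: true);

        // Print the type the loader assigned to the label column.
        Console.WriteLine(data.Schema["Label"].Type);
    }
}
```

If the printed type is not the one a regression trainer expects, the schema is the place to fix it before training starts.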

Sire commented 2 years ago

For me, this is an old bug relating to how different cultures parse decimal numbers, using comma instead of dot. Using an English culture on the thread is a workaround until this is resolved. I reported this years ago and I see it's still not fixed or even mentioned.

In F#, use:

    System.Threading.Thread.CurrentThread.CurrentCulture <- System.Globalization.CultureInfo("en-US")
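For C# users, the same workaround and the underlying culture difference can be sketched with the base class library alone. The snippet only demonstrates how .NET number parsing depends on culture; whether Model Builder's type inference goes through the same path is an assumption based on the report above. The `CultureWorkaround` class name and the sample value are illustrative.

```csharp
using System;
using System.Globalization;
using System.Threading;

static class CultureWorkaround
{
    static void Main()
    {
        const string text = "114.25";

        // Parse the same text under a dot-decimal and a comma-decimal culture.
        foreach (string name in new[] { "en-US", "nl-NL" })
        {
            var culture = new CultureInfo(name);
            bool ok = float.TryParse(text, NumberStyles.Float, culture, out float value);
            Console.WriteLine(
                $"{name}: parsed={ok}, value={value.ToString(CultureInfo.InvariantCulture)}");
        }

        // C# counterpart of the F# workaround: force an English culture
        // on the current thread before loading or training.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
    }
}
```

A value that fails to parse as a number under the machine's culture is one plausible way for a numeric column to end up treated as a string.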