dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

Still problem with "Input string was not in a correct format" #1493

Closed Darlanio closed 3 years ago

Darlanio commented 3 years ago

Referring to closed issue https://github.com/dotnet/machinelearning-modelbuilder/issues/845. I am trying to follow this tutorial with a slight modification (my own data with 5 columns). With smaller datasets (<10000 rows), such as A,B,Result where A and B are random numbers 0-99 and Result is their sum or product, it is possible to train for a minute on CPU without errors, but when training for ten minutes or with larger datasets the error occurs.

System Information (please complete the following information):

Describe the bug

To Reproduce

Steps to reproduce the behavior:

  1. Create a new .NET Core 3.1 project.
  2. Right click on the project and add machine learning.
  3. Select Value prediction
  4. Select Local (For this, I train on CPU, no GFX involved)
  5. Select the file and set prediction column as prediction
  6. Train for 600 seconds.
  7. Error should appear.

Expected behavior

I expected the model to be added to the project without an error occurring. The file contains rounded floats approximating p = a x b x c x d. With simpler datasets, where training can be limited to less than 60 seconds, it works.
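For anyone trying to reproduce this, a dataset of the shape described above could be generated roughly as follows. This is only a sketch: the value range (uniform in [0, 1)), the rounding to two decimals, the column names `a`, `b`, `c`, `d`, and the helper names `make_rows` and `write_tsv` are assumptions, not details taken from the report. Only the tab-separated layout, the five columns, and the `prediction` label column come from the issue.

```python
import csv
import io
import random


def make_rows(n_rows, seed=0, decimals=2):
    """Build rows [a, b, c, d, prediction] where prediction ~ a*b*c*d.

    Assumption: inputs are uniform in [0, 1) and everything is rounded
    to `decimals` places; the issue does not state the actual ranges.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        a, b, c, d = (round(rng.random(), decimals) for _ in range(4))
        rows.append([a, b, c, d, round(a * b * c * d, decimals)])
    return rows


def write_tsv(rows, handle):
    """Write a header plus the rows as tab-separated values."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["a", "b", "c", "d", "prediction"])  # hypothetical names
    writer.writerows(rows)


# Demo: render a small sample in memory instead of touching the disk.
buffer = io.StringIO()
write_tsv(make_rows(3), buffer)
print(buffer.getvalue())
```

To produce a file for Model Builder, the same `write_tsv` call can be pointed at `open("dataset1.tsv", "w", newline="")` with a row count above 10000, which is the size range where the report says the error appears.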


Additional context

Debug Log:

2021-06-03 20:27:09.9301 DEBUG Set log file path to C:\Users\darla\AppData\Local\Temp\MLVSTools\logs\1c5e2112-e010-4acb-82a1-373f09101864.txt (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2021-06-03 21:59:48.7752 DEBUG C:\Users\darla\source\repos\NETCoreCreateDataSet\NETCoreCreateDataSet\bin\Debug\netcoreapp3.1\dataset1.tsv (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2021-06-04 00:12:06.5689 DEBUG C:\Users\darla\source\repos\NETCoreCreateDataSet\NETCoreCreateDataSet\bin\Debug\netcoreapp3.1\dataset1.tsv (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2021-06-04 00:13:35.5941 DEBUG Disposing TrainSession (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2021-06-04 00:13:38.1135 WARN GPU Service not found. Falling back to CPU AutoML Service. (Microsoft.ML.ModelBuilder.Utils.Logger.Warn)
2021-06-04 00:13:40.0998 INFO | Trainer RSquared Absolute-loss Squared-loss RMS-loss Duration #Iteration | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:13:43.2422 INFO |1 SdcaRegression 0,8842 0,15 0,04 0,21 2,5 1 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:13:46.1628 INFO |2 LightGbmRegression 0,9964 0,03 0,00 0,04 2,9 2 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:13:51.6378 INFO |3 FastTreeRegression 0,9960 0,03 0,00 0,04 5,5 3 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:13:58.9658 INFO |4 FastTreeTweedieRegression 0,9955 0,03 0,00 0,04 7,3 4 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:14:05.2862 INFO |5 FastForestRegression 0,7915 0,21 0,08 0,28 6,3 5 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:14:07.3135 INFO |6 LbfgsPoissonRegression 0,9808 0,06 0,01 0,09 2,0 6 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:14:09.1019 INFO |7 OnlineGradientDescentRegression 0,8808 0,15 0,05 0,21 1,8 7 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:14:10.7940 INFO |8 OlsRegression 0,8843 0,15 0,04 0,21 1,7 8 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:14:13.5810 INFO |9 LightGbmRegression 0,9805 0,06 0,01 0,09 2,8 9 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:14:33.8414 INFO |10 FastTreeRegression 0,9993 0,01 0,00 0,02 20,3 10 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:14:36.6655 INFO |11 FastTreeTweedieRegression 0,2031 0,42 0,30 0,55 2,8 11 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:14:38.6425 INFO |12 LightGbmRegression 0,9422 0,11 0,02 0,15 2,0 12 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:14:55.8813 INFO |13 FastTreeRegression 0,9992 0,01 0,00 0,02 17,2 13 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:15:05.5139 INFO |14 FastTreeTweedieRegression 0,9390 0,10 0,02 0,15 9,6 14 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:15:09.0996 INFO |15 LightGbmRegression 0,9982 0,02 0,00 0,03 3,6 15 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:15:18.7497 INFO |16 FastTreeRegression 0,9947 0,03 0,00 0,04 9,6 16 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:15:21.2727 INFO |17 FastTreeTweedieRegression 0,5571 0,30 0,17 0,41 2,5 17 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:15:23.9141 INFO |18 LightGbmRegression 0,9831 0,06 0,01 0,08 2,6 18 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:15:26.1774 INFO |19 FastTreeRegression -2,1707 0,93 1,20 1,10 2,3 19 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:15:47.5387 INFO |20 FastTreeTweedieRegression 0,9981 0,02 0,00 0,03 21,4 20 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:15:49.6557 INFO |21 LightGbmRegression 0,8827 0,16 0,04 0,21 2,1 21 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:15:51.6150 INFO |22 FastTreeRegression -1,5322 0,79 0,96 0,98 2,0 22 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:15:56.8352 INFO |23 FastTreeTweedieRegression 0,6924 0,26 0,12 0,34 5,2 23 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:15:58.7807 INFO |24 LightGbmRegression 0,7291 0,24 0,10 0,32 1,9 24 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:16:01.1459 INFO |25 FastTreeRegression 0,9614 0,09 0,01 0,12 2,4 25 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:16:04.4861 INFO |26 FastTreeTweedieRegression 0,4639 0,33 0,20 0,45 3,3 26 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:16:07.2304 INFO |27 LightGbmRegression 0,9782 0,07 0,01 0,09 2,7 27 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:16:34.3087 INFO |28 FastTreeRegression 0,9801 0,06 0,01 0,09 27,1 28 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:16:37.6194 INFO |29 FastTreeTweedieRegression 0,0800 0,45 0,35 0,59 3,3 29 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:16:40.1970 INFO |30 LightGbmRegression 0,8696 0,15 0,05 0,22 2,6 30 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:16:53.2113 INFO |31 FastTreeRegression 0,9935 0,03 0,00 0,05 13,0 31 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:17:16.3573 INFO |32 FastTreeTweedieRegression 0,9758 0,07 0,01 0,10 23,1 32 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:17:20.2456 INFO |33 LightGbmRegression 0,8822 0,15 0,04 0,21 3,9 33 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:17:23.0729 INFO |34 FastTreeRegression -0,0172 0,51 0,39 0,62 2,8 34 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:17:26.1219 INFO |35 FastTreeTweedieRegression 0,0263 0,47 0,37 0,61 3,0 35 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:17:28.4487 INFO |36 LightGbmRegression 0,9927 0,04 0,00 0,05 2,3 36 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:17:30.4403 INFO |37 FastTreeRegression 0,8027 0,20 0,07 0,27 2,0 37 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:17:32.5977 INFO |38 FastTreeTweedieRegression 0,8442 0,17 0,06 0,24 2,2 38 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:17:35.4974 INFO |39 LightGbmRegression 0,9787 0,06 0,01 0,09 2,9 39 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:17:43.2962 INFO |40 FastTreeRegression 0,8814 0,15 0,04 0,21 7,8 40 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:17:54.1076 INFO |41 FastTreeTweedieRegression 0,9773 0,06 0,01 0,09 10,8 41 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:17:56.4898 INFO |42 LightGbmRegression 0,7673 0,23 0,09 0,30 2,4 42 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:18:10.2398 INFO |43 FastTreeRegression 0,9385 0,11 0,02 0,15 13,7 43 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:18:17.8742 INFO |44 FastTreeTweedieRegression 0,9962 0,03 0,00 0,04 7,6 44 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:18:22.5084 INFO |45 LightGbmRegression 0,9990 0,01 0,00 0,02 4,6 45 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:18:25.2342 INFO |46 FastTreeRegression 0,9739 0,07 0,01 0,10 2,7 46 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:18:46.9190 INFO |47 FastTreeTweedieRegression 0,7017 0,25 0,11 0,34 21,7 47 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:18:50.1409 INFO |48 LightGbmRegression 0,9974 0,02 0,00 0,03 3,2 48 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:18:58.3642 INFO |49 FastTreeRegression 0,9988 0,02 0,00 0,02 8,2 49 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:19:08.1574 INFO |50 FastTreeTweedieRegression 0,7427 0,22 0,10 0,31 9,8 50 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:19:12.2166 INFO |51 LightGbmRegression 0,9972 0,02 0,00 0,03 4,1 51 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:19:21.8491 INFO |52 FastTreeRegression 0,9663 0,07 0,01 0,11 9,6 52 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:19:26.1101 INFO |53 FastTreeTweedieRegression 0,4846 0,32 0,20 0,44 4,3 53 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:19:28.2084 INFO |54 LightGbmRegression 0,5135 0,33 0,18 0,43 2,1 54 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:19:59.3728 INFO |55 FastTreeRegression 0,9985 0,02 0,00 0,02 31,2 55 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:20:02.2304 INFO |56 FastTreeTweedieRegression 0,8848 0,14 0,04 0,21 2,9 56 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:20:04.6358 INFO |57 LightGbmRegression 0,9496 0,10 0,02 0,14 2,4 57 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:20:23.4426 INFO |58 FastTreeRegression 0,9537 0,09 0,02 0,13 18,8 58 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:20:26.8705 INFO |59 FastTreeTweedieRegression 0,2804 0,39 0,27 0,52 3,4 59 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:20:29.9091 INFO |60 LightGbmRegression 0,9978 0,02 0,00 0,03 3,0 60 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:21:07.6770 INFO |61 FastTreeRegression 0,9990 0,01 0,00 0,02 37,8 61 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:21:41.7417 INFO |62 FastTreeTweedieRegression 0,7642 0,21 0,09 0,30 34,1 62 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:21:43.8605 INFO |63 LightGbmRegression 0,8582 0,18 0,05 0,23 2,1 63 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:21:51.7671 INFO |64 FastTreeRegression 0,8806 0,15 0,05 0,21 7,9 64 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:21:54.6391 INFO |65 FastTreeTweedieRegression 0,0938 0,45 0,34 0,59 2,9 65 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:21:57.5520 INFO |66 LightGbmRegression 0,9889 0,05 0,00 0,06 2,9 66 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:22:22.9859 INFO |67 FastTreeRegression 0,9991 0,01 0,00 0,02 25,4 67 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:22:26.5649 INFO |68 FastTreeTweedieRegression 0,1032 0,44 0,34 0,58 3,6 68 | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-06-04 00:22:26.5864 DEBUG Input string was not in a correct format.
   at System.Number.ParseSingle(String value, NumberStyles options, NumberFormatInfo numfmt)
   at Microsoft.ML.AutoML.SweeperProbabilityUtils.ParameterSetAsFloatArray(IValueGenerator[] sweepParams, ParameterSet ps, Boolean expandCategoricals)
   at Microsoft.ML.AutoML.SmacSweeper.FitModel(IEnumerable1 previousRuns)
   at Microsoft.ML.AutoML.SmacSweeper.ProposeSweeps(Int32 maxSweeps, IEnumerable1 previousRuns)
   at Microsoft.ML.AutoML.PipelineSuggester.SampleHyperparameters(MLContext context, SuggestedTrainer trainer, IEnumerable1 history, Boolean isMaximizingMetric)
   at Microsoft.ML.AutoML.PipelineSuggester.GetNextInferredPipeline(MLContext context, IEnumerable1 history, DatasetColumnInfo[] columns, TaskKind task, Boolean isMaximizingMetric, CacheBeforeTrainer cacheBeforeTrainer, IEnumerable1 trainerWhitelist)
   at Microsoft.ML.AutoML.Experiment2.Execute()
   at Microsoft.ML.AutoML.ExperimentBase2.Execute(ColumnInformation columnInfo, DatasetColumnInfo[] columns, IEstimator1 preFeaturizer, IProgress1 progressHandler, IRunner1 runner)
   at Microsoft.ML.AutoML.ExperimentBase2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator1 preFeaturizer, IProgress1 progressHandler)
   at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment3.<>cDisplayClass21_0.b_5() in //src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 81
   at System.Threading.Tasks.Task1.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment3.d21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 108
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLEngine.d_30.MoveNext() in //src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 147 (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2021-06-04 11:10:18.5415 DEBUG Open Log FileC:\Users\darla\AppData\Local\Temp\MLVSTools\logs\1c5e2112-e010-4acb-82a1-373f09101864.txt (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
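The trainer rows in the debug log follow the `Trainer RSquared Absolute-loss Squared-loss RMS-loss Duration #Iteration` header and print their numbers with a comma as the decimal separator (for example `0,8842`). For readers who want to inspect such a log offline, the sketch below pulls those rows out of the raw text. The regular expression, the field names, and the helper names `to_float` and `parse_rows` are illustrative assumptions based only on the rows shown in this issue, not a format documented by Model Builder.

```python
import re

# One trainer row, e.g. "|19 FastTreeRegression -2,1707 0,93 1,20 1,10 2,3 19 |".
# Assumption: every metric is printed with a comma decimal separator.
ROW = re.compile(
    r"\|(\d+)\s+(\w+)\s+(-?\d+,\d+)\s+(\d+,\d+)\s+(\d+,\d+)"
    r"\s+(\d+,\d+)\s+(\d+,\d+)\s+(\d+)\s*\|"
)


def to_float(text):
    """Convert a comma-decimal string such as '0,8842' to a float."""
    return float(text.replace(",", "."))


def parse_rows(log_text):
    """Return one dict per trainer row found anywhere in the log text."""
    rows = []
    for match in ROW.finditer(log_text):
        rows.append({
            "trainer": match.group(2),
            "r_squared": to_float(match.group(3)),
            "absolute_loss": to_float(match.group(4)),
            "squared_loss": to_float(match.group(5)),
            "rms_loss": to_float(match.group(6)),
            "duration": to_float(match.group(7)),
            "iteration": int(match.group(8)),
        })
    return rows


# Two rows copied from the log in this issue.
sample = (
    "2021-06-04 00:13:43.2422 INFO |1 SdcaRegression 0,8842 0,15 0,04 0,21 2,5 1 | "
    "(Microsoft.ML.ModelBuilder.Utils.Logger.Info) "
    "2021-06-04 00:15:26.1774 INFO |19 FastTreeRegression -2,1707 0,93 1,20 1,10 2,3 19 | "
    "(Microsoft.ML.ModelBuilder.Utils.Logger.Info)"
)

best = max(parse_rows(sample), key=lambda row: row["r_squared"])
print(best["trainer"], best["r_squared"])
```

Because the pattern uses `finditer`, it works on the log whether the entries are on separate lines or run together in one block.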

LittleLittleCloud commented 3 years ago

@beccamc Looks like a bug in SMAC sweeper in old AutoML.Net, which should no longer exist in our main branch. Maybe we can launch another preview release to make the fix available?

beccamc commented 3 years ago

Sorry you ran into this issue, @Darlanio. The Model Builder version number is available under Extensions -> Manage Extensions -> Installed. If you aren't on 16.3.0.2056001, you can try to update here.

We are validating a new release this week, which, as Xiaoyun mentioned above, has removed this code. I'll update this issue when that is released (it will also be available from the marketplace link above).

beccamc commented 3 years ago

@Darlanio we just released the new version. Can you update and try again? https://marketplace.visualstudio.com/items?itemName=MLNET.07

Darlanio commented 3 years ago

Thanks! Will do.

Darlanio commented 3 years ago

After updating Model Builder, I am now using version 16.6.0.2130907. I have only tested training a handful of networks so far, but have not had any trouble. I will let you know if I get the error again, but for now this seems to be a bug that is solved.

Many thanks for the quick replies and the update!

beccamc commented 3 years ago

I'm going to close this issue. If you see the bug again feel free to reopen, and please @ mention me. Thanks for reporting!

idenchik1 commented 3 years ago

@beccamc Same issue with ML.NET 16.6.1.213190. It occurs with ~150k rows and 60 sec, and I get the same result with 50k rows and 10 sec.

beccamc commented 3 years ago

@idenchik1 Can you share a sample row of your dataset?

idenchik1 commented 3 years ago

https://hastebin.com/raw/ayunozupos

I used this dataset: https://www.kaggle.com/fizzbuzz/cleaned-toxic-comments?select=train_preprocessed.csv

or something like this:

0e2d592688eed0f3c6afd87b1b39477df11dfded50610a5b4b277c2b2f7414ca;False
a074cd0285dc1c121e2ea2c0e70168b86f9b83293f92d0bd1c0e57a377e1d46a;False
8c7567237067316295fe18748acc53bb81a990f2193091235889191ca454b570;True