dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International
264 stars 56 forks source link

Using SQL Server View for model training is causing "Input string was not in a correct format." error #1143

Closed dliedke closed 3 years ago

dliedke commented 3 years ago

System Information (please complete the following information):

Describe the bug

To Reproduce Steps to reproduce the behavior: 1) Scenario Text Classification 2) Data SQL Server 3) Select a view in Table dropdown

  1. See error:

2020-12-05 09:06:57.7597 TRACE Disposing AutoMLService Client (Microsoft.ML.ModelBuilder.AutoMLServiceFactory+AutoMLServiceProxy.Dispose) 2020-12-05 09:07:07.6815 DEBUG System.FormatException: Input string was not in a correct format. at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal) at System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info) at System.Convert.ToInt32(String value) at Microsoft.ML.ModelBuilder.DataSources.SQLDataSource.GetTableSize(String tableName) at Microsoft.ML.ModelBuilder.ToolWindows.DataTabDataContext.d160.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.ML.ModelBuilder.ToolWindows.DataTabDataContext.<b161_0>d.MoveNext() (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)

Expected behavior Load view data correctly and display in the UI.

Screenshots If applicable, add screenshots to help explain your problem. image

Additional context It is was working in Model Builder Version 16.1.2027905 and previous versions without issues.

dmccarty-incomm commented 3 years ago

I have essentially the same issue. Visual Studio Version 16.8.3 (don't know how to determine model builder version) Scenario: Value prediction (regression) Environment: Local CPU Data: ...Input: SQL Server ...Table: I select a view that returns 4 columns, all CAST as REAL in the view definition on SQL Server. After about a second I get the same message. My log file shows:

2021-01-11 17:08:05.0007 TRACE Disposing AutoMLService Client (Microsoft.ML.ModelBuilder.AutoMLServiceFactory+AutoMLServiceProxy.Dispose) 2021-01-11 17:08:11.2606 TRACE Disposing AutoMLService Client (Microsoft.ML.ModelBuilder.AutoMLServiceFactory+AutoMLServiceProxy.Dispose) 2021-01-11 17:08:12.3988 DEBUG System.FormatException: Input string was not in a correct format. at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal) at System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info) at System.Convert.ToInt32(String value) at Microsoft.ML.ModelBuilder.DataSources.SQLDataSource.GetTableSize(String tableName) at Microsoft.ML.ModelBuilder.ToolWindows.DataTabDataContext.d160.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.ML.ModelBuilder.ToolWindows.DataTabDataContext.<b161_0>d.MoveNext() (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)

Note that if I select a DB table, it seems to be OK (at least to this point).

Sire commented 3 years ago

Ping @beccamc

Can confirm this is number formatting bug related to regions that uses comma as a decimal separator.

You mentioned region bugs was fixed, but this happens in latest v16.3.0.2056001.

To reproduce bug and the workaround:

2021-02-01 14:24:17.2759 TRACE Disposing AutoMLService Client (Microsoft.ML.ModelBuilder.AutoMLServiceFactory+AutoMLServiceProxy.Dispose)
2021-02-01 14:24:20.5782 DEBUG System.FormatException: Input string was not in a correct format.
   at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
   at System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info)
   at System.Convert.ToInt32(String value)
   at Microsoft.ML.ModelBuilder.DataSources.SQLDataSource.GetTableSize(String tableName)
   at Microsoft.ML.ModelBuilder.ToolWindows.DataTabDataContext.<GetFileDataPropertiesFromSQLAsync>d__160.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.ToolWindows.DataTabDataContext.<<TriggerGetFileDataPropertiesFromSql>b__161_0>d.MoveNext() (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2021-02-01 14:24:28.4323 TRACE Disposing AutoMLService Client (Microsoft.ML.ModelBuilder.AutoMLServiceFactory+AutoMLServiceProxy.Dispose)
2021-02-01 14:24:28.7864 DEBUG Error while processing your data: Input array is longer than the number of columns in this table. 

Most likely your data contains multiple instances of your delimiter. Check your dataset for text fields containing ",". To fix, you can enclose these fields with quotes.
   at System.Data.DataTable.NewRecordFromArray(Object[] value)
   at System.Data.DataRowCollection.Add(Object[] values)
   at Microsoft.ML.ModelBuilder.ToolWindows.DataTabDataContext.<CreateDataTableAsync>d__174.MoveNext() (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2021-02-01 14:25:24.1303 INFO |     Trainer                             RSquared Absolute-loss Squared-loss RMS-loss  Duration #Iteration      | (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2021-02-01 14:25:47.4654 TRACE Disposing AutoMLService Client (Microsoft.ML.ModelBuilder.AutoMLServiceFactory+AutoMLServiceProxy.Dispose)
2021-02-01 14:25:47.4654 DEBUG Specified cast is not valid.
   at System.Data.SqlClient.SqlBuffer.get_Single()
   at Microsoft.ML.Data.DatabaseLoader.Cursor.<>c__DisplayClass37_0.<CreateSingleGetterDelegate>b__0(Single& value)
   at Microsoft.ML.AutoML.DatasetDimensionsUtil.HasMissingNumericSingleValue(IDataView data, Column column)
   at Microsoft.ML.AutoML.DatasetDimensionsApi.CalcColumnDimensions(MLContext context, IDataView data, Column[] purposes)
   at Microsoft.ML.AutoML.DatasetColumnInfoUtil.GetDatasetColumnInfo(MLContext context, IDataView data, ColumnInformation columnInfo)
   at Microsoft.ML.AutoML.ExperimentBase`2.ExecuteTrainValidate(IDataView trainData, ColumnInformation columnInfo, IDataView validationData, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
   at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
   at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment`3.<>c__DisplayClass21_0.<ExecuteAsync>b__5() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 80
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AutoMLExperiment`3.<ExecuteAsync>d__21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AutoMLExperiment.cs:line 107
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLEngine.<StartTrainingAsync>d__26.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 150 (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
beccamc commented 3 years ago

@Sire Thank you for the detailed repro! I will take a look

dliedke commented 3 years ago

Just an idea for the fix, the code in method Microsoft.ML.ModelBuilder.DataSources.SQLDataSource() is failing for the view because "data" column returns null as the view does not have a size in the database.

public long GetTableSize(string tableName)
{
    string cmdText = $"EXEC sp_spaceused {tableName}";
    DataSet dataSet = new DataSet();
    using (SqlConnection sqlConnection = new SqlConnection(DataProtection.DecryptString(EncryptedConnectionString)))
    {
        sqlConnection.Open();
        using (SqlCommand sqlCommand = new SqlCommand(cmdText, sqlConnection))
        {
            sqlCommand.ExecuteNonQuery();
            new SqlDataAdapter(sqlCommand).Fill(dataSet);
        }
        sqlConnection.Close();
    }
    return Convert.ToInt32(dataSet.Tables[0].Rows[0]["data"].ToString().Replace("KB", string.Empty).Trim()) * 1000;
}

One crazy idea that works is copying the data to real temporary table, getting the size and dropping it:

SELECT *
INTO tablename
FROM [VIEW]

EXEC sp_spaceused 'tablename'

DROP TABLE tablename

Probably there are better ways to measure the query result size, but this is one that I found that works.

Thanks! Daniel

beccamc commented 3 years ago

@dliedke Thanks for the detailed fix! 😄 A "fix" for this will be in our March release (I noticed the problem and started catching the exception and returning a default value... not as helpful as your fix but I'll make a note to look at improving it!) I'll update this issue when the fix ships and you can tell me if it works.

As a note, I was only able to reproduce this with an Azure SQL DB. Are you using Azure or something local?

dliedke commented 3 years ago

Hi Becca! Welcome! I used a local SQL database. Very nice to know the fix will be released soon. Thank you!

beccamc commented 3 years ago

Hi everyone, we've shipped a new version of Model Builder. Can you check to see if this is fixed? https://marketplace.visualstudio.com/items?itemName=MLNET.07

dliedke commented 3 years ago

Hi team, I rested here and worked fine. Thanks a lot for the fix!