dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International
260 stars 52 forks source link

"Execution Timeout Expired" on Large SQL Dataset #2759

Open shainey0816 opened 11 months ago

shainey0816 commented 11 months ago

Screen capture 1 shows a generic timeout error when attempting to connect to SQL data source. This only occurs on very large datasets. For example, the timeout occurring on the desired data training set has 3936051 records in the table. I can switch to a table with a smaller number of records and everything loads just fine. I don't see a way to change the timeout time other than in the Advanced Settings when connecting to the database, but I changed the Initialization >> Command Timeout value to 3600 and it still errors within the same amount of time so that doesn't appear to affect this timeout.

Screen capture 2 shows that I can go into the "Advanced data options" and manually set my label in the UI, but then throws an error when saving. Screen capture 3 shows the selected label was saved regardless.

Screen capture 4 shows that I can select a different table from the database with smaller number of records and it loads just fine to proceed to the next steps.

To summarize the problem:

1) Using SQL database as source with very large dataset fails to load 2) "Data Preview, Processing data...." runs indefinitely 3) "Next step" button never enables due to these issues, so I can never finish the model unless I use a much smaller dataset which is not desirable for my application

Steps to reproduce the behavior:

1) Select value prediction scenario 2) Select local training environment 3) Input >> SQL Server 4) Data source >> Microsoft SQL Server 5) Authenticate server and select database 6) Select table with very large dataset (almost 4 million records in my case) and wait for the following error typically after ~30 seconds: "Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding." 7) Select a different table with smaller dataset to verify the connection is still active and working pointing to the large dataset as the issue

Expected behavior The expected behavior is that "Data Preview" doesn't timeout that soon, that the preview of the large dataset is loaded, that I can then select from the "Column to predict (Label)" dropdown list which column should be the label, and that the "Next step" button enables after selecting the label.

Screen capture 1

Capture1

Full stack trace:

at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.SqlCommand.InternalEndExecuteReader(IAsyncResult asyncResult, String endMethod, Boolean isInternal) at System.Data.SqlClient.SqlCommand.EndExecuteReaderInternal(IAsyncResult asyncResult) at System.Data.SqlClient.SqlCommand.EndExecuteReaderAsync(IAsyncResult asyncResult) at System.Threading.Tasks.TaskFactory1.FromAsyncCoreLogic(IAsyncResult iar, Func2 endFunction, Action1 endAction, Task`1 promise, Boolean requiresSynchronization) --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.ML.ModelBuilder.DataProviders.SQLDataProvider.d15.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.ML.ModelBuilder.ToolWindows.DataTextViewModel.d146.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.ML.ModelBuilder.ToolWindows.TextDataControl.<>c__DisplayClass4_0.<b__0>d.MoveNext()

Screen capture 2

Capture2

Full stack trace: at System.ThrowHelper.ThrowArgumentNullException(ExceptionArgument argument) at System.Collections.Generic.Dictionary2.FindEntry(TKey key) at System.Collections.Generic.Dictionary2.TryGetValue(TKey key, TValue& value) at Microsoft.ML.ModelBuilder.Utils.Utilities.GetTrainValidateDataSplitRatio(String ratio) at Microsoft.ML.ModelBuilder.ViewModels.AdvancedDataOptionsViewModel.d64.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.ML.ModelBuilder.ToolWindows.AdvancedDataOptions.AdvancedDataCleaningDialog.<>cDisplayClass3_0.<b__0>d.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.VisualStudio.Threading.JoinableTask.CompleteOnCurrentThread() at Microsoft.ML.ModelBuilder.ToolWindows.AdvancedDataOptions.AdvancedDataCleaningDialog.SaveButton_Click(Object sender, RoutedEventArgs e)

Screen capture 3

Capture3

Screen capture 4

Capture4

Additional context Consistently reproduceable following the steps above. It only occurs trying to use the tables with many 100k's of records.

LittleLittleCloud commented 11 months ago

@shainey0816 I wonder if command timeout is correctly set in model builder, would you mind sharing your connection string(withouth sensitive data) in modelbuilder with us? You can find it in .mbconfig

also @zewditu

shainey0816 commented 11 months ago

There doesn't appear to be too much in the connection string from model builder. See below.

Data Source=mydatabase.database.windows.net;Initial Catalog=sffc-sns;Persist Security Info=True;User ID=myUser

I've tried changing the Command Timeout property to a value greater than the default 30 doesn't produce different results. Still times out around ~30 seconds regardless of what value I use.

zewditu commented 10 months ago

@shainey0816 after you change your command timeout property to some other value greater than 30 seconds and save your Model builder file could you share your connection string from model builder? It should appear in your connection string.

zewditu commented 10 months ago

From Model Builder Side we might need to improve the performance of some of our methods like GetSqlTableRowCountAsync

nikhilguda commented 1 month ago

I'm facing the same issue. I checked the database query logs and I can see that the query was going on when Git Actions said it timed out. A new session is created to execute the error block. Once the new session is completed, the old session is killed too.