bheinrich17 closed this issue 6 months ago.
Could you upgrade Model Builder to 17.18.2 and try again? We recently pushed a fix that disposes the Torch model more intelligently, which might help mitigate this issue.
After updating the ML.NET Model Builder 2022 extension to 17.18.2.2415501, text classification model training works fine now. Thank you!
System Information (please complete the following information):
Describe the bug
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Training should run and complete.
Screenshots
Additional context
I have tried several datasets, all with the same outcome. I also tried the NLP Sentence Similarity scenario, with the same outcome.
Log:
2024-03-17 17:35:55.7357 DEBUG Start AutoMLService pid: 17156 (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2024-03-17 17:35:56.0884 INFO start text classification (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:56.0884 DEBUG env:path: C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Program Files\Microsoft VS Code\bin;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files\dotnet\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files\Microsoft SQL Server\150\Tools\Binn\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\;C:\Program Files\Git\cmd;C:\Program Files\Docker\Docker\resources\bin;C:\Users\MacTee\AppData\Local\Programs\Python\Python310\Scripts\;C:\Users\MacTee\AppData\Local\Programs\Python\Python310\;C:\Users\MacTee\AppData\Local\Microsoft\WindowsApps;C:\Users\MacTee\.dotnet\tools; (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2024-03-17 17:35:56.0884 DEBUG C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\obj (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2024-03-17 17:35:56.0884 INFO restore "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\ENTERPRISE\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\torchsharp.cpu.csproj" --configfile "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\ENTERPRISE\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\NuGet.config" -r win-x64 /p:UsingToolXliff=false /p:TorchSharpVersion=0.98.3 /p:TorchSharpCudaRuntimeVersion=1.11.0.1 /p:TensorflowRuntimeVersion=2.3.1 /p:BaseIntermediateOutputPath="C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\obj" (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:56.0884 INFO publish "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\ENTERPRISE\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\torchsharp.cpu.csproj" -r win-x64 -c Release --no-self-contained -o C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3 --no-restore /p:UsingToolXliff=false /p:TorchSharpVersion=0.98.3 /p:TorchSharpCudaRuntimeVersion=1.11.0.1 /p:TensorflowRuntimeVersion=2.3.1 /p:BaseOutputPath="C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\bin\\" /p:BaseIntermediateOutputPath="C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\obj\\" (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:56.0884 INFO start installing runtime in C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3 (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:56.9421 INFO Determining projects to restore... (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:57.1392 INFO All projects are up-to-date for restore. (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:57.3949 INFO MSBuild version 17.8.5+b5265ef37 for .NET (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:58.2022 INFO torchsharp.cpu -> C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\bin\Release\netstandard2.0\win-x64\torchsharp.cpu.dll (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:58.2259 INFO torchsharp.cpu -> C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\ (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:58.2663 INFO install runtime successfully (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:58.8805 INFO [Source=AutoMLExperiment-ChildContext, Kind=Trace] [Source=NasBertTrainer; TrainModel, Kind=Trace] Channel started (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:59.3022 INFO [Source=AutoMLExperiment-ChildContext, Kind=Trace] [Source=NasBertTrainer; Ensuring model file is present., Kind=Trace] Channel started (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:59.3208 INFO [Source=AutoMLExperiment-ChildContext, Kind=Trace] [Source=NasBertTrainer; Ensuring model file is present., Kind=Trace] Channel finished. Elapsed 00:00:00.0183094. (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:59.3218 INFO [Source=AutoMLExperiment-ChildContext, Kind=Trace] [Source=NasBertTrainer; Ensuring model file is present., Kind=Trace] Channel disposed (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:59.3904 INFO [Source=AutoMLExperiment-ChildContext, Kind=Trace] [Source=NasBertTrainer; TrainModel, Kind=Trace] Starting epoch 0 (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:36:03.7612 DEBUG The JSON-RPC connection with the remote party was lost before the request could complete.
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at StreamJsonRpc.JsonRpc.<InvokeCoreAsync>d__162.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at StreamJsonRpc.JsonRpc.<InvokeCoreAsync>d__151`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.ViewModels.TrainViewModel.<b__92_0>d.MoveNext() (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)