dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.

Training throws Exception: JSON-RPC connection with the remote party was lost ... #2882

Closed: bheinrich17 closed this issue 6 months ago

bheinrich17 commented 6 months ago

System Information (please complete the following information):

Describe the bug

To Reproduce Steps to reproduce the behavior:

  1. Scenario = NLP Text classification
  2. Environment = Local (CPU)
  3. Data = yelp_labelled.txt from https://archive.ics.uci.edu/dataset/331/sentiment+labelled+sentences, with column to predict = col1 and text column = col0
  4. Click on Training. After a few seconds, an exception is thrown (see screenshot).
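Before retrying the steps above, it can help to rule out a malformed input file. The following is a minimal sketch, assuming yelp_labelled.txt is tab-separated with the sentence in the first column (col0) and a 0/1 label in the second (col1). The function names `load_labelled_sentences` and `label_counts` are hypothetical helpers, not part of Model Builder, and a clean result here does not rule out a fault in the training service itself.

```python
import csv
import sys
from collections import Counter


def load_labelled_sentences(path):
    """Parse a tab-separated file of `sentence<TAB>label` rows.

    Assumption: each non-empty line holds exactly two fields and the
    label is the string "0" or "1".
    """
    rows = []
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_number, fields in enumerate(reader, start=1):
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2 or fields[1].strip() not in {"0", "1"}:
                raise ValueError(
                    f"line {line_number}: expected 'text<TAB>0|1', got {fields!r}"
                )
            rows.append((fields[0].strip(), int(fields[1])))
    return rows


def label_counts(rows):
    """Count how many rows carry each label."""
    return Counter(label for _, label in rows)


if __name__ == "__main__":
    # Usage: python check_dataset.py yelp_labelled.txt
    parsed = load_labelled_sentences(sys.argv[1])
    print(f"{len(parsed)} rows, label counts: {dict(label_counts(parsed))}")
```

If every row parses and both labels are present, the data is unlikely to be the cause, which points back at the training process.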

Expected behavior: Training should run and complete.

Screenshots: [image: grafik]

Additional context: I have tried several datasets, all with the same outcome. I also tried the NLP Sentence similarity scenario, with the same outcome.

Log:

2024-03-17 17:35:55.7357 DEBUG Start AutoMLService pid: 17156 (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2024-03-17 17:35:56.0884 INFO start text classification (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:56.0884 DEBUG env:path: C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Program Files\Microsoft VS Code\bin;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files\dotnet\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files\Microsoft SQL Server\150\Tools\Binn\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\;C:\Program Files\Git\cmd;C:\Program Files\Docker\Docker\resources\bin;C:\Users\MacTee\AppData\Local\Programs\Python\Python310\Scripts\;C:\Users\MacTee\AppData\Local\Programs\Python\Python310\;C:\Users\MacTee\AppData\Local\Microsoft\WindowsApps;C:\Users\MacTee\.dotnet\tools; (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2024-03-17 17:35:56.0884 DEBUG C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\obj (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)
2024-03-17 17:35:56.0884 INFO restore "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\ENTERPRISE\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\torchsharp.cpu.csproj" --configfile "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\ENTERPRISE\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\NuGet.config" -r win-x64 /p:UsingToolXliff=false /p:TorchSharpVersion=0.98.3 /p:TorchSharpCudaRuntimeVersion=1.11.0.1 /p:TensorflowRuntimeVersion=2.3.1 /p:BaseIntermediateOutputPath="C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\obj" (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:56.0884 INFO publish "C:\PROGRAM FILES\MICROSOFT VISUAL STUDIO\2022\ENTERPRISE\COMMON7\IDE\COMMONEXTENSIONS\MICROSOFT\MODELBUILDER\AUTOMLSERVICE\RuntimeManager\torchsharp.cpu.csproj" -r win-x64 -c Release --no-self-contained -o C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3 --no-restore /p:UsingToolXliff=false /p:TorchSharpVersion=0.98.3 /p:TorchSharpCudaRuntimeVersion=1.11.0.1 /p:TensorflowRuntimeVersion=2.3.1 /p:BaseOutputPath="C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\bin\\" /p:BaseIntermediateOutputPath="C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\obj\\" (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:56.0884 INFO start installing runtime in C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3 (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:56.9421 INFO Determining projects to restore... (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:57.1392 INFO All projects are up-to-date for restore. (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:57.3949 INFO MSBuild version 17.8.5+b5265ef37 for .NET (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:58.2022 INFO torchsharp.cpu -> C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\bin\Release\netstandard2.0\win-x64\torchsharp.cpu.dll (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:58.2259 INFO torchsharp.cpu -> C:\Users\MacTee\AppData\Local\Temp\ModelBuilder\torchsharp-cpu-0.98.3\ (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:58.2663 INFO install runtime successfully (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:58.8805 INFO [Source=AutoMLExperiment-ChildContext, Kind=Trace] [Source=NasBertTrainer; TrainModel, Kind=Trace] Channel started (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:59.3022 INFO [Source=AutoMLExperiment-ChildContext, Kind=Trace] [Source=NasBertTrainer; Ensuring model file is present., Kind=Trace] Channel started (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:59.3208 INFO [Source=AutoMLExperiment-ChildContext, Kind=Trace] [Source=NasBertTrainer; Ensuring model file is present., Kind=Trace] Channel finished. Elapsed 00:00:00.0183094. (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:59.3218 INFO [Source=AutoMLExperiment-ChildContext, Kind=Trace] [Source=NasBertTrainer; Ensuring model file is present., Kind=Trace] Channel disposed (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:35:59.3904 INFO [Source=AutoMLExperiment-ChildContext, Kind=Trace] [Source=NasBertTrainer; TrainModel, Kind=Trace] Starting epoch 0 (Microsoft.ML.ModelBuilder.Utils.Logger.Info)
2024-03-17 17:36:03.7612 DEBUG The JSON-RPC connection with the remote party was lost before the request could complete.
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at StreamJsonRpc.JsonRpc.<InvokeCoreAsync>d__162.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at StreamJsonRpc.JsonRpc.<InvokeCoreAsync>d__1511.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.ViewModels.TrainViewModel.<b__92_0>d.MoveNext() (Microsoft.ML.ModelBuilder.Utils.Logger.Debug)

LittleLittleCloud commented 6 months ago

Could you upgrade Model Builder to 17.18.2 and try again? We recently pushed a fix to dispose of the torch model more smartly, which might help mitigate this issue.

bheinrich17 commented 6 months ago

After updating the ML.NET Model Builder 2022 extension to 17.18.2.2415501, text classification model training works fine now. Thank you!