dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

Remote training fails #2056

Closed zewditu closed 2 years ago

zewditu commented 2 years ago

Related issues:

zewditu commented 2 years ago

These issues are not reproducible in our latest main.

dubstar-04 commented 2 years ago

These issues are not reproducible in our latest main.

Are there instructions to build and install main? We have an AI/ML hackathon today and it would be good if Azure training was available.

shahar74 commented 2 years ago

The problem still happens. What needs to be done to use main? Thanks

luisquintanilla commented 2 years ago

@zewditu I was able to repro. If you can't repro in main let's try and push a release to see if that fixes it.

Error:

at AzureML.AutoMLRunMonitoringImages.<MonitorParentRunAsync>d__1.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/RemoteAutoML/AutoMLRunMonitoringImages.cs:line 137
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at AzureML.AutoMLRunnerImages.<RunAutoMLAsync>d__24.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/RemoteAutoML/AutoMLRunnerImages.cs:line 190
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AzureImageClassificationExperiment.<ExecuteAsync>d__14.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AzureImageClassificationExperiment.cs:line 63
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ML.ModelBuilder.AutoMLEngine.<StartTrainingAsync>d__21.MoveNext() in /_/src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 160

Log File

Run ID: https://ml.azure.com/experiments/id/a752cc9c-e2a2-4e26-84f7-ed4167383269/runs/AutoML_7d717611-e9bd-465a-b2f4-53121fe9af12?wsid=/subscriptions/52462f3d-9226-4c1a-ae95-9a1f2b2c2e0f/resourceGroups/luquinta-MyResourceGroup/providers/Microsoft.MachineLearningServices/workspaces/luquinta-wkspc in the report

JakeRadMSFT commented 2 years ago

This is being resolved by the Azure team - I'll report back when the fix has been rolled out.

luisquintanilla commented 2 years ago

Hi all, just checking in. The Azure team is still working on a fix but unfortunately it's going to take a bit longer than expected. @dubstar-04 I know you're in hackathon mode, so apologies for the inconvenience. In the meantime, a potential solution you might want to consider is training with Azure Custom Vision, exporting the model to ONNX and consuming it with the ML.NET API.

Here is some guidance that might help.

https://mlnet-workshop.azurewebsites.net/category/ONNX

Let us know if you have questions.

Again, apologies everyone for the inconvenience. As Jake mentioned, we'll update this issue when the fix has been rolled out.

dubstar-04 commented 2 years ago

you might want to consider is training with Azure Custom Vision

@luisquintanilla I came across custom vision earlier today. It's working great! Just need to polish everything up for a demo! Thank you!

luisquintanilla commented 2 years ago

Great! Feel free to reach out if you have any questions.

luisquintanilla commented 2 years ago

Hi all,

Checking back in. The issue has been resolved, so you should be able to train object detection models again in Model Builder.

Apologies again for any inconvenience this may have caused. Closing this issue. Please feel free to open a new issue if you're still having problems.