Closed: briand-abl closed this issue 2 years ago.
My issue turned out to be an invalid value for the "imageBuildCompute" property on the workspace: it was set to one of the compute instances. I changed it to the compute cluster that the CI/CD pipeline targets, and the Train Model task now seems to be running (at least, it's not failing immediately!). Just in case anybody else has this problem: use the "Diagnose" button on your cluster and you should see any setup errors. That is how I found my issue.
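For anyone who wants to check this setting in a script, here is a minimal Python sketch of the check described in the comment above. It assumes you have already copied the workspace's "imageBuildCompute" value and the list of compute targets (name and type) into plain dictionaries, for example from ML Studio or CLI output. The dictionary shapes, the "AmlCompute"/"ComputeInstance" type strings, and the function name `check_image_build_compute` are illustrative assumptions, not an Azure ML API.

```python
from __future__ import annotations


def check_image_build_compute(workspace: dict, computes: list[dict]) -> str:
    """Hypothetical helper: diagnose the workspace's imageBuildCompute value.

    workspace: assumed shape, e.g. {"imageBuildCompute": "train-cluster"}
    computes:  assumed shape, e.g. [{"name": "train-cluster", "type": "AmlCompute"}]
    """
    target = workspace.get("imageBuildCompute")
    if not target:
        return "not set"
    # Index compute targets by name; lowercase the type so that
    # "AmlCompute" and "amlcompute" compare equal.
    types = {c["name"]: c["type"].lower() for c in computes}
    if target not in types:
        return f"'{target}' does not exist in this workspace"
    if types[target] != "amlcompute":
        # The problem from this issue: a compute instance instead of a cluster.
        return f"'{target}' is a {types[target]}, not a compute cluster"
    return "ok"


if __name__ == "__main__":
    computes = [
        {"name": "dev-instance", "type": "ComputeInstance"},
        {"name": "train-cluster", "type": "AmlCompute"},
    ]
    print(check_image_build_compute({"imageBuildCompute": "dev-instance"}, computes))
    # 'dev-instance' is a computeinstance, not a compute cluster
```

The check only compares names and types. It cannot tell you whether the cluster is healthy, which is why the "Diagnose" button is still worth running.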
I've followed the instructions in the README to set up the repo, created the service connection as directed, and created an Azure DevOps pipeline based on the diabetes-train-and-deploy.yml file. The workspace the pipeline points to is an existing resource that was created before I found the pipelines-azureml repo. When I run the pipeline, it always fails on the Train Model step with the following error:
I can dig further into the error in ML Studio, and it shows that the calling user is the service connection I set up for the pipeline. In case it was a permissions issue, I added that user as a Contributor on the workspace, but I still see the same error. I've also tried the PowerShell commands from the "Run CLI scripts..." section at the bottom of the README.md file, and I get the same message when running under my own Azure account, which has the Owner role on the ML workspace.
The pipeline was able to create the compute cluster, but it seems unable to access the cluster once it has been created. Another possibility is that something in our workspace is locked down and is preventing this pipeline from working properly. Any help is greatly appreciated. Thank you!