MicrosoftLearning / mslearn-aml-cli

Lab files for Learn modules on using the Azure Machine Learning CLI (v2)
https://microsoftlearning.github.io/mslearn-aml-cli/
MIT License
8 stars, 19 forks

Submitted job stuck in preparing status #12

Open kel-varnsen opened 6 months ago

kel-varnsen commented 6 months ago

Module: mslearn-aml-cli

Lab/Demo: 2 Run a basic Python training job

Task: basic-job

Step: 00

Description of issue: After submitting a job using the Azure CLI, I can see the job in Azure ML Studio, but it seems to be stuck in the Preparing status. I tried submitting it to different compute instances and clusters, with the same result. Repro steps:

  1. Followed the exact steps from the tutorial
  2. https://microsoftlearning.github.io/mslearn-aml-cli/Instructions/Labs/02-run-python-job.html

kel-varnsen commented 6 months ago

After further investigation, the job gets stuck while building the conda environment. I assume this could have to do with version incompatibilities (python=3.7) in Allfiles/Labs/01/conda-envs/basic-env-cpu.yml.

TheOriginalJC commented 6 months ago

Just to say, this is the same headache I've experienced.

I think I got a little further by attempting to build the environment using a docker context in the CLI based on the curated environments instead, but sadly I still ran into issues later on.

JohnKHancock commented 4 months ago

I'm having the exact same issue as well. I too followed the tutorials.

benoitfrisque commented 1 month ago

There is a problem when creating the environment in the step "Create an environment". The failure shows up in the prepare_image experiment, which gets stuck after this log line:

2024-05-14T12:52:27: Collecting package metadata (repodata.json): ...working... done

Later the log shows:

2024-05-14T14:20:07.259727Z ERROR appinsights::channel::state: commands channel closed
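To get a sense of how long the image build sat idle, the gap between the two log timestamps above can be computed directly. This is only a small sketch: it assumes both timestamps are in the same time zone, even though only the second one carries a "Z" suffix, and the variable names are mine.

```python
from datetime import datetime

# Timestamps copied from the two log lines above.
# Assumption: both are UTC, although only the second carries a "Z" suffix.
last_progress = datetime.strptime("2024-05-14T12:52:27", "%Y-%m-%dT%H:%M:%S")
error_logged = datetime.strptime(
    "2024-05-14T14:20:07.259727Z", "%Y-%m-%dT%H:%M:%S.%fZ"
)

# Time between the last conda progress line and the error line.
stalled_for = error_logged - last_progress
print(stalled_for)  # 1:27:40.259727
```

Under that assumption, the build made no visible progress for almost an hour and a half before the error was logged.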

benoitfrisque commented 1 month ago

A workaround is to replace the environment in the yml files with the curated environment "azureml://registries/azureml/environments/sklearn-1.1/versions/32".
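For anyone who would rather not edit each file by hand, here is a rough sketch of that workaround as a script. It assumes each job yml file sets the environment through a single top-level `environment:` key with an inline value; I have not checked that against every lab file. The function names `use_curated_env` and `patch_job_files` are made up for this example, and the script leaves files without such a key (for example the conda files) untouched.

```python
import re
from pathlib import Path

# Registry reference quoted in the workaround above.
CURATED_ENV = "azureml://registries/azureml/environments/sklearn-1.1/versions/32"

# Assumed layout: a top-level "environment:" key with an inline value.
ENV_LINE = re.compile(r"^environment:[ \t]*\S.*$", re.MULTILINE)


def use_curated_env(yaml_text: str, env_ref: str = CURATED_ENV) -> str:
    """Return yaml_text with its top-level environment value replaced."""
    return ENV_LINE.sub(lambda _match: f"environment: {env_ref}", yaml_text)


def patch_job_files(folder: str) -> list[str]:
    """Rewrite *.yml files under folder in place; return the paths that changed."""
    changed = []
    for path in sorted(Path(folder).rglob("*.yml")):
        original = path.read_text(encoding="utf-8")
        patched = use_curated_env(original)
        if patched != original:
            path.write_text(patched, encoding="utf-8")
            changed.append(str(path))
    return changed
```

Calling `patch_job_files` on the folder that holds your job yml files returns the list of files it rewrote. Review the diff before resubmitting the job, since this is a plain text substitution and not a YAML-aware edit.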