Azure / MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
https://docs.microsoft.com/azure/machine-learning/service/
MIT License

force_rerun attribute in pipeline YAML not working #1872

Closed kenanEkici closed 1 year ago

kenanEkici commented 1 year ago

Hi,

I have defined a pipeline through YAML ($schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json) and I am using the CLI to run and schedule the pipeline job in Azure Machine Learning studio. On consecutive runs, the pipeline does not rerun because neither the YAML definition nor the inputs change. This behavior is expected.

However, I want to disable this behavior so that the pipeline is forced to rerun. To achieve this, I have set the force_rerun attribute to true in my pipeline YAML definition.

[screenshot: yaml_input — the pipeline YAML definition with force_rerun set]

Unfortunately, the pipeline keeps reusing the output of previous runs and is not forced to rerun; the attribute seems to be simply ignored. This is clearly visible in the Designer in Azure Machine Learning studio: the "Regenerate output" option is greyed out, meaning it cannot be changed. [screenshot: issue]
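As a sanity check, loading the job YAML with the azure-ai-ml Python SDK shows whether force_rerun at least survives deserialization (a minimal sketch; the path and the installed SDK are assumptions):

from azure.ai.ml import load_job

# Load the pipeline job YAML and inspect the parsed settings
pipeline_job = load_job(source="./pipeline.yml")
print(pipeline_job.settings.force_rerun)  # should print True if the attribute is parsed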

brynn-code commented 1 year ago

Can't repro this issue... @kenanEkici could you please try submitting a pipeline job manually with force_rerun=True to see if it works? And if it does, you could schedule that submitted run to see whether the issue can be worked around.
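For reference, a manual submission along those lines could look like this sketch (workspace config, credentials and the YAML path are assumptions):

from azure.ai.ml import MLClient, load_job
from azure.identity import DefaultAzureCredential

# Connect to the workspace described by a local config.json
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

pipeline_job = load_job(source="./pipeline.yml")
pipeline_job.settings.force_rerun = True  # force a full rerun even if nothing changed
submitted = ml_client.jobs.create_or_update(pipeline_job)
print(submitted.name, submitted.status)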

brynn-code commented 1 year ago

And could you please let me know your sdk version?

kenanEkici commented 1 year ago

The attribute works when submitting the pipeline manually. When you schedule the same pipeline, the attribute gets ignored and every run except the first one gets cached. I'm working with version 2.12.1.

brynn-code commented 1 year ago

Okay.. Is it convenient for you to share the schedule YAML and the pipeline job definition YAML? I need more info, as I'm not able to repro it.

kenanEkici commented 1 year ago

Hi, I'm not able to share the artifacts. Instead, I've created a basic set of components and pipelines to see if I could reproduce my issue.

test_script.py

import argparse
import datetime

# Print the input and the current timestamp so a cached (reused) run is
# easy to distinguish from a genuine rerun.
parser = argparse.ArgumentParser()
parser.add_argument("--test_input", type=str)

args = parser.parse_args()
test_input = args.test_input
print(test_input)
print(datetime.datetime.now())

component.yml

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command

name: test_component
display_name: test_component

inputs:
  test_input:
    type: string

code: ./

environment: azureml:nec-env-dev:4

command: >-
  python test_script.py
  --test_input ${{inputs.test_input}}

pipeline.yml

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

display_name: test_pipeline
description: test_pipeline
experiment_name: test_experiment

inputs:
  test_input: "Helloworld"

settings:
  default_datastore: azureml:workspaceblobstore
  default_compute: azureml:cpucls-d-aml-weu-dac
  force_rerun: true

jobs:
  test_component:
    type: command
    component: ./component.yml
    inputs:
      test_input: ${{parent.inputs.test_input}}

test_schedule.yml

$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: test_schedule
display_name: test_schedule
description: test_schedule

trigger:
  type: cron
  expression: "*/20 * * * *"
  time_zone: "UTC"

create_job:
  job: ./pipeline.yml

First I manually submitted the job:

az ml job create -f .\pipeline.yml

Then I waited until the job finished on AML and submitted it again:

az ml job create -f .\pipeline.yml

Result: the output did not get cached, which is the expected behavior.

Then I scheduled this pipeline to run every 20 minutes using test_schedule.yml.

az ml schedule create -f .\test_schedule.yml

The scheduled pipeline run also reran fully and did not reuse the output of the previous run. This means the force_rerun attribute was taken into account, and I was not able to reproduce my own issue.
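For anyone reproducing this with the SDK instead of the CLI, a roughly equivalent schedule creation might look like the following sketch (workspace config is an assumption; the names match the YAML above):

from azure.ai.ml import MLClient, load_job
from azure.ai.ml.entities import CronTrigger, JobSchedule
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Build the schedule from the same pipeline job definition
pipeline_job = load_job(source="./pipeline.yml")
trigger = CronTrigger(expression="*/20 * * * *", time_zone="UTC")
schedule = JobSchedule(name="test_schedule", trigger=trigger, create_job=pipeline_job)

ml_client.schedules.begin_create_or_update(schedule).result()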

I will be gradually adding more attributes and source code from the original artifacts where I encountered the issue. I hope to identify what is causing the issue.

kenanEkici commented 1 year ago

I'm closing this issue as I could not reproduce the bug.

pezosanta commented 10 months ago

@kenanEkici @brynn-code Hi, I would like to reopen this issue as I am facing the same problem.

sdk version: azure-ai-ml==1.10.0

I have a pipeline.yaml pipeline definition. When I load it with azure.ai.ml.load_component and run the following code:

from azure.ai.ml import Input, load_component

# Path to the pipeline definition YAML shown below
pipeline_definition_path = "./pipeline.yaml"

pipeline = load_component(source=pipeline_definition_path)

# Defining pipeline inputs
pipeline_job = pipeline(
    data_organizer_input_data=Input(path="azureml://datastores/test_datastore/paths/", mode="ro_mount"),
)

and print the pipeline and pipeline_job variables, you can see that although the settings section from pipeline.yaml is present in the pipeline variable, it is completely removed from the pipeline_job variable.

pipeline:

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
name: pipeline_test
display_name: Pipeline Test
description: Pipeline test
type: pipeline
inputs:
  data_organizer_input_data:
    type: uri_folder
    mode: ro_mount
jobs:
  data_organizer_job:
    type: command
    inputs:
      input_data:
        path: ${{parent.inputs.data_organizer_input_data}}
    component:
      $schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
      name: azureml_anonymous
      version: '1'
      display_name: Data Organizer
      description: Organizing the input Data Asset in a way that it can be fed to
        the model.
      type: command
      inputs:
        input_data:
          type: uri_folder
          mode: ro_mount
      outputs:
        output_data:
          type: uri_folder
          mode: rw_mount
      command: python data_organizer.py --input_data ${{inputs.input_data}} --output_data
        ${{outputs.output_data}}
      environment: azureml:my_env@latest
      code: <path_to_code>
      is_deterministic: true
    compute: azureml:my_compute_cluster
experiment_name: pipeline_test_experiment
settings:
  default_compute: azureml:my_OTHER_compute_cluster
  force_rerun: true
  continue_on_step_failure: false

pipeline_job:

type: pipeline
inputs:
  data_organizer_input_data:
    mode: ro_mount
    type: uri_folder
    path: azureml://datastores/test_datastore/paths/
component:
  $schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
  name: pipeline_test
  display_name: Pipeline Test
  description: Pipeline test
  type: pipeline
  inputs:
    data_organizer_input_data:
      type: uri_folder
      mode: ro_mount
  jobs:
    data_organizer_job:
      type: command
      inputs:
        input_data:
          path: ${{parent.inputs.data_organizer_input_data}}
      component:
        $schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
        name: azureml_anonymous
        version: '1'
        display_name: Data Organizer
        description: Organizing the input Data Asset in a way that it can be fed to
          the model.
        type: command
        inputs:
          input_data:
            type: uri_folder
            mode: ro_mount
        outputs:
          output_data:
            type: uri_folder
            mode: rw_mount
        command: python data_organizer.py --input_data ${{inputs.input_data}} --output_data
          ${{outputs.output_data}}
        environment: azureml:my_env@latest
        code: <path_to_code>
        is_deterministic: true
      compute: azureml:my_compute_cluster

Therefore, when I submit pipeline_job with ml_client.jobs.create_or_update(job=pipeline_job, ...), the created job uses the default values of the settings (force_rerun = False, continue_on_step_failure = True, etc.) instead of the values specified in pipeline.yaml. (Note: I am aware that settings can be configured via pipeline_job.settings.force_rerun = True, but I want the settings specified in pipeline.yaml to apply by default when submitting a job, and to override them only when needed.)
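A rough workaround sketch until this is addressed (it assumes PyYAML is installed and that every key in the YAML settings block maps to an attribute of pipeline_job.settings):

import yaml

# Re-apply the settings block from the YAML, since pipeline(...) drops it
with open(pipeline_definition_path) as f:
    yaml_settings = yaml.safe_load(f).get("settings", {})

for key, value in yaml_settings.items():
    setattr(pipeline_job.settings, key, value)

print(pipeline_job.settings.force_rerun)  # now True, as specified in pipeline.yaml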

asos-rickbruins commented 8 months ago

The same issue occurs when I do a pipeline batch deployment.

$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: test-pipeline
description: Test
endpoint_name: test-endpoint
type: pipeline
component: ./pipeline.yaml
settings:
  continue_on_step_failure: False
  force_rerun: True

leudom commented 2 months ago

I am facing the same issue! Is there any news on this?

TucoFernandes commented 2 weeks ago

Still having the same issue! Any news?

pezosanta commented 2 weeks ago

Guys, just don't confuse the pipelineComponent and pipelineJob YAML schemas. Use the pipelineComponent schema if you want to register a pipeline that can later be triggered by its version (e.g. via a pipelineJob schema YAML), and use the pipelineJob schema if you want to create a pipeline job (a pipeline that is actually executed), either from scratch or by referencing a registered pipeline.

Only the pipelineJob schema has runtime attributes such as force_rerun and continue_on_step_failure.

Also, for pipelineJob YAMLs use load_job() (from azure.ai.ml), and for pipelineComponent YAMLs use load_component().
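In other words, something like this sketch (the paths are placeholders):

from azure.ai.ml import load_component, load_job

# pipelineComponent YAML -> a reusable, registrable component;
# runtime settings such as force_rerun do not belong here
pipeline_component = load_component(source="./pipeline_component.yaml")

# pipelineJob YAML -> a runnable job; the settings block is preserved
pipeline_job = load_job(source="./pipeline.yaml")
print(pipeline_job.settings.force_rerun)               # True, as specified in the YAML
print(pipeline_job.settings.continue_on_step_failure)  # False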