Open grzjab opened 1 year ago
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github @Azure/azure-ml-sdk.
Thanks for reaching out @grzjab! Looping in @azureml-github to investigate
@grzjab, thanks for your interest in this. Currently we don't support using the latest label when defining an AzureML pipeline schedule. The problem is that it's unclear whether the user wants to use the latest version or a fixed version of the source when defining the schedule.
@eniac871 That sounds more like a documentation question than an objection to the feature. You can easily distinguish those two cases.
I think this feature would be very beneficial. In most scenarios you don't want to run the exact same pipeline with the exact same data again and again.
The only way I am aware of to have scheduled jobs that actually produce new output is to abuse AML versioning: a version only references a folder, so you can change the underlying data. That means a sourcing job has to overwrite the current data, which renders the versioning completely useless.
@eniac871 that's why I created a feature request to support using the latest label version. As @VincBar wrote, running the pipeline with the same dataset versions doesn't make sense. In V1 there is an option to use the latest dataset (label = Use always latest).
I'm also keenly interested in this discussion. With the current implementation of schedules for data import jobs, it appears that the entire MLOps pipeline can now be seamlessly managed within the AML Studio. This eliminates the necessity for external tools such as ADF, GitHub Actions, or Azure DevOps pipelines. However, I've identified two critical features that seem to be missing:
raw_data:
  type: mltable
  path: azureml:TrainingData@latest
This feature provides flexibility for users who wish to schedule jobs with dynamic data source versioning. They can specify the @latest tag in their YAML configuration to automatically use the latest version of the data source. Conversely, if users need to schedule jobs with a specific version of the data source, they can directly specify that version in the job configuration.
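To make the distinction between the two behaviors concrete, here is a minimal sketch in plain Python (no Azure SDK involved). It assumes asset references look like `azureml:<name>@latest` or `azureml:<name>:<version>`, and it models the registered versions as a simple in-memory dictionary of integers. The names `AssetRef`, `parse_asset_ref`, and `resolve_version` are hypothetical and only illustrate the idea; this is not how the service resolves versions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AssetRef:
    """Hypothetical parsed form of an asset reference."""
    name: str
    version: Optional[str]  # None means "resolve to latest at trigger time"


def parse_asset_ref(path: str) -> AssetRef:
    """Parse 'azureml:<name>@latest' or 'azureml:<name>:<version>'."""
    prefix = "azureml:"
    if not path.startswith(prefix):
        raise ValueError(f"not an asset reference: {path!r}")
    body = path[len(prefix):]
    if body.endswith("@latest"):
        return AssetRef(name=body[: -len("@latest")], version=None)
    name, sep, version = body.rpartition(":")
    if not sep or not name or not version:
        raise ValueError(f"expected a version or @latest in {path!r}")
    return AssetRef(name=name, version=version)


def resolve_version(ref: AssetRef, registered: Dict[str, List[int]]) -> int:
    """Pick the version to use for one trigger event.

    `registered` is an assumed stand-in for the workspace's asset registry,
    with integer versions for simplicity.
    """
    versions = registered[ref.name]
    if ref.version is None:
        # Dynamic reference: looked up again on every trigger.
        return max(versions)
    pinned = int(ref.version)
    if pinned not in versions:
        raise KeyError(f"{ref.name} has no version {pinned}")
    return pinned
```

With this model, a reference ending in `@latest` follows new versions as they are registered, while an explicit version keeps returning the same one on every trigger, so the two intents never need to be guessed.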
@eniac871, given your expertise, I believe you can provide valuable insights into distinguishing between these two behaviors more effectively. Specifically, how the current implementation facilitates both dynamic source versioning and the potential for model deployment scheduling.
Hi. Are there any updates on the status of this issue?
I have the same issue. I need to rerun a pipeline on a schedule, but some of its input data assets were updated after the schedule was created. The schedule cannot pick up those updates, because using the "latest" version of a data asset from one trigger event to the next is currently not available. This feature would be crucial in our setup, which uses Azure Machine Learning scheduled ML pipelines.
Any update on this? I need this feature to automate training code. The majority of people who want to schedule a pipeline will want to use the latest version of their data asset.
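Until the schedule itself can do this, one possible stopgap is to resolve the latest version in your own code each time a run is submitted. The sketch below assumes `ml_client` is an `azure.ai.ml.MLClient` and that `ml_client.data.get` accepts `label="latest"`, which is my understanding of the v2 SDK; please verify against your installed version. The helper name `latest_asset_path` is hypothetical.

```python
from typing import Any


def latest_asset_path(ml_client: Any, name: str) -> str:
    """Return a pinned 'azureml:<name>:<version>' path for the newest version.

    Assumption: `ml_client.data.get(name=..., label="latest")` returns an
    object with a `version` attribute (as azure.ai.ml.MLClient is expected to).
    """
    asset = ml_client.data.get(name=name, label="latest")
    return f"azureml:{name}:{asset.version}"
```

The returned path could then be passed as the `path` of the job input before submitting the run. Note that this only helps when something outside the built-in schedule triggers the submission, so it does not replace the feature requested here.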
Is your feature request related to a problem? Please describe.
When scheduling an AzureML pipeline job, there is no way to use the latest version of a data asset at the moment the pipeline is triggered; only the version that is latest when the pipeline is created or modified can be defined.
The example at https://learn.microsoft.com/en-gb/azure/machine-learning/how-to-schedule-pipeline-job?view=azureml-api-2&tabs=python#change-runtime-settings-when-defining-schedule allows specifying an input argument of type azure.ai.ml.Input (https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.input?view=azure-python), but without the option to select the latest version at the time the pipeline is triggered rather than when it is created.
Code of the example
Describe the solution you'd like
Add a special option that always selects the latest version of the data asset.
Additional context
The same issue is visible when creating the pipeline using the UI. Selecting version 6 (latest) pins it: when new versions become available in the future, version 6 will still be used.