Closed sergey-ivanchuk closed 1 year ago
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github.
Author: | sergey-ivanchuk |
---|---|
Assignees: | - |
Labels: | `Machine Learning`, `Service Attention`, `customer-reported`, `feature-request`, `needs-triage`, `question` |
Milestone: | - |
Thanks for the feedback, we’ll investigate ASAP.
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @shbijlan.
Author: | sergey-ivanchuk |
---|---|
Assignees: | - |
Labels: | `ADO`, `ML-Pipelines`, `Machine Learning`, `Service Attention`, `customer-reported`, `feature-request` |
Milestone: | - |
@sergey-ivanchuk Apologies for the late reply. We are looking into this issue and we will provide an update once we have more details on this.
@bandsina @shbijlan @likebupt Could you please look into this and provide an update when you get a chance? Awaiting your reply.
@sergey-ivanchuk Thanks for your feedback. This is a valid scenario. Since we are developing a new SDK version, I will add this request to the backlog. We will not make new investments in the old SDK version.
From my understanding, you use a single large repo to manage the pipeline and its steps, and when you build the pipeline and steps you use the repo's root folder. By default, we use the whole folder to calculate the code hash that decides re-use. In this scenario, changes to step2 affect the re-use of step1, and vice versa.
Letting customers specify which folders are used to calculate the code hash would also introduce some issues. For example, in your case, providing only step_1 for the hash would not be sufficient, because step_1 also depends on src. So we consider this an advanced scenario that we need to support.
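To illustrate the trade-off described above, here is a minimal sketch of hashing only selected folders. It is not how Azure ML computes its code hash; `folder_hash` and the folder layout (`src`, `pipeline/step_1`, `pipeline/step_2`) are hypothetical names used only for illustration, and the sketch relies on the Python standard library alone.

```python
import hashlib
from pathlib import Path


def folder_hash(root, paths):
    """Hypothetical helper: hash only the listed folders under `root`.

    Both file paths (relative to `root`) and file contents feed the digest,
    so renaming or editing any file in the selected folders changes the hash.
    """
    root = Path(root)
    digest = hashlib.sha256()
    for rel in sorted(paths):
        base = root / rel
        # Sort so the result does not depend on filesystem listing order.
        for file in sorted(p for p in base.rglob("*") if p.is_file()):
            digest.update(file.relative_to(root).as_posix().encode())
            digest.update(b"\0")
            digest.update(file.read_bytes())
            digest.update(b"\0")
    return digest.hexdigest()
```

Under these assumptions, hashing `["src", "pipeline/step_1"]` stays stable when only `pipeline/step_2` changes, while hashing `["."]` (the whole repo) does not. Hashing `["pipeline/step_1"]` alone would miss changes in `src`, which is the dependency problem raised in the comment above.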
Hi everyone, thanks for your recent follow-ups.
@cloga, follow-up comments below:
> From my understanding, you use a single large repo to manage the pipeline and its steps, and when you build the pipeline and steps you use the repo's root folder. By default, we use the whole folder to calculate the code hash that decides re-use. In this scenario, changes to step2 affect the re-use of step1, and vice versa.
Yes, exactly.
Hypothetically, I could have a 5-step process and only want to re-run step 5 (model training).
> Letting customers specify which folders are used to calculate the code hash would also introduce some issues. For example, in your case, providing only step_1 for the hash would not be sufficient, because step_1 also depends on src. So we consider this an advanced scenario that we need to support.
Very good call-out. Ideally, I would import from `src` and then hash only on `step_2`. Hopefully this could be feasible.
@cloga please add this feature request to the proper backlog. I'm closing this issue for now.
Cross post from https://github.com/Azure/azure-sdk-for-python/issues/18182#issuecomment-829727066
Is your feature request related to a problem? Please describe.
For future releases, I'd like to see the return of an old, deprecated feature in the Azure Python SDK.
It would be great to use `azureml.pipeline.steps.python_script_step.PythonScriptStep(hash_paths= ...)`. This parameter was deprecated a long time ago, but I feel it would benefit the Azure SDK user community.
Below is a use case I have, and one that is fairly practical in certain situations.
Both of my goals above live within a single repository that contains the source code and the pipeline code to run.
For goal 1, I want to import `src` code. So, I need to set `source_directory='./../'` in the `PythonScriptStep` function.
For goal 2, I want to use `allow_reuse=True` and `hash_paths = './pipeline/step_1'` so that hashing is done separately for each sub-step in a pipeline (e.g. a use case where I need to re-run `step_2` but still re-use `step_1`).
In reality, I might have 6 sub-steps in a repository, so the value of `hash_paths` goes up greatly. Re-running only 1 of 6 steps is much better than re-running 6 of 6.
Describe the solution you'd like
Un-deprecate `azureml.pipeline.steps.python_script_step.PythonScriptStep(hash_paths= ...)`.
Describe alternatives you've considered
From my code snippet, I have considered splitting all code into two repositories (`src` and `pipelines`). This would meet goal 1 and goal 2 from above. However, it would require more workarounds than I'd like to be responsible for, so the code management burden would be greater than necessary.
`azureml.pipeline.steps.python_script_step.PythonScriptStep(hash_paths= ...)` would give greater control and leverage for re-using certain pipeline steps.
Additional context
Nothing more to add.