Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.

Azure SDK V2 Drift monitor pipeline component versions are inconsistent #34398

Open mh-hassan18 opened 8 months ago

mh-hassan18 commented 8 months ago

We are trying to set up drift monitors for our models in our organization using SDK V2. The problem is that when we set up a drift monitor using SDK V2, we have no control over the components of the drift-monitor pipeline that Azure creates. For instance, once the drift monitor is set up, we can see a pipeline running in the Azure portal with the following components:

- Model Data Collector - Preprocessor
- Data Drift - Signal Monitor
- Model Monitor - Create Manifest

The pipeline displays the version alongside each component. What we have observed over the past couple of weeks is that these versions keep changing, and hence so does the code behind them, which causes inconsistent behavior. For instance, here are the component versions for drift monitors that were set up on two different dates:

Drift monitor set up on 26th January, 2024:

It was using version 0.3.21 of "Data Drift - Signal Monitor".

Drift monitor set up on 29th January, 2024:

It was using version 0.3.29 of "Data Drift - Signal Monitor".

We have observed inconsistent behavior when these versions change, and apparently we have no way to pin them. Is there any way we can pin these versions?

To Reproduce: We are using the standard code provided in the SDK V2 documentation to set up these monitors, which can be found here.
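For context, the monitor-creation code from the SDK V2 docs looks roughly like the sketch below. This is a reconstruction from the public azure-ai-ml documentation, not the reporter's exact script; the workspace identifiers, endpoint/deployment names, and monitor name are all placeholders. Note that nothing in this API surface lets you pin the versions of the pipeline components the service spins up, which is the crux of the issue.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    MonitorDefinition,
    MonitorSchedule,
    MonitoringTarget,
    RecurrenceTrigger,
    ServerlessSparkCompute,
)

# Placeholder workspace coordinates.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Serverless Spark pool that runs the monitoring pipeline.
spark_compute = ServerlessSparkCompute(
    instance_type="standard_e4s_v3",
    runtime_version="3.3",
)

# Point the monitor at a deployed model; names are hypothetical.
monitoring_target = MonitoringTarget(
    ml_task="classification",
    endpoint_deployment_id="azureml:my-endpoint:my-deployment",
)

monitor_definition = MonitorDefinition(
    compute=spark_compute,
    monitoring_target=monitoring_target,
)

# Run the drift computation once a day.
trigger = RecurrenceTrigger(frequency="day", interval=1)

schedule = MonitorSchedule(
    name="my_model_monitor",
    trigger=trigger,
    create_monitor=monitor_definition,
)

ml_client.schedules.begin_create_or_update(schedule)
```

Creating (or re-creating) the schedule is what materializes the pipeline with whatever component versions the service currently publishes.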

Expected behavior: Component versions should remain fixed no matter when we set up the drift monitors; this is necessary for consistent behavior. At the very least, we should be able to control these versions.

Screenshot of the successful run on 26th January, 2024: successful_run_26th_jan_2024

Screenshot of the failed run on 29th January, 2024: failed_run_29th_jan_2024

github-actions[bot] commented 8 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

swathipil commented 8 months ago

Hi @mh-hassan18 - Thanks for opening an issue! Tagging the right people to take a look.

yunjie-hub commented 8 months ago

I think the customer recreated the model monitor, so the component version was upgraded. Could you please share your SubscriptionId and the logs of the failed jobs? We can check the job failure; it may not be related to the component version upgrade.

mh-hassan18 commented 8 months ago

@yunjie-hub 1) Yes, we re-created the model monitor. 2) I can assure you that the job failure was caused by the component version upgrade: the newer version of the code added conditions that check the path in your MLTable file, and such conditions never existed in the earlier version of the component. The MLTable files we had for our baseline datasets were working fine, but after the version update the same MLTable file stopped working.

Here is the recent trace from the failed job:

shared_utilities/io_utils.py", line 120, in _verify_mltable_paths
    raise InvalidInputError(f"Invalid or unsupported path {path_val} in MLTable {mltable_path}

Here is the MLTable file that we were using before:

paths:
- file: baseline_data.csv
transformations:
- read_delimited:
    delimiter: ','
    empty_as_string: false
    encoding: utf8
    header: all_files_same_headers
    include_path_column: false
    infer_column_types: true
    partition_size: 20971520
    path_column: Path
    support_multi_line: false
type: mltable

Here is the MLTable file that the new version expects.

paths:
- file: ./baseline_data.csv # It should be relative and should start with './' (there are other valid types of paths as well, but this small change fixed our problem)
transformations:
- read_delimited:
    delimiter: ','
    empty_as_string: false
    encoding: utf8
    header: all_files_same_headers
    include_path_column: false
    infer_column_types: true
    partition_size: 20971520
    path_column: Path
    support_multi_line: false
type: mltable

Here is the code chunk that performs that validation. It is located at:

model_data_collector_preprocessor -> store_url.py -> class StoreUrl

def is_local_path(self) -> bool:
    """Check if the store url is a local path."""
    if not self._base_url:
        return False
    return os.path.isdir(self._base_url) or os.path.isfile(self._base_url) \
        or self._base_url.startswith("file://") or self._base_url.startswith("/") \
        or self._base_url.startswith(".") \
        or re.match(r"^[a-zA-Z]:[/\\]", self._base_url)

In the earlier version of the component this file does not even exist, i.e. there is no store_url.py file in the model_data_collector_preprocessor folder.
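To see why prefixing './' fixed it, here is a self-contained copy of that heuristic that can be run standalone. The function body mirrors the snippet quoted above (my transcription, not the component's actual file), except that the final re.match is wrapped in bool() so the function always returns a bool; the sample paths are hypothetical.

```python
import os
import re


def is_local_path(base_url: str) -> bool:
    """Standalone copy of the component's local-path heuristic quoted above."""
    if not base_url:
        return False
    return (
        os.path.isdir(base_url)
        or os.path.isfile(base_url)
        or base_url.startswith("file://")
        or base_url.startswith("/")
        or base_url.startswith(".")
        or bool(re.match(r"^[a-zA-Z]:[/\\]", base_url))
    )


# A "./" prefix (or an absolute/file://​/drive-letter path) is recognized as local;
# a bare filename is not, unless that file happens to exist in the working directory.
print(is_local_path("./baseline_data.csv"))  # True
print(is_local_path(r"C:\data\baseline_data.csv"))  # True
print(is_local_path("baseline_data.csv"))  # False (assuming no such file in cwd)
```

This matches the observed behavior: the old MLTable entry `file: baseline_data.csv` falls through every branch, while `file: ./baseline_data.csv` hits the `startswith(".")` branch and passes validation.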

The point is that we did not change anything in our code or data, and we did not change any library version, yet our code failed because these component versions were updated behind the scenes. So, is there any way we can control these versions so that we don't encounter such cases in the future?

RichardLi1437 commented 8 months ago

Glad the customer found the workaround. Our fix is on the way and should be available some time next week.