Open mh-hassan18 opened 8 months ago
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.
Hi @mh-hassan18 - Thanks for opening an issue! Tagging the right people to take a look.
I think the customer recreated the model monitor, so the component version was upgraded. Could you please share your SubscriptionId and the logs of the failed jobs? We can check the job failure; it may not be related to the component version upgrade.
@yunjie-hub 1) Yes, we re-created the model monitor. 2) I can assure you that the job failure was caused by the component version upgrade. The newer version of the code adds conditions that check the paths in your MLTable file. Those conditions did not exist in the earlier version of the component: the MLTable files we had for our baseline datasets worked fine before, but after the versions were updated the same MLTable files stopped working.
Here is the recent trace from the failed job:
shared_utilities/io_utils.py", line 120, in _verify_mltable_paths
    raise InvalidInputError(f"Invalid or unsupported path {path_val} in MLTable {mltable_path}
Here is the MLTable file that we were using before:
paths:
  - file: baseline_data.csv
transformations:
  - read_delimited:
      delimiter: ','
      empty_as_string: false
      encoding: utf8
      header: all_files_same_headers
      include_path_column: false
      infer_column_types: true
      partition_size: 20971520
      path_column: Path
      support_multi_line: false
type: mltable
Here is the MLTable file that the new version expects:
paths:
  - file: ./baseline_data.csv # The path should be relative and start with './' (other path types are also valid, but this small change fixed our problem)
transformations:
  - read_delimited:
      delimiter: ','
      empty_as_string: false
      encoding: utf8
      header: all_files_same_headers
      include_path_column: false
      infer_column_types: true
      partition_size: 20971520
      path_column: Path
      support_multi_line: false
type: mltable
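For anyone with several MLTable files to update, the change above can be applied mechanically. The following is a minimal sketch and not part of the Azure ML SDK: `add_dot_slash` is a hypothetical helper name, it uses plain line matching rather than a YAML parser, and it assumes unquoted `file:`, `folder:`, or `pattern:` entries like the ones shown here. It only rewrites bare relative paths and leaves absolute paths, drive-letter paths, and URLs untouched.

```python
import re

# Matches list entries such as "- file: baseline_data.csv" (unquoted values only).
_ENTRY = re.compile(r"^(\s*-\s*(?:file|folder|pattern):\s*)([^\s#'\"]+)(.*)$")


def _is_bare_relative(path: str) -> bool:
    """True for paths like 'data.csv' that have no './', '/', drive, or URL prefix."""
    if path.startswith((".", "/")) or "://" in path:
        return False
    return re.match(r"^[a-zA-Z]:[/\\]", path) is None


def add_dot_slash(mltable_text: str) -> str:
    """Return the MLTable text with bare relative paths prefixed by './'."""
    fixed = []
    for line in mltable_text.splitlines():
        match = _ENTRY.match(line)
        if match and _is_bare_relative(match.group(2)):
            line = f"{match.group(1)}./{match.group(2)}{match.group(3)}"
        fixed.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(fixed) + ("\n" if mltable_text.endswith("\n") else "")


before = "paths:\n  - file: baseline_data.csv\ntype: mltable"
print(add_dot_slash(before))
```

Quoted values and anything already starting with `.` or `/` are left as they are, so running the helper twice on the same file should not change it further.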
Here is the code that performs this validation. It is located at:
model_data_collector_preprocessor -> store_url.py -> Class StoreUrl
def is_local_path(self) -> bool:
    """Check if the store url is a local path."""
    if not self._base_url:
        return False
    return os.path.isdir(self._base_url) or os.path.isfile(self._base_url) or self._base_url.startswith("file://") \
        or self._base_url.startswith("/") or self._base_url.startswith(".") \
        or re.match(r"^[a-zA-Z]:[/\\]", self._base_url)
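To see why a bare `baseline_data.csv` entry can be affected, the string-prefix part of this check can be reproduced in isolation. This is a simplified sketch, not the component's code: `looks_local_by_prefix` is a hypothetical name, and the `os.path.isdir` / `os.path.isfile` branches are deliberately left out, so it only reflects the case where the path does not resolve on disk from the job's working directory. How `_verify_mltable_paths` reaches this check is not shown in the thread, so that link is an assumption.

```python
import re


def looks_local_by_prefix(base_url: str) -> bool:
    """Prefix-only version of the local-path test (filesystem checks omitted)."""
    if not base_url:
        return False
    return (
        base_url.startswith(("file://", "/", "."))
        or re.match(r"^[a-zA-Z]:[/\\]", base_url) is not None
    )


for candidate in ["baseline_data.csv", "./baseline_data.csv"]:
    print(candidate, looks_local_by_prefix(candidate))
```

Under these assumptions the bare file name is not recognized as local, while the `./` form is.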
In the earlier version of the component this file does not even exist, i.e. there is no store_url.py file in the model_data_collector_preprocessor folder.
The point is that we did not change anything in our code or data, and we did not change any library version, yet our job failed because these component versions were updated behind the scenes. Is there any way we can control these versions so that we don't run into such cases in the future?
Glad the customer found a workaround. Our fix is on the way and should be available sometime next week.
We are trying to set up drift monitors for our models in our organization using SDK V2. The problem is that when we set up a drift monitor using SDK V2, we have no control over the components of the drift monitor pipeline that Azure sets up. For instance, once the drift monitor is set up, we can see a pipeline running in the Azure portal with the following components:
Model Data Collector - Preprocessor
Data Drift - Signal Monitor
Model Monitor - Create Manifest
The pipeline displays a version alongside each component. What we have observed over the past couple of weeks is that these versions keep changing, and with them the code behind the components, which causes inconsistent behavior. For instance, here are the component versions for drift monitors that were set up on two different dates:
Drift monitor that was set up on 26th January, 2024:
It was using Version 0.3.21 for "Data Drift - Signal Monitor"
Drift monitor that was set up on 29th January, 2024:
It was using Version 0.3.29 for "Data Drift - Signal Monitor"
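Until the versions can be pinned, one way to spot this kind of change early is to record the component versions shown on each pipeline run and compare them. The snippet below is only a sketch with a hypothetical helper, `changed_components`; it does not call any Azure ML API, and the versions are typed in by hand from the two runs above.

```python
def changed_components(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {component: (old_version, new_version)} for components whose version differs."""
    return {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }


# Versions copied by hand from the pipeline view; only the component reported above is listed.
run_jan_26 = {"Data Drift - Signal Monitor": "0.3.21"}
run_jan_29 = {"Data Drift - Signal Monitor": "0.3.29"}

print(changed_components(run_jan_26, run_jan_29))
```

A non-empty result before a scheduled run is a cue to re-validate inputs such as the MLTable files against the new component code.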
We have observed inconsistent behavior when these versions change, and we apparently have no way to pin them. Is there any way we can pin these versions?
To Reproduce: We are using the standard code provided in the SDK V2 documentation to set up these monitors, which can be found here.
Expected behavior: Component versions should remain fixed regardless of when the drift monitors are set up, which is necessary for consistent behavior, or at least we should be able to control these versions.
Screenshot of the successful run on 26th January, 2024:
Screenshot of the failed run on 29th January, 2024: