Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

Unable to add a data asset reference to the model in ML Studio with Python SDK #38513

Open TCodingB opened 1 week ago

TCodingB commented 1 week ago

Describe the bug

When adding data asset references to a model in ML Studio with the Python SDK, I'm unable to add a data asset of type MLTable built from a Parquet file and its accompanying MLTable definition. Adding an MLTable derived from a .csv file works without issue.

To Reproduce

  1. Register a model in ML Studio.
  2. Register the .csv and Parquet datasets as MLTable data assets in ML Studio.
  3. Get the data assets by name:
    • Dataset.get_by_name(ws, name=data_asset_name)
    • Save the data assets as reference_dataset_csv and reference_dataset_parquet.
  4. Add the data asset references to the registered model:
    • model.add_dataset_references([("dataset csv", reference_dataset_csv), ("dataset parquet", reference_dataset_parquet)])
  5. After running, no errors are raised that would signal that anything went wrong.
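The reproduction steps above can be sketched as a small helper against the v1 `azureml-core` SDK, which provides `Dataset.get_by_name` and `Model.add_dataset_references`. This is a hypothetical sketch, not code from the report: `attach_references` and its `assets` mapping are illustrative names, and the Azure import is deferred into the function (with an injectable `get_by_name`) so the logic can be exercised without a workspace.

```python
from typing import Callable, List, Mapping, Optional, Tuple


def attach_references(
    ws,
    model,
    assets: Mapping[str, str],
    get_by_name: Optional[Callable] = None,
) -> List[Tuple[str, object]]:
    """Hypothetical helper: attach registered data assets to `model`.

    `assets` maps a purpose label (e.g. "dataset csv") to a data asset name.
    Returns the (purpose, dataset) tuples passed to add_dataset_references.
    """
    if get_by_name is None:
        # Deferred import so the module loads without azureml-core installed.
        from azureml.core import Dataset

        get_by_name = Dataset.get_by_name

    # Step 3: look up each data asset by name.
    references = [(purpose, get_by_name(ws, name=name)) for purpose, name in assets.items()]
    # Step 4: add_dataset_references expects a list of (purpose, Dataset) tuples.
    model.add_dataset_references(references)
    return references
```

With a real workspace this would be called roughly as `attach_references(ws, Model(ws, name="<model-name>"), {"dataset csv": "<csv-asset>", "dataset parquet": "<parquet-asset>"})`, where `ws` comes from `Workspace.from_config()` and the names in angle brackets are placeholders.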

Expected behavior

Both data assets should be referenced by the model in ML Studio.

Screenshots

[Image: .csv data asset]

[Image: .parquet data asset]

[Image: model data references]

github-actions[bot] commented 1 week ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

TCodingB commented 1 week ago

Hi, we found a better solution: registering a pandas DataFrame as a data asset in ML Studio. Still, I find it odd that a data asset that is otherwise fully functional (it can be downloaded, processed, etc.) can't be referenced by the model. Another odd thing was that after running the code, we didn't get any feedback (good or bad) about whether the dataset had been referenced to the model.
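Because `add_dataset_references` reported nothing either way, one way to confirm the outcome is to re-fetch the model and compare its attached datasets against what was requested. A minimal sketch, assuming the re-fetched model's `datasets` attribute maps each purpose label to a list of dataset objects exposing a `name`; that shape is an assumption to check against your `azureml-core` version, and `missing_references` is a hypothetical helper.

```python
from typing import Iterable, List, Mapping, Optional, Sequence, Tuple


def missing_references(
    requested: Iterable[Tuple[str, str]],
    attached: Optional[Mapping[str, Sequence[object]]],
) -> List[Tuple[str, str]]:
    """Hypothetical helper: list (purpose, asset name) pairs not found on the model.

    `attached` is assumed to map each purpose label to a list of dataset
    objects exposing a `name` attribute (the assumed shape of Model.datasets).
    """
    attached = attached or {}
    missing = []
    for purpose, name in requested:
        # Collect the names actually attached under this purpose label.
        found = {getattr(ds, "name", None) for ds in attached.get(purpose, [])}
        if name not in found:
            missing.append((purpose, name))
    return missing
```

In practice you would re-fetch the model after calling `add_dataset_references`, for example with `Model(ws, name=model.name, version=model.version)`, and pass its `datasets` attribute; a non-empty result would flag a reference that was silently dropped.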

Thank you for taking the time,

Kind regards,

Tadej

jaga-work commented 1 day ago

We need sample files and the model to investigate this issue. @TCodingB, kindly provide them.

github-actions[bot] commented 1 day ago

Hi @TCodingB. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.