MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Update the content of a Dataset in AzureML studio #119411

Open sjLeonGalan opened 7 months ago

sjLeonGalan commented 7 months ago

During a job in my MLOps pipeline, I want to read a dataset's content, modify it, and upload the new content so that it overwrites the existing file. Is there a way to do this? I have tried several approaches, such as Dataset.Tabular.register_pandas_dataframe, but it creates a new directory for the new file, and in my case I need to replace the existing file.

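from azureml.core import Dataset, Workspace

# Assumption: `ds` is the workspace's default datastore; the original post
# did not show how it was obtained.
ws = Workspace.from_config()
ds = ws.get_default_datastore()

# Load the existing Parquet file as a tabular dataset and convert it to pandas.
dataset_test = Dataset.Tabular.from_parquet_files(path=[(ds, 'main.parquet')])
file_dataframe = dataset_test.to_pandas_dataframe()

# Modify the content.
file_dataframe['column'] = 'new_value'

# Register the modified DataFrame. As described above, this writes into a new
# directory under the target path instead of replacing the existing file.
file_path = 'test_dir/main.parquet'
file_dataset = Dataset.Tabular.register_pandas_dataframe(
    dataframe=file_dataframe,
    target=(ds, file_path),
    name='main.parquet',
    description='Test upload new file'
)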
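
For reference, a minimal sketch of one way the overwrite described above might be done; it is not a confirmed answer from the docs team. It assumes `ds` is a blob or file-share datastore from the v1 SDK (azureml-core), whose `upload_files` method accepts an `overwrite` flag. The helper name `overwrite_file_in_datastore` and the `write_local` callback are hypothetical, introduced only for illustration. Newer azureml-core releases mark `upload_files` as deprecated in favor of `Dataset.File.upload_directory`, which also takes `overwrite`, so treat the exact call as something to check against your SDK version.

```python
import os
import tempfile


def overwrite_file_in_datastore(datastore, write_local, target_dir, file_name):
    """Write a file locally, then upload it over the existing remote file.

    datastore   -- object exposing upload_files(...), e.g. a v1 blob datastore (assumption)
    write_local -- callback that writes the new content to the local path it is given
    target_dir  -- directory inside the datastore that holds the existing file
    file_name   -- name of the file to replace, e.g. 'main.parquet'
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, file_name)
        write_local(local_path)  # produce the new file content on local disk

        # overwrite=True asks the datastore to replace a file with the same
        # name; relative_root keeps the remote path at target_dir/file_name.
        datastore.upload_files(
            files=[local_path],
            relative_root=tmp_dir,
            target_path=target_dir,
            overwrite=True,
            show_progress=False,
        )
    return f"{target_dir.rstrip('/')}/{file_name}"


# Hypothetical usage with the names from the snippet above (requires pandas
# with a Parquet engine and a real datastore `ds`):
#
#   overwrite_file_in_datastore(
#       ds,
#       lambda path: file_dataframe.to_parquet(path, index=False),
#       target_dir='test_dir',
#       file_name='main.parquet',
#   )
```

Because the upload keeps the same path and file name, a dataset defined with from_parquet_files on that path should pick up the new content the next time it is loaded, although that is worth verifying in your own workspace.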

--

Document Details

Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.

AjayBathini-MSFT commented 7 months ago

@sjLeonGalan Could you add a link to the documentation you are following for these steps? This would help us redirect the issue to the appropriate team. Thanks!

sjLeonGalan commented 7 months ago

I'm following this doc: https://learn.microsoft.com/es-es/python/api/azureml-core/azureml.core.dataset(class)?view=azure-ml-py

AjayBathini-MSFT commented 7 months ago

@sjLeonGalan Thanks for your feedback! I've assigned this issue to the author who will investigate and update as appropriate.