MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Deleting experiments #60501

Closed jcpayne closed 3 years ago

jcpayne commented 3 years ago

I have recently started using Machine Learning services quite regularly and my costs on Azure have gone way up, in part (I think) because so much data is stored from every run. It is annoying that it appears to be so hard to delete experiments. The experiment is the unit at which deletion should be easiest. For example, I don't want to junk my whole workspace because I'm still working on the same problem, but I do want to get rid of gigabytes of data from 35 failed runs or whatever, so that I don't pay for them.


BhargaviAnnadevara commented 3 years ago

@jcpayne Thanks for reaching out. We're investigating this and will get back to you shortly.

GiftA-MSFT commented 3 years ago

@jcpayne thanks for reaching out. A default storage account is created along with your AML workspace and you can manage the data accordingly. Your experiment data is stored in a blob container in the default storage account and you can access it directly from your resource group or view your credentials in AML Studio > Datastores > workspaceblobstore (Default).
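
For anyone who prefers the SDK to Studio, here is a minimal sketch of looking up which storage account and container back the default datastore. It assumes `azureml-core` is installed, that the object passed in is an `azureml.core.Workspace`, and that the default datastore is the blob datastore; the helper name `describe_default_datastore` is made up for this illustration.

```python
def describe_default_datastore(workspace):
    """Return name, storage account and container of the default datastore.

    `workspace` is expected to be an azureml.core.Workspace. The helper name
    is hypothetical; it only reads attributes and changes nothing.
    """
    datastore = workspace.get_default_datastore()
    return {
        "datastore": datastore.name,
        "account": datastore.account_name,
        "container": datastore.container_name,
    }


# Usage (assumes a workspace config.json is available locally):
# from azureml.core import Workspace
# ws = Workspace.from_config()
# print(describe_default_datastore(ws))
```

The returned container name is the one to look for in the storage account when checking what a workspace has written.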

jcpayne commented 3 years ago

Yes thanks, I was aware of that. So is just deleting the blobstore folders for a run going to cause any problem? I.e., does the workspace somehow index those folders in a way that would cause problems if they were deleted?

Also, two folders and two files are created for every run (a 'setup' folder, a main folder, a .zip file, and another non-zipped file), so deleting "Experiment 2, Run 4" requires locating and deleting four different things. This is made more awkward by the obscure auto-generated file and folder names; for example, AML Studio named my Experiment 2, Run 4 folder "expt2_1593466198_5ee83437". It would be nicer if it were possible to click on an experiment in Studio and have all of the pieces found and deleted at once.
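
To make the "four different things" concrete, here is a rough sketch that only lists, and never deletes, the blobs whose path mentions a given run ID. It assumes the blob names have already been fetched as plain strings (for instance from the `name` attribute of the items returned by `ContainerClient.list_blobs()` in the `azure-storage-blob` package). The helper name `blobs_for_run` and the example paths are invented for illustration and are not the actual layout AML uses.

```python
def blobs_for_run(blob_names, run_id):
    """Return, sorted, the blob names that contain run_id anywhere in the path."""
    return sorted(name for name in blob_names if run_id in name)


# Example with made-up paths (NOT the real AML layout):
example_names = [
    "azureml/expt2_1593466198_5ee83437/outputs/model.pt",
    "azureml/expt2_1593466198_5ee83437_setup/log.txt",
    "azureml/expt2_1593400000_abcdef12/outputs/model.pt",
]
for name in blobs_for_run(example_names, "expt2_1593466198_5ee83437"):
    print(name)
```

A plain substring match is deliberately crude: it is meant for reviewing what belongs to a run, not as a safe basis for automated deletion.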

GiftA-MSFT commented 3 years ago

@jcpayne Currently, AML doesn't support deleting experiments, but this feature is on our roadmap. Deleting files in blob storage directly is also not recommended (you can, but I just confirmed that it could cause inconsistencies). However, you can use the following Python code sample to delete runs and their artifacts (note: the snapshot directory won't be deleted). Let me know if the suggestion below helps. Thanks.

import uuid
import requests
from azureml._base_sdk_common.user_agent import get_user_agent
from azureml._base_sdk_common import _ClientSessionId
from azureml.core.experiment import Experiment
from azureml._common.exceptions import AzureMLException
from azureml._restclient.clientbase import ClientBase

def delete_run(workspace, experiment_name, run_id):
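    """
    Delete a single run through the run history REST endpoint
    (the snapshot directory is not deleted).

    :param workspace: workspace that owns the experiment.
    :type workspace: azureml.core.workspace.Workspace
    :param experiment_name: experiment name.
    :type experiment_name: str
    :param run_id: run id
    :type run_id: str
    :return: None; the service response is printed.
    """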

    headers = {
        "User-Agent": get_user_agent(),
        "x-ms-client-session-id": _ClientSessionId,
        "x-ms-client-request-id": str(uuid.uuid4())
    }

    # Merging the auth header.
    headers.update(workspace._auth_object.get_authentication_header())
    experiment = Experiment(workspace, experiment_name)
    rh_workspace_scope = workspace.service_context._get_run_history_url() + "/history/v1.0/private" + workspace.service_context._get_workspace_scope()
    delete_url = rh_workspace_scope + "/" + "experimentids/{}/runs/{}".format(experiment.id, run_id)
    response = ClientBase._execute_func(requests.delete, delete_url, headers=headers)

    if response.status_code >= 400:
        from azureml._base_sdk_common.common import get_http_exception_response_string
        # response.text is a JSON from execution service.
        response_message = get_http_exception_response_string(response)
        raise AzureMLException(response_message)
    result = response.json()
    print(result)

jcpayne commented 3 years ago

Thank you very much for this careful answer.  I will give the code a try.

John

From: GiftA-MSFT notifications@github.com Reply-To: MicrosoftDocs/azure-docs reply@reply.github.com Date: Monday, August 10, 2020 at 2:52 PM To: MicrosoftDocs/azure-docs azure-docs@noreply.github.com Cc: John Payne drjohnpayne@gmail.com, Mention mention@noreply.github.com Subject: Re: [MicrosoftDocs/azure-docs] Deleting experiments (#60501)


jcpayne commented 3 years ago

For other users: this snippet deletes all of the runs in a given experiment. Afterwards, the experiment name and the 'latest run' are still visible in Studio, but all of the associated storage has been cleared.

# ws is assumed to be an azureml.core.Workspace, e.g. from Workspace.from_config()
experiment_name = 'my_experiment'
exp = ws.experiments[experiment_name]
for run in exp.get_runs():
    # print(run.id)  # might want to check before deleting!
    delete_run(ws, experiment_name, run.id)
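
If only the failed runs should go (the original complaint was about data from failed runs), the loop above could be narrowed with a small filter. This is a sketch, not part of the SDK: `select_runs` is a made-up helper, and it assumes each run object exposes `get_status()` the way `azureml.core.Run` does, returning strings such as "Failed" or "Canceled".

```python
def select_runs(runs, statuses=("Failed", "Canceled")):
    """Return the runs whose current status is one of `statuses`.

    `runs` can be any iterable of objects with a get_status() method,
    for example the generator returned by Experiment.get_runs().
    """
    wanted = set(statuses)
    return [run for run in runs if run.get_status() in wanted]


# Usage sketch (ws, exp and delete_run as defined earlier in this thread):
# to_delete = select_runs(exp.get_runs())
# for run in to_delete:
#     print(run.id, run.get_status())  # review the list first
# for run in to_delete:
#     delete_run(ws, experiment_name, run.id)
```

Collecting the matches into a list first keeps the review step separate from the deletion step.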

jcpayne commented 3 years ago

Oddly enough, the script you suggested worked for the first experiment, but I tried it on 3 more experiments and while the printed message indicates that the files were removed, the Microsoft Azure Storage Explorer shows that they are still there (and not deleted).

GiftA-MSFT commented 3 years ago

Hi @jcpayne thanks for following up. Using the above code, I was able to delete the experiment runs of multiple experiments successfully. What folder are you looking at? Did you try to refresh Containers > azureml > ExperimentRun folder?

jcpayne commented 3 years ago

Oh, that’s interesting. I am temporarily locked out of my account so I can’t check anything, but as an example, in the second figure in this documentation page: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-export-delete-data, there are five top-level folders. The highlighted folder is ‘azureml’ and it contains the ‘ExperimentRun’ folder. But the folder below it, called ‘azureml-blobstore-89f54357-8ac2-etc.’, is (in my experience) the actual blobstore that is associated with the workspace. Inside that is a folder also called ‘azureml’, and inside that are the big run files like model checkpoints and run output, which can quickly eat up storage.

After I ran the script you provided, I did refresh the Azure Storage Explorer (including shutting it down and re-opening it). I didn’t check the ExperimentRun folder, but I did check the blobstore folder and I could still see all of the runs after experiment 1 (i.e., for experiments 2, 3, 4, etc.). Is it possible that the blobstore folder only contains soft links to the ExperimentRun folder, and therefore what I’m seeing is actually broken links to nonexistent files?

One other comment: your warning not to delete files from storage seems to go against the advice on the page above, which says “Run history documents, which may contain personal user information, are stored in the storage account in blob storage, in subfolders of /azureml. You can download and delete the data from the portal.”  It would be helpful if you could clarify what circumstances would cause inconsistencies in the workspace.

Thanks,

John

From: GiftA-MSFT notifications@github.com Reply-To: MicrosoftDocs/azure-docs reply@reply.github.com Date: Wednesday, August 26, 2020 at 12:23 PM To: MicrosoftDocs/azure-docs azure-docs@noreply.github.com Cc: John Payne drjohnpayne@gmail.com, Mention mention@noreply.github.com Subject: Re: [MicrosoftDocs/azure-docs] Deleting experiments (#60501)


GiftA-MSFT commented 3 years ago

Hi, thanks for following up. Unfortunately, we currently don't have an option to automatically identify and delete all data associated with an experiment run in the default blobstore. The code I provided above only deletes data in the azureml container. You can try to manually identify and delete the data in the default blobstore, but this is not recommended: you may no longer be able to use old snapshots or see old logs/models, and this can cause unexpected errors. The product team is aware of this feature request, and an option for deleting runs is on their roadmap. So sorry for the inconvenience. Thanks.
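
Short of deleting anything, it can still help to see where the storage is going. Below is a minimal sketch that totals blob sizes per folder prefix. It assumes the input is an iterable of objects with `name` and `size` attributes, which is what `ContainerClient.list_blobs()` from the `azure-storage-blob` package yields; the helper name `size_by_folder` and the connection details in the usage comment are placeholders.

```python
from collections import defaultdict


def size_by_folder(blobs, depth=2):
    """Sum blob sizes in bytes, grouped by the first `depth` path segments."""
    totals = defaultdict(int)
    for blob in blobs:
        key = "/".join(blob.name.split("/")[:depth])
        totals[key] += blob.size
    return dict(totals)


# Usage sketch (connection string and container name are placeholders):
# from azure.storage.blob import ContainerClient
# client = ContainerClient.from_connection_string(conn_str, container_name)
# totals = size_by_folder(client.list_blobs(), depth=2)
# for folder, size in sorted(totals.items(), key=lambda kv: -kv[1]):
#     print(folder, round(size / 1e9, 2), "GB")
```

Running this against both the azureml container and the default blobstore container would show which of the two still holds the bulk of the run data.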

jcpayne commented 3 years ago

Thanks very much for that clarification, and for following up on my question.

John

From: GiftA-MSFT notifications@github.com Reply-To: MicrosoftDocs/azure-docs reply@reply.github.com Date: Wednesday, August 26, 2020 at 7:03 PM To: MicrosoftDocs/azure-docs azure-docs@noreply.github.com Cc: John Payne drjohnpayne@gmail.com, Mention mention@noreply.github.com Subject: Re: [MicrosoftDocs/azure-docs] Deleting experiments (#60501)


nikhilweee commented 1 month ago

Has there been progress on this issue? Is there a way to delete storage information associated with an experiment yet?