Closed jcpayne closed 3 years ago
@jcpayne Thanks for reaching out. We're investigating this and will get back to you shortly.
@jcpayne thanks for reaching out. A default storage account is created along with your AML workspace and you can manage the data accordingly. Your experiment data is stored in a blob container in the default storage account and you can access it directly from your resource group or view your credentials in AML Studio > Datastores > workspaceblobstore (Default).
Yes thanks, I was aware of that. So is just deleting the blobstore folders for a run going to cause any problem? I.e., does the workspace somehow index those folders in a way that would cause problems if they were deleted?
Also, there are two folders and two files created for every run (a 'setup' folder, a main folder, a .zip file, and another non-zipped file), so deleting "Experiment 2, Run 4" requires locating and deleting 4 different things. This is made more awkward by the obscure auto-generated file and folder names; for example, AML Studio named my Experiment 2, Run 4 folder "expt2_1593466198_5ee83437." It would be nicer if it were possible to click on an experiment in Studio and have all of the pieces found and deleted at once.
@jcpayne currently, AML doesn't support deleting experiments, but this feature is on our roadmap. Deleting files directly in blob storage is possible but not recommended (I just confirmed this), as it could cause inconsistencies. However, you can use the following Python code sample to delete a run and its artifacts (note: the snapshot directory won't be deleted). Let me know if this suggestion helps. Thanks.
import uuid
import requests
from azureml._base_sdk_common.user_agent import get_user_agent
from azureml._base_sdk_common import _ClientSessionId
from azureml.core.experiment import Experiment
from azureml._common.exceptions import AzureMLException
from azureml._restclient.clientbase import ClientBase


def delete_run(workspace, experiment_name, run_id):
    """
    :param workspace:
    :type workspace: azureml.core.workspace.Workspace
    :param experiment_name: experiment name.
    :type experiment_name: str
    :param run_id: run id
    :type run_id: str
    :return:
    """
    headers = {
        "User-Agent": get_user_agent(),
        "x-ms-client-session-id": _ClientSessionId,
        "x-ms-client-request-id": str(uuid.uuid4())
    }
    # Merging the auth header.
    headers.update(workspace._auth_object.get_authentication_header())
    experiment = Experiment(workspace, experiment_name)
    rh_workspace_scope = workspace.service_context._get_run_history_url() + "/history/v1.0/private" + workspace.service_context._get_workspace_scope()
    delete_url = rh_workspace_scope + "/" + "experimentids/{}/runs/{}".format(experiment.id, run_id)
    response = ClientBase._execute_func(requests.delete, delete_url, headers=headers)
    if response.status_code >= 400:
        from azureml._base_sdk_common.common import get_http_exception_response_string
        # response.text is a JSON from execution service.
        response_message = get_http_exception_response_string(response)
        raise AzureMLException(response_message)
    result = response.json()
    print(result)
Thank you very much for this careful answer. I will give the code a try.
John
In reply to GiftA-MSFT's comment of Monday, August 10, 2020 at 2:52 PM on MicrosoftDocs/azure-docs issue #60501, "Deleting experiments".
For other users: this snippet deletes all of the runs in a given experiment. Afterwards, the experiment name and the 'latest run' are still visible in Studio, but all of the associated storage has been cleared.
experiment_name = 'my_experiment'
exp = ws.experiments[experiment_name]
for run in exp.get_runs():
    # print(run.id)  # might want to check before deleting!
    delete_run(ws, experiment_name, run.id)
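A slightly more cautious variant of that loop, as a sketch only: it picks out runs by status and supports a dry run before anything is deleted. It assumes each run object exposes `id` and `get_status()`, as `azureml.core.Run` does. `select_runs` and `delete_selected` are hypothetical helper names, and `delete_fn` stands in for the `delete_run` function posted earlier in this thread.

```python
def select_runs(runs, statuses=("Failed", "Canceled")):
    """Return the runs whose reported status is one of `statuses`."""
    wanted = set(statuses)
    return [run for run in runs if run.get_status() in wanted]


def delete_selected(runs, delete_fn, dry_run=True):
    """Call delete_fn(run_id) for each run.

    With dry_run=True nothing is deleted; the IDs are only printed.
    Returns the list of run IDs that were (or would be) deleted.
    """
    ids = [run.id for run in runs]
    for run_id in ids:
        if dry_run:
            print("would delete", run_id)
        else:
            delete_fn(run_id)
    return ids
```

With the names used above, this could be called as `delete_selected(select_runs(exp.get_runs()), lambda rid: delete_run(ws, experiment_name, rid), dry_run=False)` once the dry-run output looks right.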
Oddly enough, the script you suggested worked for the first experiment, but I tried it on 3 more experiments and while the printed message indicates that the files were removed, the Microsoft Azure Storage Explorer shows that they are still there (and not deleted).
Hi @jcpayne thanks for following up. Using the above code, I was able to delete the experiment runs of multiple experiments successfully. What folder are you looking at? Did you try to refresh Containers > azureml > ExperimentRun folder?
Oh, that’s interesting. I am temporarily locked out of my account so I can’t check anything, but as an example, in the second figure in this documentation page: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-export-delete-data, there are five top-level folders. The highlighted folder is ‘azureml’ and it contains the ‘ExperimentRun’ folder. But the folder below it, called ‘azureml-blobstore-89f54357-8ac2-etc.’ is (in my experience) the actual blobstore that is associated with the workspace. Inside that is a folder also called ‘azureml’, and inside that are the big run files like model checkpointing and run output, which can quickly eat up storage space.
After I ran the script you provided, I did refresh the Azure Storage Explorer (including shutting it down and re-opening it). I didn’t check the ExperimentRun folder, but I did check the blobstore folder and I could still see all of the runs post-experiment1 (i.e., for experiments 2,3, 4, etc.). Is it possible that the blobstore folder only contains soft links to the ExperimentRun folder, and therefore what I’m seeing is actually broken links to nonexistent files?
One other comment: your warning not to delete files from storage seems to go against the advice on the page above, which says “Run history documents, which may contain personal user information, are stored in the storage account in blob storage, in subfolders of /azureml. You can download and delete the data from the portal.” It would be helpful if you could clarify what circumstances would cause inconsistencies in the workspace.
Thanks,
John
In reply to GiftA-MSFT's comment of Wednesday, August 26, 2020 at 12:23 PM on MicrosoftDocs/azure-docs issue #60501, "Deleting experiments".
Hi, thanks for following up. Unfortunately, we currently don't have an option to automatically identify and delete all data associated with an experiment run in the default blobstore. The code I provided above only deletes data in the azureml container. You can try to manually identify and delete the data in the default blobstore, but this is not recommended: you may no longer be able to use old snapshots or see old logs and models, and this can cause unexpected errors. The product team is aware of this feature request, and an option for deleting runs is on their roadmap. Sorry for the inconvenience. Thanks.
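For anyone who still decides to look for a run's data by hand, the matching step might be sketched as below. It assumes, based only on the folder name quoted earlier in this thread (`expt2_1593466198_5ee83437`), that every blob belonging to a run has the run ID in one of its path segments. That layout is an assumption, not documented behavior, and a substring match can over-match, so review the results before deleting anything. `blobs_for_run` is a hypothetical helper; the blob names would come from whatever storage listing tool you use.

```python
def blobs_for_run(blob_names, run_id):
    """Return the blob names that have the run ID inside any path segment.

    Assumption: the run's folders and files all carry the run ID in their
    names. This is a heuristic match, so inspect the output by eye.
    """
    matches = []
    for name in blob_names:
        segments = name.split("/")
        if any(run_id in segment for segment in segments):
            matches.append(name)
    return matches
```

The function only reads names and returns a list; it deliberately does not delete anything, so the caveats above about inconsistencies still apply to whatever is done with the result.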
Thanks very much for that clarification, and for following up on my question.
John
In reply to GiftA-MSFT's comment of Wednesday, August 26, 2020 at 7:03 PM on MicrosoftDocs/azure-docs issue #60501, "Deleting experiments".
Has there been progress on this issue? Is there a way to delete storage information associated with an experiment yet?
I have recently started using Machine Learning services quite regularly and my costs on Azure have gone way up, in part (I think) because so much data is stored from every run. It is annoying that it appears to be so hard to delete experiments. The experiment is the unit at which deletion should be easiest. For example, I don't want to junk my whole workspace because I'm still working on the same problem, but I do want to get rid of gigabytes of data from 35 failed runs or whatever, so that I don't pay for them.