Delete tracked images/objects from runs

aimhubio / aim

Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.

https://aimstack.io

Apache License 2.0


Delete tracked images/objects from runs #2814

Open creinders opened 1 year ago

creinders commented 1 year ago

❓Question

Is it possible to delete tracked images (or objects in general) from runs? When tracking images or other large objects, the tracked data can grow very large, and it would be useful to remove images afterward. Often, only a small subset of the tracked images is still needed in the long term once the training pipeline has been verified, and much of the data can be deleted.

The API described here https://aimstack.readthedocs.io/en/latest/using/query_runs.html could have a delete() option on the Image object.

mihran113 commented 1 year ago

Hey @creinders! There's a workaround to delete sequences from runs via SDK. You can have a look at it here: https://discord.com/channels/1047782741736968202/1093472675470528562/1095733594963587192

We'll try to add a much better way to delete metrics (sequences) via the SDK and from the UI in the next major release.

creinders commented 1 year ago

Thank you for your response! I copied the code snippet.

from aim import Run
from aim.sdk.context import Context
run_hash = 'desired_run_hash'
metric_name = 'metric_name_to_delete'
ctx_id = Context({'some': 'context'}).idx

run = Run(repo='path_to_your_repo', read_only=False)
del run.meta_run_tree[('traces', ctx_id, metric_name)]
del run.series_run_trees[2][(ctx_id, metric_name)]

The run_hash variable in the example is not used. Is it supposed to be run = Run(repo=repo, run_hash=run_hash, read_only=False)? However, that call raises an error when opening the run: TypeError: __init__() missing 1 required positional argument: 'lock_file' (I am using a remote server).

Thank you for providing a new API in the next release!
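For reference, here is a minimal sketch of how the snippet might look with run_hash actually passed to Run. It assumes a local repo path; it does not address the lock_file error seen with a remote server. The function names delete_sequence and delete_from_local_repo are hypothetical, and meta_run_tree and series_run_trees are the internal attributes from the workaround, not a documented public API, so they may change between Aim versions.

```python
from typing import Any, Hashable


def delete_sequence(run: Any, ctx_id: Hashable, name: str, tree_index: int = 2) -> None:
    """Remove one tracked sequence from an opened, writable run.

    Relies on the internal attributes used in the workaround snippet
    (meta_run_tree, series_run_trees); these are not a public API.
    """
    # Drop the metadata entry that registers the sequence under its context.
    del run.meta_run_tree[('traces', ctx_id, name)]
    # Drop the stored values; the index 2 is copied from the workaround snippet.
    del run.series_run_trees[tree_index][(ctx_id, name)]


def delete_from_local_repo(repo_path: str, run_hash: str, context: dict, name: str) -> None:
    """Open the run identified by run_hash in a local repo and delete one sequence."""
    # Imported lazily so delete_sequence can be tried out without aim installed.
    from aim import Run
    from aim.sdk.context import Context

    # Same context-to-id conversion as in the workaround snippet.
    ctx_id = Context(context).idx
    # Assumption: passing run_hash here reopens the existing run for writing.
    run = Run(run_hash=run_hash, repo=repo_path, read_only=False)
    delete_sequence(run, ctx_id, name)
```

A call would then look like delete_from_local_repo('path_to_your_repo', 'desired_run_hash', {'some': 'context'}, 'metric_name_to_delete'). Backing up the repo before trying this seems prudent, since the deletion works on internal storage directly.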

abschm commented 10 months ago

I also get that exact error when trying to continue a run from a remote server. Any idea why it happens? I am trying to use Aim with a container engine (Argo Workflows), so continuing runs that were started in a previous container is essential to me.