aimhubio / aim

Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
https://aimstack.io
Apache License 2.0

How to store model checkpoints locally as artifacts #3144

Open sbuschjaeger opened 1 month ago

sbuschjaeger commented 1 month ago

❓Question

What is the intended workflow for locally storing artifacts such as model checkpoints?

As far as I can see (https://aimstack.readthedocs.io/en/latest/using/artifacts.html), we can use run.log_artifact() to take artifacts that are already on disk and upload them somewhere. This makes sense for S3 / remote storage, but what about local storage? Is there a way to directly store something as an artifact on disk? Basically, I want to store a checkpoint of my model every n epochs.

Something along the lines of (here for PyTorch):

import os
import torch

log_path = self.run.get_this_from_somewhere()  # hypothetical: get storage path for the current run
checkpoint_path = os.path.join(log_path, f"model_{epoch}.pt")
torch.save(self.state_dict(), checkpoint_path)
self.run.log_artifact(checkpoint_path, name=f"model_{epoch}.pt")  # without upload

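Until local artifact storage is supported, one possible workaround is to bypass log_artifact entirely and write checkpoints into a per-run directory yourself. The sketch below is only an illustration under assumptions: run_hash stands in for an identifier of the current run (Aim's Run objects expose a hash attribute), the directory layout is an arbitrary choice, and pickle is used in place of torch.save purely to keep the snippet self-contained.

```python
import os
import pickle


def checkpoint_dir(base_dir: str, run_hash: str) -> str:
    """Return (and create if needed) a per-run checkpoint directory."""
    path = os.path.join(base_dir, run_hash, "checkpoints")
    os.makedirs(path, exist_ok=True)
    return path


def maybe_save_checkpoint(state, epoch: int, every_n: int, ckpt_dir: str):
    """Save `state` every `every_n` epochs; return the file path, else None."""
    if epoch % every_n != 0:
        return None
    path = os.path.join(ckpt_dir, f"model_{epoch}.pt")
    # stand-in for torch.save(model.state_dict(), path)
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path
```

With this in place, the training loop just calls maybe_save_checkpoint(self.state_dict(), epoch, n, ckpt_dir); if local artifact support lands later, the returned path could then be handed to run.log_artifact.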
DavidoF3 commented 1 month ago

The ability to store artifacts locally would be very useful.

gpascale commented 3 weeks ago

Took a stab at this - #3156. Would love feedback.