**julianschoep** opened this issue 1 year ago
Hi @julianschoep , can you provide an example of how you're setting it in code? (or in configuration)
Sure:

```python
writer = SummaryWriter(log_dir=experiment_directory)
task = Task.init(
    project_name="project_name",
    task_name=task_name,
    output_uri="s3://models",
    continue_last_task=True,
    tags=tags,
)

decoder = Decoder(latent_size=latent_size, **specs["NetworkSpecs"])
decoder = ModelWrapper(decoder)
decoder = decoder.to(device)

# ... train loop ...
if epoch % log_frequency == 0:
    task.upload_artifact("sample_0", artifact_object=artifact_path)
    state_dict = {
        "decoder": decoder.state_dict(),
        "epoch": epoch,
        "optimizer": optimizer_all.state_dict(),
        "latents": lat_vecs.state_dict(),
    }
    state_path = experiment_directory / f"model_e{epoch}.pt"
    torch.save(state_dict, state_path)
```
With this snippet I do get the `sample_0` artifact under Artifacts / Other, but no `OutputModel` as would be expected.
Perhaps a clue: I'm now trying it manually by defining an `OutputModel` object and calling `update_weights`, and I got an error because my weight path was a `pathlib.Path` object instead of a `str` (it has no `.lower()` attribute). Could that be the reason? I have no idea how the auto-model uploading works under the hood :p Does it search the disk, or does it wrap `torch.save` somehow?
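For what it's worth, the `AttributeError` is consistent with a str-only code path: `pathlib.Path` has no `.lower()` method, so any extension check like `path.lower().endswith(".pt")` will fail on a `Path`. A minimal sketch (no ClearML involved, just plain Python) showing why casting to `str` before saving helps:

```python
import pathlib

state_path = pathlib.Path("experiments") / "model_e10.pt"

# pathlib.Path has no .lower(), so code that assumes a str filename
# (e.g. extension checks like f.lower().endswith(".pt")) raises
# AttributeError when handed a Path object.
assert not hasattr(state_path, "lower")

# Casting to str before torch.save / update_weights sidesteps that:
weights_file = str(state_path)
assert weights_file.lower().endswith(".pt")
```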
If you call `upload_artifact` directly, it will use what you provided (in which case it does need to be a `str`, I think).
There's also automatic wrapping of `torch.save`.
What are you seeing now?
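In rough terms, such wrapping replaces the framework's save function with one that also reports the target filename. This is only an illustrative sketch of the pattern, not ClearML's actual implementation (the `wrap_save` helper and the stand-in save function are invented here):

```python
import functools

def wrap_save(original_save, on_save):
    """Wrap a save function so each call also reports the saved filename."""
    @functools.wraps(original_save)
    def wrapper(obj, f, *args, **kwargs):
        result = original_save(obj, f, *args, **kwargs)
        on_save(str(f))  # e.g. register the file as an output model
        return result
    return wrapper

# Usage with a stand-in for torch.save:
saved = []
fake_save = lambda obj, f: None
patched = wrap_save(fake_save, saved.append)
patched({"epoch": 1}, "model_e1.pt")
assert saved == ["model_e1.pt"]
```

Note the `str(f)` cast in the wrapper: if the hook instead assumed `f` was already a string, passing a `pathlib.Path` would break it, which matches the `.lower()` error above.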
Had the same issue lately. I used `setup_aws_upload()` explicitly after my `Task.init()` as a workaround:

```python
task.setup_aws_upload(
    bucket='models',
    region='us-east-1'
)
```
Ultimately I worked around it via:

```python
task = clearml.Task.init(output_uri="s3://bucket")
output_model = clearml.OutputModel(task=task)
output_model.update_weights(str(state_path))
```
> There's also automatic wrapping of `torch.save`.

Didn't there used to be? I remember not having to define this, and the model files were automatically backed up.
I've set an `output_uri` to S3, and am able to upload custom artifacts without problems. My models are, however, not uploaded. The documentation states that models will be uploaded "automatically" if an `output_uri` is specified and the framework's model storing is used (e.g. I'm using `pytorch`, and save the model with `torch.save` and file extension `.pt`). Yet there are no "InputModels" or "OutputModels" present in the artifacts tab, only my own custom artifacts.

Are there any other requirements for the model saving to be automatically picked up? Is there a file-naming convention, or should the `.pt` files be saved in a specific directory to be picked up? Or does it simply upload everything that is saved with `torch.save` to OutputModels?

Environment