bishax opened this issue 3 years ago
@bishax this is a valid call-out. With `resume`, a new run id is minted for the workflow execution, which is then passed to the S3 object (`S3(run=self)`). When resuming a failed workflow, it may not be immediately obvious to Metaflow which S3 object is the right one to construct - in many cases it may well be desirable for `S3(run=self)` to point to an object versioned under the current run id and not the origin run id. Would something of this sort work for you -
```python
with S3(run=origin_run or current.run_id) as s3:
    s3_obj = s3.get("mykey")
    print(s3_obj.text)
```
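(`origin_run` in the snippet above is shorthand; a runnable version of the same pattern, assuming Metaflow's `current.origin_run_id`, which is set on resumed runs and `None` otherwise, might look like this:)

```python
from metaflow import Run, S3, current

# Sketch only: read from the origin run's prefix when this execution
# was started with `resume`, otherwise from the current run's prefix.
run_id = current.origin_run_id or current.run_id
with S3(run=Run(f"{current.flow_name}/{run_id}")) as s3:
    s3_obj = s3.get("mykey")
    print(s3_obj.text)
```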
---

I can broadly see three ways to get the desired behaviour:
1) Modify the behaviour of `S3`: e.g. define a fallback `S3._s3root` when `S3` is constructed with `run` and `run.origin_run_id is not None`, such that if `get` (or similar non-destructive actions) fails, it is retried with an `S3._s3root` corresponding to `run.origin_run_id` (see the sketch after this list).
2) When `resume` is called, copy the objects under `metaflow/data/<flow name>/<origin run id>` to `metaflow/data/<flow name>/<run id>`.
3) Rely on the user to not trip up.
Each solution has its issues:
1) Requires changes across several methods of `S3` and complicates the constructor further.
2) Has to be called/implemented in a way that is compatible with other datastores (current and future).
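To make option 1 concrete, here is a rough sketch of the fallback behaviour written as a standalone helper rather than a change to `S3` itself (the helper name is hypothetical; it leans on `return_missing=True` and `current.origin_run_id`, both of which Metaflow provides):

```python
from metaflow import Run, S3, current

def get_with_origin_fallback(key):
    """Hypothetical helper: look up `key` under the current run's
    prefix first; on a miss, fall back to the origin run's prefix
    when this execution was started with `resume`."""
    with S3(run=Run(f"{current.flow_name}/{current.run_id}")) as s3:
        obj = s3.get(key, return_missing=True)
        if obj.exists:
            return obj.text
    if current.origin_run_id:
        origin = Run(f"{current.flow_name}/{current.origin_run_id}")
        with S3(run=origin) as s3:
            obj = s3.get(key, return_missing=True)
            if obj.exists:
                return obj.text
    raise KeyError(f"{key!r} not found under current or origin run")
```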
When you say...
> Would something of this sort work for you -
>
> ```python
> with S3(run=origin_run or current.run_id) as s3:
>     s3_obj = s3.get("mykey")
>     print(s3_obj.text)
> ```
Do you mean something along the lines of 1 (changing the behaviour of `S3`) or 3 (relying on me to pass in something different)?
---

When a flow that stores an object in S3 with `S3(run=self).put` fails and is later resumed, the cloned results of the previous tasks do not extend to the aforementioned object. This causes subsequent calls to `S3(run=self).get` to fail, but I would expect `resume` to cover this case, as it involves data versioned under a specific flow run. See below for a flow that reproduces this behaviour:
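(The reproduction flow itself was not captured in this extract; a minimal sketch of a flow with this shape - hypothetical, not the reporter's exact code - would be:)

```python
import os

from metaflow import FlowSpec, S3, step


class S3ResumeRepro(FlowSpec):
    """Hypothetical repro: put an object under the original run id,
    fail, then read it back with S3(run=self) after a resume."""

    @step
    def start(self):
        # Versioned under the *original* run id.
        with S3(run=self) as s3:
            s3.put("mykey", "hello")
        self.next(self.maybe_fail)

    @step
    def maybe_fail(self):
        # Fails on the first run; set REPRO_PASS=1 when resuming.
        if not os.environ.get("REPRO_PASS"):
            raise RuntimeError("simulated failure")
        self.next(self.end)

    @step
    def end(self):
        # After `resume`, a new run id has been minted, so
        # S3(run=self) points at a prefix that holds no "mykey".
        with S3(run=self) as s3:
            print(s3.get("mykey").text)


if __name__ == "__main__":
    S3ResumeRepro()
```

Assuming an S3 datatools root is configured, `python repro.py run` should fail at `maybe_fail`, and `REPRO_PASS=1 python repro.py resume` should then fail in `end` with a key-not-found error: `start` is cloned rather than re-executed, so the object exists only under the origin run id.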