aws / sagemaker-experiments

Experiment tracking and metric logging for Amazon SageMaker notebooks and model training.
Apache License 2.0

Recommended way to transfer experiments between accounts / access experiment data #132

Closed: jmahlik closed this issue 3 years ago

jmahlik commented 3 years ago

Is your feature request related to a problem? Please describe. Is there a recommended way to transfer experiment data between AWS accounts? We'd like to move experiment data, and ideally the experiments themselves, between multiple accounts. I couldn't find docs on where the underlying data is actually stored. I'd imagine it is some kind of database, but the API docs don't go into detail.

Having access to this data is quite important for our use case since there are stringent retention requirements. All of the model development info is stored in experiments, so not being able to access or move it around directly is a major roadblock.

Describe the solution you'd like: It would be great to get more information on where the underlying data is stored and how to query it in raw form.

Describe alternatives you've considered: We have considered recreating experiments by "replaying" them in a new AWS account, but this doesn't seem like a great option. Essentially, we'd have to iterate over all the relevant items (i.e., trial components and their metrics) in every experiment and then recreate/relog everything. That seems error-prone, and some things can't be created directly.

It may be possible to use the boto3 list/search APIs and then the create APIs to recreate them, but there doesn't appear to be a clean way to do this without creating trackers and logging metrics.
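The replay approach could start from a mapping like the one below. This is a minimal sketch that assumes boto3's SageMaker client, with `describe_trial_component` called in the source account and `create_trial_component` in the target account. The helper `build_create_request` and the `COPYABLE_FIELDS` tuple are hypothetical names. The sketch copies only the fields the create call accepts, and that is where the gaps show: `Source` (the originating training job) and `Metrics` have no counterpart in `create_trial_component`.

```python
from typing import Any, Dict

# Fields that boto3's SageMaker create_trial_component accepts and that can be
# copied as-is from a describe_trial_component response (hypothetical helper list).
COPYABLE_FIELDS = (
    "DisplayName",
    "Status",
    "StartTime",
    "EndTime",
    "Parameters",
    "InputArtifacts",
    "OutputArtifacts",
    "MetadataProperties",
)


def build_create_request(described: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a describe_trial_component response into create_trial_component kwargs.

    Source (e.g. the training job ARN), Metrics, and read-only bookkeeping
    fields (ARN, CreatedBy, CreationTime, ...) are dropped because the create
    API has no parameters for them.
    """
    request: Dict[str, Any] = {"TrialComponentName": described["TrialComponentName"]}
    for field in COPYABLE_FIELDS:
        value = described.get(field)
        if value:  # skip fields that are missing or empty
            request[field] = value
    return request
```

Against real accounts you would page through `list_trial_components` in the source account, pass each described component to the destination client as `create_trial_component(**build_create_request(resp))`, and then rebuild the hierarchy with `create_experiment`, `create_trial`, and `associate_trial_component`. Tags are not part of the describe response (they would come from `list_tags`) and are not handled here.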

Something like SELECT * FROM experiments_table then INSERT INTO experiments_table_in_another_account VALUES ... would be the ideal solution.
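For the `SELECT *` half of that wish, one approximation is to dump trial components to a flat table. The sketch below assumes the records come from boto3's SageMaker `search` call with `Resource="ExperimentTrialComponent"`, where each result carries a `TrialComponent` dict with `Parameters` and `Metrics`. The helper names and the `param:` / `metric:` column prefixes are hypothetical. The SageMaker Python SDK's `ExperimentAnalytics` class offers a comparable DataFrame view of an experiment's trial components.

```python
import csv
import io
from typing import Any, Dict, Iterable, List


def flatten_trial_component(tc: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one trial component record into a single-level row (hypothetical helper)."""
    row: Dict[str, Any] = {
        "TrialComponentName": tc["TrialComponentName"],
        "DisplayName": tc.get("DisplayName", ""),
    }
    for name, value in tc.get("Parameters", {}).items():
        # Parameter values look like {"NumberValue": 0.1} or {"StringValue": "adam"}.
        row[f"param:{name}"] = value.get("NumberValue", value.get("StringValue"))
    for metric in tc.get("Metrics", []):
        # Only summary statistics are present here, not the full metric time series.
        for stat in ("Min", "Max", "Avg", "Last"):
            if stat in metric:
                row[f"metric:{metric['MetricName']}:{stat}"] = metric[stat]
    return row


def to_csv(trial_components: Iterable[Dict[str, Any]]) -> str:
    """Write flattened trial components as CSV text."""
    rows: List[Dict[str, Any]] = [flatten_trial_component(tc) for tc in trial_components]
    # Union of all columns in first-seen order, so sparse rows still line up.
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

There is no `INSERT INTO` counterpart: a table like this can be archived for retention, but it cannot be loaded back into Experiments as-is.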

I realize one pitfall is that the training jobs referenced by name would not exist in a different account. That's not really a big issue for this use case.

Any thoughts appreciated.

danabens commented 3 years ago

There isn't really a way to do this. As you point out, it is possible to approximate the experiments, trials, and trial components in another account, except for the metric data, which is derived from the source job during training.
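To make the metric caveat concrete: `describe_trial_component` returns only summary statistics per metric (`Min`, `Max`, `Avg`, `StdDev`, `Last`, `Count`), and `create_trial_component` has no metrics field at all. A hypothetical workaround, not something suggested in this thread, is to carry those summaries over as numeric parameters. The per-step time series is lost either way, and the `metric/<name>/<stat>` key scheme is an arbitrary choice for this sketch.

```python
from typing import Any, Dict, List

# Summary statistics that appear in a trial component's Metrics list.
SUMMARY_STATS = ("Min", "Max", "Avg", "StdDev", "Last", "Count")


def metrics_as_parameters(metrics: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Re-express metric summaries as create_trial_component-style Parameters.

    Hypothetical workaround: only the summaries survive, never the raw series.
    """
    params: Dict[str, Dict[str, float]] = {}
    for metric in metrics:
        for stat in SUMMARY_STATS:
            if stat in metric:
                key = f"metric/{metric['MetricName']}/{stat}"
                params[key] = {"NumberValue": float(metric[stat])}
    return params
```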

I would start from your higher-level use case, i.e., why do you need to copy data between accounts in the first place? Perhaps establishing a process for versioning experiment notebooks, or using another tool such as SageMaker Pipelines, would help.

On a related note, SageMaker Pipelines recently launched an integration with Experiments.