aws-samples / aws-sagemaker-build

Creates a CloudFormation template that uses AWS Step Functions to automate the building and training of SageMaker custom models based on S3 and GitHub events.
Apache License 2.0
165 stars, 44 forks

Support Multiple Training Jobs at the same time #22

Closed: oelesinsc24 closed this issue 5 years ago

JohnCalhoun commented 5 years ago

Could you explain more about what you would like to see?

oelesinsc24 commented 5 years ago

AFAIK, I have to update the stack's ConfigFramework parameter to deploy a new model. See below:

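import boto3
from botocore.exceptions import ClientError

# Assumed setup (not shown in the original snippet): a CloudFormation client
# and the current stack description. StackName must already be defined.
cf = boto3.client("cloudformation")
result = cf.describe_stacks(StackName=StackName)

# Switch the ConfigFramework parameter and keep every other parameter as is.
params = result["Stacks"][0]["Parameters"]
for param in params:
    if param["ParameterKey"] == "ConfigFramework":
        param["ParameterValue"] = "MXNET"

try:
    cf.update_stack(
        StackName=StackName,
        UsePreviousTemplate=True,
        Parameters=params,
        Capabilities=[
            "CAPABILITY_NAMED_IAM",
        ],
    )
    # Poll every 10 seconds, up to 600 times, until the update completes.
    waiter = cf.get_waiter("stack_update_complete")
    print("Waiting for stack update")
    waiter.wait(
        StackName=StackName,
        WaiterConfig={
            "Delay": 10,
            "MaxAttempts": 600,
        },
    )
except ClientError as e:
    # CloudFormation reports an unchanged stack as an error; treat it as success.
    if e.response["Error"]["Message"] == "No updates are to be performed.":
        pass
    else:
        raise
print("stack ready!")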

This means that every model training job updates the stack, so a new job is blocked while a stack update is in progress. I know one workaround is to deploy a separate stack for each ConfigFramework, which also helps to isolate problems.

My question is: would it be possible to use a single stack for multiple ConfigFrameworks at the same time?

Thanks in advance

oelesinsc24 commented 5 years ago

This question has already been answered: deploy multiple stacks.
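For anyone landing here later, a minimal sketch of the multiple-stacks approach, not something taken from the repo. It assumes the template can be instantiated more than once in the same account and region. The helper names, the stack naming convention, and the `template_url` argument are all hypothetical; `cf` is expected to be a boto3 CloudFormation client, and `params` the parameter list of an existing stack.

```python
def stack_name_for(base_name, framework):
    """Hypothetical naming convention: one stack per framework."""
    return f"{base_name}-{framework.lower()}"


def framework_parameters(params, framework):
    """Return a copy of the parameter list with only ConfigFramework overridden."""
    return [
        {
            "ParameterKey": p["ParameterKey"],
            "ParameterValue": (
                framework
                if p["ParameterKey"] == "ConfigFramework"
                else p["ParameterValue"]
            ),
        }
        for p in params
    ]


def deploy_per_framework(cf, base_name, template_url, params, frameworks):
    """Create one independent stack per framework and return the stack names.

    cf is assumed to be a CloudFormation client, for example
    boto3.client("cloudformation"). template_url is a placeholder for
    wherever the template is actually hosted.
    """
    names = []
    for framework in frameworks:
        name = stack_name_for(base_name, framework)
        cf.create_stack(
            StackName=name,
            TemplateURL=template_url,
            Parameters=framework_parameters(params, framework),
            Capabilities=["CAPABILITY_NAMED_IAM"],
        )
        names.append(name)
    return names
```

With this layout each framework has its own stack, so a training job for one framework never waits on a stack update triggered for another. The values passed in `frameworks` (for example "MXNET") should be checked against the values the template allows for ConfigFramework.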

Thanks @JohnCalhoun