Support Multiple Training Jobs at the same time

JohnCalhoun commented 5 years ago

Could you explain more about what you would like to see?

oelesinsc24 commented 5 years ago

AFAIK, I have to update the stack's ConfigFramework parameter to deploy a new model. See below:

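import boto3
from botocore.exceptions import ClientError

# Assumption: StackName is defined earlier; cf and result are set up as shown here.
cf = boto3.client("cloudformation")
result = cf.describe_stacks(StackName=StackName)

# Reuse the current stack parameters, switching only the framework to MXNET.
params = result["Stacks"][0]["Parameters"]
for param in params:
    if param["ParameterKey"] == "ConfigFramework":
        param["ParameterValue"] = "MXNET"

try:
    cf.update_stack(
        StackName=StackName,
        UsePreviousTemplate=True,
        Parameters=params,
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Block until CloudFormation has finished applying the update.
    waiter = cf.get_waiter("stack_update_complete")
    print("Waiting for stack update")
    waiter.wait(
        StackName=StackName,
        WaiterConfig={"Delay": 10, "MaxAttempts": 600},
    )
except ClientError as e:
    # An unchanged parameter set is not a failure; re-raise anything else.
    if e.response["Error"]["Message"] != "No updates are to be performed.":
        raise
print("stack ready!")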

This means that every model training job updates the stack, so a new job cannot start while a stack update is still in progress. I know a workaround could be to have a separate stack for each ConfigFramework, which also helps to isolate problems.
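
To make the conflict concrete, here is a minimal sketch of a guard that could run before attempting an update. It relies on CloudFormation reporting ongoing operations with a status ending in `_IN_PROGRESS`. The helper names `stack_is_busy` and `current_status` are hypothetical, and `describe_stacks` is the only boto3 call involved.

```python
def stack_is_busy(stack_status):
    """Return True while CloudFormation is still working on the stack."""
    # Ongoing operations are reported as e.g. UPDATE_IN_PROGRESS.
    return stack_status.endswith("_IN_PROGRESS")


def current_status(cf, stack_name):
    """Look up a stack's status; cf is assumed to be a boto3 CloudFormation client."""
    return cf.describe_stacks(StackName=stack_name)["Stacks"][0]["StackStatus"]
```

A second job could call `stack_is_busy(current_status(cf, StackName))` and either wait or give up, but it would still be serialized behind the first job's update.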

My question is: would it be possible to use a single stack for multiple ConfigFrameworks at the same time?

Thanks in advance

oelesinsc24 commented 5 years ago

This question is already answered: deploy multiple stacks, one per ConfigFramework.
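
For anyone landing here, a rough sketch of how the per-framework setup might be organised. The naming scheme and the helpers `stack_name_for` and `with_framework` are my own assumptions, not part of aws-sagemaker-build. The idea is only that each framework gets its own stack name and its own copy of the parameters, so updates never target the same stack.

```python
def stack_name_for(base_name, framework):
    """Derive a per-framework stack name (hypothetical naming scheme)."""
    return f"{base_name}-{framework.lower()}"


def with_framework(params, framework):
    """Return a copy of the stack parameters with ConfigFramework replaced."""
    updated = []
    for param in params:
        param = dict(param)  # copy so the caller's list is left untouched
        if param["ParameterKey"] == "ConfigFramework":
            param["ParameterValue"] = framework
        updated.append(param)
    return updated
```

With this, the MXNET job would pass `stack_name_for(StackName, "MXNET")` and `with_framework(params, "MXNET")` to `update_stack`, leaving the other stacks free for concurrent jobs.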

Thanks @JohnCalhoun

aws-samples / aws-sagemaker-build

Support Multiple Training Jobs at the same time #22