When following the tutorial for the built-in model and deploying in eu-central-1 (Frankfurt), the Lambda function (log group `/aws/lambda/MLOps-BIA-TrainModel-pva`) fails with:

```
...
[INFO]Container Path 811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:1
...
An error occurred (ValidationException) when calling the CreateTrainingJob operation: Invalid DNS suffix 'amazonaws.com' for region 'us-east-1' in training image. Please provide the valid <region>.<dns-suffix>: 'eu-central-1.amazonaws.com'
```
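I presume this was caused by an incorrect value supplied for the environment variable `AlgoECR`. The container path in the log points to the us-east-1 registry, while the training job is created in eu-central-1. The Lambda function reads the value with:

```python
ecr_path = os.environ['AlgoECR']
```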
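To make this kind of misconfiguration fail fast with a clearer message, the Lambda function could compare the region embedded in `ecr_path` with the region it runs in before calling CreateTrainingJob. This is a minimal sketch, assuming the registry host has the form `<account>.dkr.ecr.<region>.<dns-suffix>` as in the log above; `ecr_region` and `check_ecr_region` are hypothetical helpers, not part of the sample repository.

```python
def ecr_region(ecr_path: str) -> str:
    """Return the region from a path like '<account>.dkr.ecr.<region>.amazonaws.com[/repo:tag]'."""
    host = ecr_path.split("/", 1)[0]  # drop any '/<repository>:<tag>' part
    parts = host.split(".")
    # Expected layout: [account, 'dkr', 'ecr', region, <dns-suffix parts>...]
    if len(parts) < 5 or parts[1:3] != ["dkr", "ecr"]:
        raise ValueError(f"Unexpected ECR registry host: {host!r}")
    return parts[3]


def check_ecr_region(ecr_path: str, region: str) -> None:
    """Raise ValueError if the registry in ecr_path is not in the given region."""
    found = ecr_region(ecr_path)
    if found != region:
        raise ValueError(
            f"AlgoECR points to region {found!r} but the function runs in {region!r}: {ecr_path}"
        )


if __name__ == "__main__":
    # In the Lambda this would be: check_ecr_region(ecr_path, os.environ["AWS_REGION"])
    check_ecr_region("813361260812.dkr.ecr.eu-central-1.amazonaws.com", "eu-central-1")  # passes
    try:
        check_ecr_region("811284229777.dkr.ecr.us-east-1.amazonaws.com", "eu-central-1")
    except ValueError as err:
        print(err)
```

Lambda sets `AWS_REGION` for every function, so the check needs no extra configuration; it only turns the ValidationException into an earlier, more explicit error.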
As proof that the value is wrong, when I forced `ecr_path` to the correct registry for eu-central-1 with the code below (adapted in the Lambda function), it works:

```python
# Get ECR information for BIA
algo_version = user_param['Algorithm']
# ecr_path = os.environ['AlgoECR']
# HARD CODE OVERRIDE by peter_v
ecr_path = '813361260812.dkr.ecr.eu-central-1.amazonaws.com'
container_path = ecr_path + '/' + algo_version
print('[INFO]Container Path', container_path)
```
The environment variable is read at this line of the Lambda code:
https://github.com/aws-samples/amazon-sagemaker-devops-with-ml/blob/abac90b15b438f00c0deab4470cf162410c5d600/1-Built-In-Algorithm/lambda-code/MLOps-BIA-TrainModel.py#L70
I took the hard-coded registry value for the XGBoost algorithm in eu-central-1 from this page:
https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html

Maybe there is a way to set the `AlgoECR` environment variable correctly, but I did not see it immediately in the tutorial README.
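Until `AlgoECR` is set per region, one workaround is to derive the registry from the function's own region instead of hard-coding a single one. This is a sketch under stated assumptions: the table holds only the two XGBoost registry accounts that appear in this issue (811284229777 for us-east-1 from the log, 813361260812 for eu-central-1 from the override), and any other region would need its account looked up on the registry-paths page. `XGBOOST_REGISTRY_ACCOUNTS` and `container_path_for` are hypothetical names.

```python
# Hypothetical table: only the two XGBoost registry accounts quoted in this issue.
XGBOOST_REGISTRY_ACCOUNTS = {
    "us-east-1": "811284229777",
    "eu-central-1": "813361260812",
}


def container_path_for(region: str, algo_version: str, dns_suffix: str = "amazonaws.com") -> str:
    """Build '<account>.dkr.ecr.<region>.<dns_suffix>/<algo_version>' for the given region."""
    try:
        account = XGBOOST_REGISTRY_ACCOUNTS[region]
    except KeyError:
        raise ValueError(f"No XGBoost registry account configured for {region!r}") from None
    return f"{account}.dkr.ecr.{region}.{dns_suffix}/{algo_version}"


if __name__ == "__main__":
    # In the Lambda this would be:
    #   container_path = container_path_for(os.environ["AWS_REGION"], user_param["Algorithm"])
    print("[INFO]Container Path", container_path_for("eu-central-1", "xgboost:1"))
    # prints: [INFO]Container Path 813361260812.dkr.ecr.eu-central-1.amazonaws.com/xgboost:1
```

The `dns_suffix` parameter exists because the error message asks for `<region>.<dns-suffix>`; it defaults to `amazonaws.com`, the suffix in both registry paths above.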