aws-samples / amazon-sagemaker-safe-deployment-pipeline

Safe blue/green deployment of Amazon SageMaker endpoints using AWS CodePipeline, CodeBuild and CodeDeploy.
https://aws.amazon.com/blogs/machine-learning/safely-deploying-and-monitoring-amazon-sagemaker-endpoints-with-aws-codepipeline-and-aws-codedeploy/
MIT No Attribution

feat: Add native CloudFormation support for MonitoringSchedule #10

Closed: brightsparc closed this issue 3 years ago

brightsparc commented 4 years ago

Update the production deployment template to use the native AWS::SageMaker::MonitoringSchedule resource type instead of the custom CloudFormation resource.

  SagemakerMonitoringSchedule:
    Type: AWS::SageMaker::MonitoringSchedule
    Properties:
      EndpointName: !GetAtt Endpoint.EndpointName
      MonitoringScheduleConfig:
        MonitoringJobDefinition:
          BaselineConfig:
            ConstraintsResource:
              S3Uri: !Sub s3://sagemaker-${AWS::Region}-${AWS::AccountId}/${ModelName}/monitoring/baseline/mlops-${ModelName}-pbl-${TrainJobId}/constraints.json
            StatisticsResource:
              S3Uri: !Sub s3://sagemaker-${AWS::Region}-${AWS::AccountId}/${ModelName}/monitoring/baseline/mlops-${ModelName}-pbl-${TrainJobId}/statistics.json
          MonitoringAppSpecification:
            ImageUri:
              !FindInMap [ModelAnalyzerMap, !Ref "AWS::Region", "ImageUri"]
          MonitoringInputs:
            - EndpointInput:
                EndpointName: !GetAtt Endpoint.EndpointName
                LocalPath: "/opt/ml/processing/endpointdata"
          MonitoringOutputConfig:
            MonitoringOutputs:
              - S3Output:
                  LocalPath: "/opt/ml/processing/localpath"
                  S3Uri: !Sub s3://sagemaker-${AWS::Region}-${AWS::AccountId}/${ModelName}/monitoring/reports
          MonitoringResources:
            ClusterConfig:
              InstanceCount: 1
              InstanceType: ml.m5.xlarge
              VolumeKmsKeyId: !Ref KmsKeyId
              VolumeSizeInGB: 30
          RoleArn: !Ref MLOpsRoleArn
          StoppingCondition:
            MaxRuntimeInSeconds: 1800
        ScheduleConfig:
          ScheduleExpression: "cron(0 * ? * * *)"
      MonitoringScheduleName: !Sub mlops-${ModelName}-pms-${TrainJobId}
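The schedule references four template parameters (ModelName, TrainJobId, KmsKeyId, MLOpsRoleArn) and an Endpoint resource that is assumed to be defined elsewhere in the same template. A minimal sketch of matching Parameters declarations, plus an output that reads the schedule ARN back as a resource return value, might look like the fragment below. The descriptions and the output name are illustrative assumptions, not taken from the merged template.

```yaml
Parameters:
  ModelName:
    Type: String
    Description: Model name used in S3 prefixes and resource names (assumed wording)
  TrainJobId:
    Type: String
    Description: Training job ID used to locate the baseline files (assumed wording)
  KmsKeyId:
    Type: String
    Description: KMS key used to encrypt the monitoring job volume (assumed wording)
  MLOpsRoleArn:
    Type: String
    Description: IAM role the monitoring job runs as (assumed wording)

Outputs:
  # Hypothetical output name; the ARN is a read-only return value of the
  # monitoring schedule resource, retrieved with Fn::GetAtt
  MonitoringScheduleArn:
    Value: !GetAtt SagemakerMonitoringSchedule.MonitoringScheduleArn
```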

This requires defining a per-Region mapping for the Model Monitor analyzer image:

  ModelAnalyzerMap:
    "us-west-2":
      "ImageUri": "159807026194.dkr.ecr.us-west-2.amazonaws.com/sagemaker-model-monitor-analyzer:latest"
    "us-east-2":
      "ImageUri": "680080141114.dkr.ecr.us-east-2.amazonaws.com/sagemaker-model-monitor-analyzer:latest"
    "us-east-1":
      "ImageUri": "156813124566.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model-monitor-analyzer:latest"
    "eu-west-1":
      "ImageUri": "890145073186.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-model-monitor-analyzer:latest"
    "ap-northeast-1":
      "ImageUri": "574779866223.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-model-monitor-analyzer:latest"
    "ap-northeast-2":
      "ImageUri": "709848358524.dkr.ecr.ap-northeast-2.amazonaws.com/sagemaker-model-monitor-analyzer:latest"
    "ap-southeast-2":
      "ImageUri": "563025443158.dkr.ecr.ap-southeast-2.amazonaws.com/sagemaker-model-monitor-analyzer:latest"
    "eu-central-1":
      "ImageUri": "048819808253.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-model-monitor-analyzer:latest"
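!FindInMap only resolves keys from the template's top-level Mappings section, so the ModelAnalyzerMap block belongs there rather than under Resources. A trimmed sketch of the placement is shown below, with a single Region entry copied from the list above; the remaining schedule properties are elided and would stay as written in the issue.

```yaml
Mappings:
  ModelAnalyzerMap:
    "us-east-1":
      "ImageUri": "156813124566.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model-monitor-analyzer:latest"
    # ...one entry per supported Region, as listed above

Resources:
  SagemakerMonitoringSchedule:
    Type: AWS::SageMaker::MonitoringSchedule
    Properties:
      MonitoringScheduleConfig:
        MonitoringJobDefinition:
          MonitoringAppSpecification:
            # Looks up the analyzer image for the Region the stack is deployed in
            ImageUri: !FindInMap [ModelAnalyzerMap, !Ref "AWS::Region", "ImageUri"]
          # ...remaining job definition properties unchanged
```

If the stack is launched in a Region that has no entry in the map, the !FindInMap lookup fails and the stack does not deploy, so the map needs an entry for every Region the pipeline targets.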

brightsparc commented 3 years ago

This has been merged into master. Closing issue.