aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

SageMaker Processing Job doesn't support FastFile Input Mode #4711

Open joonb14 opened 3 months ago

joonb14 commented 3 months ago

This may be an issue only with Step Functions, and the Python SDK might support FastFile mode. However, it doesn't make sense to me that one provides the functionality and the other doesn't, so I am posting this issue.

According to this document, ProcessingInput accepts FastFile mode. However, when I try to create a Processing Job using a Step Function, the following error occurs:

"Error": "SageMaker.AmazonSageMakerException",
"Cause": "1 validation error detected: Value 'FastFile' at 'processingInputs.2.member.s3Input.s3InputMode' failed to satisfy constraint: Member must satisfy enum value set: [Pipe, File] (Service: AmazonSageMaker; Status Code: 400; Error Code: ValidationException; Request ID: 05f50214-59f0-4518-bd8e-36d800b078d0; Proxy: null)"

It only accepts Pipe and File. This is related to closed issue #3962, and the document was updated by PR #4311.
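To catch this before the job is submitted, a small pre-flight check can scan the ProcessingInputs for modes the service would reject. This is only a sketch: the accepted set is copied from the ValidationException above rather than from an official enum, and the function name `find_rejected_input_modes` and the `example` dict are made up for illustration.

```python
# Modes listed in the ValidationException above (an assumption based on that
# error message, not on an official enum definition).
ACCEPTED_INPUT_MODES = {"Pipe", "File"}


def find_rejected_input_modes(parameters):
    """Return (position, input_name, mode) for each input that would be rejected.

    Positions are 1-based to match the 'processingInputs.2.member' style
    used in the service error message.
    """
    rejected = []
    inputs = parameters.get("ProcessingInputs", [])
    for position, item in enumerate(inputs, start=1):
        mode = item.get("S3Input", {}).get("S3InputMode")
        if mode is not None and mode not in ACCEPTED_INPUT_MODES:
            rejected.append((position, item.get("InputName"), mode))
    return rejected


# Hypothetical, trimmed-down parameters with the same shape as the job request.
example = {
    "ProcessingInputs": [
        {"InputName": "model", "S3Input": {"S3InputMode": "File"}},
        {"InputName": "dataset", "S3Input": {"S3InputMode": "FastFile"}},
    ]
}

print(find_rejected_input_modes(example))  # [(2, 'dataset', 'FastFile')]
```

Running this on the full request would flag the second input, which matches the position reported in the error.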

Step Function Configuration

{
  "ProcessingResources": {
    "ClusterConfig": {
      "InstanceCount": 1,
      "InstanceType.$": "$.inferenceOpt.instanceType",
      "VolumeSizeInGB": 100
    }
  },
  "AppSpecification": {
    "ImageUri.$": "$.inferenceOpt.imageUri",
    "ContainerEntrypoint": [
      "python3",
      "/opt/ml/code/inference.py"
    ]
  },
  "Environment": {
    "FINAL_KERNEL_SIZE.$": "$.hyperparameters.FINAL_KERNEL_SIZE",
    "MODEL_ARCH.$": "$.hyperparameters.MODEL_ARCH",
    "LAST_LEVEL.$": "$.hyperparameters.LAST_LEVEL",
    "IMG_SIZE.$": "$.hyperparameters.IMG_SIZE",
    "NUM_CLASSES.$": "$.hyperparameters.NUM_CLASSES",
    "CLASS_DICT.$": "$.hyperparameters.CLASS_DICT",
    "PREDICT_DATASET_ID.$": "$.hyperparameters.DATASET_ID",
    "S3_PREFIX.$": "$.hyperparameters.SERVICE_NAME",
    "PREDICT_ID.$": "$.hyperparameters.PREDICT_ID",
    "MANIFEST_FILE.$": "$.inferenceOpt.manifestFile",
    "QUEUE_URL": "https://sqs.ap-northeast-2.amazonaws.com/405240163678/Prod-Proto-CvOpsApplication-ProtoInferenceMetricQueueProtoInference-SgfJZK5ae1TV",
    "REGION": "ap-northeast-2"
  },
  "NetworkConfig": {
    "EnableNetworkIsolation": false,
    "EnableInterContainerTrafficEncryption": true,
    "VpcConfig": {
      "SecurityGroupIds": [
        "sg-0aee4467bff25504b"
      ],
      "Subnets": [
        "subnet-06dc8a4e813910a07",
        "subnet-0b023fbbb320cf8d8",
        "subnet-0e3b92aad033b1993"
      ]
    }
  },
  "ProcessingInputs": [
    {
      "InputName": "model",
      "S3Input": {
        "S3Uri.$": "$.inferenceOpt.modelArtifact",
        "LocalPath": "/opt/ml/processing/model",
        "S3DataType": "S3Prefix",
        "S3InputMode": "File",
        "S3DataDistributionType": "FullyReplicated",
        "S3CompressionType": "None"
      }
    },
    {
      "InputName": "dataset",
      "S3Input": {
        "S3Uri.$": "$.inferenceOpt.datastoreName",
        "LocalPath": "/opt/ml/processing/manifest",
        "S3DataType": "S3Prefix",
        "S3InputMode": "FastFile",
        "S3DataDistributionType": "FullyReplicated",
        "S3CompressionType": "None"
      }
    },
    {
      "InputName": "manifest",
      "S3Input": {
        "S3Uri.$": "$.inferenceOpt.manifestFile",
        "LocalPath": "/opt/ml/processing/config",
        "S3DataType": "S3Prefix",
        "S3InputMode": "File",
        "S3DataDistributionType": "FullyReplicated",
        "S3CompressionType": "None"
      }
    }
  ],
  "RoleArn": "arn:aws:iam::405240163678:role/Prod-Proto-CvOpsApplicati-ProtoInferenceWorkflowPro-Kh3pe5GGN20F",
  "ProcessingJobName.$": "States.Format('cvops-inference-{}', $$.Execution.Name)",
  "Tags": [
    {
      "Key": "CVOps",
      "Value": "Proto-CvopsInferenceTask"
    }
  ]
}
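Until FastFile is accepted, one possible stopgap is to rewrite any rejected mode to File before submitting the request. The sketch below assumes the parameters are available as a plain Python dict with the same shape as the configuration above; the helper name `fall_back_to_file_mode` is hypothetical. Note that File mode copies the data to the local volume before the job starts instead of streaming it, so startup time and disk usage will differ from what FastFile would give.

```python
import copy


def fall_back_to_file_mode(parameters, accepted=("Pipe", "File")):
    """Return a copy of the parameters with unsupported input modes set to 'File'.

    The 'accepted' default mirrors the enum reported in the validation error
    (an assumption taken from that message). The input dict is not modified.
    """
    patched = copy.deepcopy(parameters)
    for item in patched.get("ProcessingInputs", []):
        s3_input = item.get("S3Input", {})
        if "S3InputMode" in s3_input and s3_input["S3InputMode"] not in accepted:
            s3_input["S3InputMode"] = "File"
    return patched


# Hypothetical, trimmed-down version of the configuration above.
params = {
    "ProcessingInputs": [
        {"InputName": "model", "S3Input": {"S3InputMode": "File"}},
        {"InputName": "dataset", "S3Input": {"S3InputMode": "FastFile"}},
    ]
}

patched = fall_back_to_file_mode(params)
print(patched["ProcessingInputs"][1]["S3Input"]["S3InputMode"])  # File
```

With a 100 GB volume as configured above, it would be worth checking that the dataset actually fits on disk before relying on this fallback.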

Expected behavior: accept FastFile mode, or update the document.

giamic commented 2 weeks ago

I can confirm that FastFile doesn't work in the Python API either.