Problem
A recent change to ECS RunTask CloudTrail events breaks the ability to parse AWS Batch job metadata (such as job ID and compute environment name) from the environment variables in the event, because those environment variables are now redacted.
The ECS RunTask state machine relies on parsing this metadata from the CloudTrail event, so the redaction causes an error when the Step Functions state machine processes an incoming event.
A sample ECS RunTask event shows the redacted environment variables.
This results in the following failure in the "Select common fields" step of the ECS RunTask state machine, which impairs the ability to post the metrics used in the job placement dashboard:
{
"cause": "An error occurred while executing the state 'Select common fields' (entered at the event id #4). The JSONPath '$.detail.requestParameters.overrides.containerOverrides[0].environment[?(@.name=='AWS_BATCH_JOB_ID')].value' specified for the field 'JobId.$' could not be found in the input '...'",
"error": "States.Runtime"
}
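The failing lookup can be approximated in Python to show why the step breaks. This is a minimal sketch, not the state machine's actual code: the function name `find_env_value`, both sample events, and the `REDACTED_PLACEHOLDER` string are hypothetical, and the exact shape of a redacted event is an assumption (here, the `environment` field is no longer a list of name/value pairs).

```python
from typing import Any, Dict, Optional


def find_env_value(event: Dict[str, Any], name: str) -> Optional[str]:
    """Mimic the JSONPath lookup on containerOverrides[0].environment.

    Returns the variable's value, or None when the path cannot be resolved.
    """
    overrides = (
        event.get("detail", {})
        .get("requestParameters", {})
        .get("overrides", {})
        .get("containerOverrides")
    )
    if not isinstance(overrides, list) or not overrides:
        return None
    environment = overrides[0].get("environment")
    # Assumption: once redacted, the field is no longer a list of
    # {"name": ..., "value": ...} entries, so there is nothing to match.
    if not isinstance(environment, list):
        return None
    for entry in environment:
        if isinstance(entry, dict) and entry.get("name") == name:
            return entry.get("value")
    return None


# Hypothetical events, for illustration only.
unredacted_event = {
    "detail": {"requestParameters": {"overrides": {"containerOverrides": [
        {"environment": [{"name": "AWS_BATCH_JOB_ID", "value": "example-job-id"}]}
    ]}}}
}
redacted_event = {
    "detail": {"requestParameters": {"overrides": {"containerOverrides": [
        {"environment": "REDACTED_PLACEHOLDER"}
    ]}}}
}

found = find_env_value(unredacted_event, "AWS_BATCH_JOB_ID")
missing = find_env_value(redacted_event, "AWS_BATCH_JOB_ID")
```

Where this sketch returns `None`, the state machine has no fallback for `JobId.$`, so the unresolved path surfaces as the `States.Runtime` error above.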
Proposed solution
Use ECS task state events as the source of truth for successfully placed Batch jobs (defined as ECS RunTask calls that have succeeded). Add a new state machine to process these events, and reuse the existing Lambda function that publishes the EMF metrics.
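As an illustration of how these events might be routed to the new state machine, the sketch below builds an EventBridge-style event pattern and a small matcher. It assumes the events arrive with source `aws.ecs` and detail-type `ECS Task State Change`. The names `TASK_STATE_CHANGE_PATTERN` and `is_task_state_change` are hypothetical, and any further filtering (for example, on task status) is left out because it is not specified here.

```python
import json
from typing import Any, Dict

# Assumed event pattern for selecting ECS task state events.
TASK_STATE_CHANGE_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
}


def is_task_state_change(event: Dict[str, Any]) -> bool:
    """Return True when the event matches the pattern above."""
    return (
        event.get("source") in TASK_STATE_CHANGE_PATTERN["source"]
        and event.get("detail-type") in TASK_STATE_CHANGE_PATTERN["detail-type"]
    )


# Serialized form, as it would be supplied to a rule definition.
pattern_json = json.dumps(TASK_STATE_CHANGE_PATTERN, sort_keys=True)
```

Keeping the pattern narrow means CloudTrail API-call events continue to flow only to the existing RunTask state machine.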
Use the Batch system-created tags in the RunTask request to identify the compute environment and job queue associated with a RunTask request for which placement has failed (for example, because container instances have not yet been scaled up). These tags use the aws:batch prefix in the ECS RunTask request (see the CloudTrail event example above).
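The tag-based lookup could be sketched as follows. This assumes the RunTask request parameters carry a `tags` list of `key`/`value` pairs. The function name `batch_tags` and the sample event are hypothetical, and the specific tag keys in the sample are illustrative assumptions; only the aws:batch prefix is taken from the description above.

```python
from typing import Any, Dict

BATCH_TAG_PREFIX = "aws:batch"


def batch_tags(event: Dict[str, Any]) -> Dict[str, str]:
    """Collect the Batch system-created tags from a RunTask CloudTrail event."""
    tags = event.get("detail", {}).get("requestParameters", {}).get("tags") or []
    return {
        tag["key"]: tag.get("value", "")
        for tag in tags
        if isinstance(tag, dict)
        and str(tag.get("key", "")).startswith(BATCH_TAG_PREFIX)
    }


# Hypothetical event; the tag keys and values are assumptions for illustration.
example_event = {
    "detail": {"requestParameters": {"tags": [
        {"key": "aws:batch:compute-environment", "value": "example-ce"},
        {"key": "aws:batch:job-queue", "value": "example-queue"},
        {"key": "team", "value": "example-team"},
    ]}}
}

selected = batch_tags(example_event)
```

Filtering on the prefix rather than on a fixed list of keys keeps the lookup working if additional aws:batch tags appear in the request.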