Open mostafafarzaneh opened 2 years ago
I also tried to create a metric this way:
custom_metric = cloudwatch.MathExpression(
    expression='SELECT AVG(ActiveConnections) FROM "myMetrics/custom"',
    period=Duration.minutes(1),
)
and use it in StepScalingPolicy. CDK complains:
Alarm contains invalid expressions. (Service: AmazonCloudWatch; Status Code: 400; Error Code: ValidationError; Request ID: 3c245f6f-9d5e-492e-b2e1-e0fa83422594; Proxy: null)
These properties are directly passed to the ScalingPolicy CloudFormation resource in this property.
Our Metric class supports these properties, while our MathExpression class doesn't. I think we would need additional functionality from CloudFormation for this to be implemented.
Also encountering this issue
It seems like CloudFormation's AWS::AutoScaling::ScalingPolicy is indeed lacking some configuration parameters. The CustomizedMetricSpecification type from the AutoScaling API has a member Metrics where expressions can be specified, as seen in the docs. On the other hand, the AWS::AutoScaling::ScalingPolicy CustomizedMetricSpecification does NOT have the Metrics property.
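To make the shape of that Metrics member concrete, here is a minimal sketch in plain TypeScript. The interfaces are modeled loosely on the API's TargetTrackingMetricDataQuery and trimmed to the fields used later in this thread; the helper names rawMetric and customizedMetricSpecification, the expression id 'e1', and the Service dimension are assumptions for illustration, not part of any AWS SDK or CDK API.

```typescript
// Trimmed, hand-written shapes (an assumption, not generated from the AWS SDK).
interface MetricDimension {
  Name: string;
  Value: string;
}

// A raw CloudWatch metric that only feeds the expression.
interface MetricStatQuery {
  Id: string;
  ReturnData: false;
  MetricStat: {
    Stat: string;
    Metric: { Namespace: string; MetricName: string; Dimensions: MetricDimension[] };
  };
}

// The math expression whose result the policy tracks.
interface ExpressionQuery {
  Id: string;
  Expression: string;
  ReturnData: true;
}

type MetricDataQuery = MetricStatQuery | ExpressionQuery;

// Hypothetical helper: describe one input metric.
function rawMetric(
  id: string,
  namespace: string,
  metricName: string,
  stat: string,
  dimensions: Record<string, string>
): MetricStatQuery {
  return {
    Id: id,
    ReturnData: false,
    MetricStat: {
      Stat: stat,
      Metric: {
        Namespace: namespace,
        MetricName: metricName,
        Dimensions: Object.entries(dimensions).map(([Name, Value]) => ({ Name, Value })),
      },
    },
  };
}

// Hypothetical helper: combine the inputs with a single expression that returns data.
function customizedMetricSpecification(
  inputs: MetricStatQuery[],
  expression: string
): { Metrics: MetricDataQuery[] } {
  return { Metrics: [...inputs, { Id: 'e1', Expression: expression, ReturnData: true }] };
}

// Namespace and metric name come from the first comment; the dimension is made up.
const exampleSpec = customizedMetricSpecification(
  [rawMetric('m1', 'myMetrics/custom', 'ActiveConnections', 'Average', { Service: 'web' })],
  'm1 / 10'
);
console.log(JSON.stringify(exampleSpec, null, 2));
```

Only the expression entry sets ReturnData to true; the raw metrics are inputs and set it to false.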
It looks like AWS announced support for this recently: https://www.amazonaws.cn/en/new/2023/application-auto-scaling-supports-metric-math-for-target-tracking-policies/
and the documentation is now available: https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-target-tracking-metric-math.html
It would be great to see this feature in CDK too
As per the second page you link to, "This feature is not yet available in AWS CloudFormation." So CDK either has to wait for CloudFormation support or provide this via a custom resource.
Any updates on this? Or any workarounds via CDK as of now?
@zubairzahoor
I came across this today whilst following the and lost a few hours on it. As a temporary workaround, I've added a custom resource using AwsCustomResource. Not ideal, but it could be an option for you in the meantime.
import { AwsCustomResource, AwsCustomResourcePolicy, PhysicalResourceId } from 'aws-cdk-lib/custom-resources';
import { Construct } from 'constructs';
import { Metric } from 'aws-cdk-lib/aws-cloudwatch';
import { IQueue } from 'aws-cdk-lib/aws-sqs';
import { Effect, PolicyStatement } from 'aws-cdk-lib/aws-iam';
import { buildVersion } from '../../../utils/build-version';

interface EcsSqsMathExpressionAutoScalingPolicyProps {
  targetValue: number;
  resourceId: string; // format service/{cluster_name}/{service_name}
  queue: IQueue;
  taskCountMetric: Metric;
}

export class EcsSqsMathExpressionAutoScalingPolicy extends Construct {
  constructor(scope: Construct, id: string, props: EcsSqsMathExpressionAutoScalingPolicyProps) {
    super(scope, id);

    // Calls the Application Auto Scaling PutScalingPolicy API directly on create and update
    new AwsCustomResource(this, 'scaling-put-autoscaling-policy', {
      onUpdate: {
        physicalResourceId: PhysicalResourceId.of(`sqs-backlog-scaling-policy/${props.resourceId}`),
        service: 'ApplicationAutoScaling',
        action: 'putScalingPolicy',
        parameters: {
          PolicyName: `sqs-backlog-scaling-policy-${props.resourceId}-${buildVersion}`,
          PolicyType: 'TargetTrackingScaling',
          ResourceId: props.resourceId,
          ScalableDimension: 'ecs:service:DesiredCount',
          ServiceNamespace: 'ecs',
          TargetTrackingScalingPolicyConfiguration: {
            TargetValue: props.targetValue,
            CustomizedMetricSpecification: {
              Metrics: [
                {
                  // m1: queue depth
                  Id: 'm1',
                  Label: 'Approx. # of Messages Visible',
                  ReturnData: false,
                  MetricStat: {
                    Stat: 'Sum',
                    Metric: {
                      MetricName: props.queue.metricApproximateNumberOfMessagesVisible().metricName,
                      Namespace: props.queue.metricApproximateNumberOfMessagesVisible().namespace,
                      Dimensions: [
                        {
                          Name: 'QueueName',
                          Value: props.queue.queueName
                        }
                      ]
                    }
                  }
                },
                {
                  // m2: number of running tasks
                  Id: 'm2',
                  Label: 'Running Instances Count',
                  ReturnData: false,
                  MetricStat: {
                    Stat: 'Average',
                    Metric: {
                      MetricName: props.taskCountMetric.metricName,
                      Namespace: props.taskCountMetric.namespace,
                      Dimensions: Object.entries(props.taskCountMetric.dimensions || {}).map(([key, value]) => ({
                        Name: key,
                        Value: value
                      }))
                    }
                  }
                },
                {
                  // e1: the only entry that returns data to the policy
                  Label: 'Backlog per Instance',
                  Id: 'e1',
                  Expression: 'm1 / m2',
                  ReturnData: true
                }
              ]
            }
          }
        }
      },
      policy: AwsCustomResourcePolicy.fromStatements([
        new PolicyStatement({
          effect: Effect.ALLOW,
          actions: ['application-autoscaling:*', 'ecs:DescribeServices', 'ecs:UpdateService'],
          resources: ['*']
        })
      ])
    });
  }
}
You can use it like this:
this.scaling = this.fargateService.autoScaleTaskCount({
  minCapacity: 0,
  maxCapacity: 100
});

const customScalingPolicy = new EcsSqsMathExpressionAutoScalingPolicy(this, 'scaling-policy', {
  targetValue: props.acceptableLatency.toSeconds() / props.averageMessageProcessingTime.toSeconds(),
  resourceId: `service/${props.cluster.clusterName}/${this.fargateService.serviceName}`,
  queue: queue,
  taskCountMetric: desiredCountMetric
});

customScalingPolicy.node.addDependency(this.scaling);
It may need some adaptations to meet your needs but it should give you a good starting point.
I should mention I've not fully tested this yet so if you notice anything weird then please share :)
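For anyone wondering where the targetValue in the usage above comes from: it is the acceptable latency divided by the average time to process one message, i.e. the backlog a single task can carry while still meeting the latency goal. A minimal sketch of that arithmetic follows; the function name and the example numbers are assumptions, not values from the original stack.

```typescript
// Acceptable backlog per task = acceptable latency / average processing time per message.
function acceptableBacklogPerTask(acceptableLatencySeconds: number, averageProcessingSeconds: number): number {
  if (!(averageProcessingSeconds > 0)) {
    throw new RangeError('averageProcessingSeconds must be greater than zero');
  }
  if (acceptableLatencySeconds < 0) {
    throw new RangeError('acceptableLatencySeconds must not be negative');
  }
  return acceptableLatencySeconds / averageProcessingSeconds;
}

// Example: a 10 minute latency budget and 2 seconds per message.
const exampleTargetValue = acceptableBacklogPerTask(600, 2);
console.log(exampleTargetValue); // → 300
```

With these example numbers, target tracking would aim to keep roughly 300 visible messages per running task.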
@alexbaileyuk Thank you! Tried this for my use-case (with AmazonMq/ECS) and seems to work. What are the minimum permissions needed for execution role of the lambda here?
@zubairzahoor due to difficulties with this method I ended up writing a totally different function which pre-calculates backlog / instance by pulling and calculating. Something like this:
import { DescribeServicesCommand, ECSClient, paginateListServices } from '@aws-sdk/client-ecs';
import { CloudWatchClient, MetricDatum, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch';
import { SQSClient, GetQueueAttributesCommand } from '@aws-sdk/client-sqs';

const ecsClient = new ECSClient({
  region: 'eu-west-1'
});
const cloudwatchClient = new CloudWatchClient({
  region: 'eu-west-1'
});
const sqsClient = new SQSClient({
  region: 'eu-west-1'
});

export const putInstanceBacklogMetrics = async (clusterName: string) => {
  const consumerServices = await listServices(clusterName);
  const backlogMetrics = await Promise.all(
    consumerServices.map((serviceArn) => calculateBacklogForConsumerService(clusterName, serviceArn))
  );

  const metrics = backlogMetrics.map((backlog) => {
    console.log(`Service ${backlog.serviceName} has desired count ${backlog.desiredCount} and queue length ${backlog.queueLength}`);

    let instanceBacklog: number;
    if (backlog.desiredCount === 0 && backlog.queueLength > 0) {
      // If there are no instances running we have to pretend the backlog is the acceptable backlog per instance + 1
      // so that we scale up to one instance. This allows us to scale down to zero instances when there is no backlog.
      // This will cause some jitter in the instance backlog metric, but it allows us to scale to zero. In test environments
      // it'll be fine, in production we'll have enough traffic that the jitter will be negligible and instances will usually be
      // scaled up to at least one.
      instanceBacklog = backlog.queueLength > backlog.acceptableBacklogPerInstance ? backlog.queueLength : backlog.acceptableBacklogPerInstance + 1;
    } else if (backlog.queueLength === 0) {
      instanceBacklog = 0;
    } else if (backlog.desiredCount > 0) {
      instanceBacklog = backlog.queueLength / backlog.desiredCount;
    } else {
      instanceBacklog = 0;
    }

    return {
      MetricName: 'ConsumerInstanceBacklog',
      Dimensions: [
        {
          Name: 'ClusterName',
          Value: clusterName
        },
        {
          Name: 'ServiceName',
          Value: backlog.serviceName
        }
      ],
      Value: instanceBacklog
    };
  });

  if (metrics.length === 0) {
    console.log('No consumer services found');
    return;
  }

  await putConsumerInstanceBacklogMetric(metrics);
};

// Collects the ARNs of all services in the cluster that carry the consumer tags
const listServices = async (clusterName: string) => {
  const paginator = paginateListServices({ client: ecsClient }, { cluster: clusterName });
  const serviceArns: string[] = [];

  for await (const page of paginator) {
    for (const serviceArn of page.serviceArns ?? []) {
      if (await isConsumerService(clusterName, serviceArn)) {
        serviceArns.push(serviceArn);
      }
    }
  }

  console.log(`Found ${serviceArns.length} consumer services in cluster ${clusterName}`);
  return serviceArns;
};

// A consumer service is one tagged with both QueueUrl and AcceptableBacklogPerInstance
const isConsumerService = async (clusterName: string, serviceArn: string) => {
  const serviceDetails = await ecsClient.send(
    new DescribeServicesCommand({
      cluster: clusterName,
      services: [serviceArn],
      include: ['TAGS']
    })
  );

  return (
    serviceDetails.services?.[0]?.tags?.find((tag) => tag.key === 'QueueUrl') !== undefined &&
    serviceDetails.services?.[0]?.tags?.find((tag) => tag.key === 'AcceptableBacklogPerInstance') !== undefined
  );
};

const calculateBacklogForConsumerService = async (clusterName: string, serviceArn: string) => {
  const serviceDetails = await ecsClient.send(
    new DescribeServicesCommand({
      cluster: clusterName,
      services: [serviceArn],
      include: ['TAGS']
    })
  );

  const desiredCount = serviceDetails.services?.[0]?.desiredCount || 0;
  // The tag holds the queue URL, which is what GetQueueAttributes expects
  const queueUrl = serviceDetails.services?.[0]?.tags?.find((tag) => tag.key === 'QueueUrl')?.value || '';
  const queueLength = await getQueueLength(queueUrl);
  const acceptableBacklogPerInstance = parseInt(
    serviceDetails.services?.[0]?.tags?.find((tag) => tag.key === 'AcceptableBacklogPerInstance')?.value || '0',
    10
  );

  return {
    serviceName: serviceDetails.services?.[0]?.serviceName || 'UNKNOWN',
    desiredCount: desiredCount,
    queueLength: queueLength,
    acceptableBacklogPerInstance: acceptableBacklogPerInstance
  };
};

const getQueueLength = async (queueUrl: string) => {
  const queueDetails = await sqsClient.send(
    new GetQueueAttributesCommand({
      QueueUrl: queueUrl,
      AttributeNames: ['ApproximateNumberOfMessages']
    })
  );

  return parseInt(queueDetails.Attributes?.ApproximateNumberOfMessages || '0', 10);
};

const putConsumerInstanceBacklogMetric = async (metrics: MetricDatum[]) => {
  await cloudwatchClient.send(
    new PutMetricDataCommand({
      Namespace: 'ECS/CustomServiceMetrics',
      MetricData: metrics
    })
  );
};
It also relies on some tags on the ECS services. It's a bit messy and not well refined at the moment, since I'm still testing and working on edge cases like the scale-to-zero ones. It loops through all services in a cluster and, based on their tags and metrics, defines a new metric called ConsumerInstanceBacklog to do target tracking against.
I'd advise doing something similar. The main issues came on stack updates. You can't create the scaling policy without defining a name and when defining a name I ended up with tons of issues trying to update/replace/rollback etc. I'd recommend not using the above method for those reasons.
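The scale-to-zero branch in the function above is the easiest part to get wrong, so it may help to isolate it as a pure function that can be unit tested without any AWS clients. This is only a sketch of the same branching under the assumption that all inputs are non-negative; the name computeInstanceBacklog is made up for illustration.

```typescript
// Pure version of the backlog-per-instance branching; assumes non-negative inputs.
function computeInstanceBacklog(
  queueLength: number,
  desiredCount: number,
  acceptableBacklogPerInstance: number
): number {
  if (queueLength === 0) {
    // Nothing queued: report zero so the service can scale in, possibly to zero tasks.
    return 0;
  }
  if (desiredCount === 0) {
    // No tasks running but messages are waiting: report a value above the target
    // so that target tracking scales out to at least one task.
    return queueLength > acceptableBacklogPerInstance ? queueLength : acceptableBacklogPerInstance + 1;
  }
  // Normal case: spread the queue over the running tasks.
  return queueLength / desiredCount;
}

console.log(computeInstanceBacklog(0, 0, 10)); // → 0
console.log(computeInstanceBacklog(5, 0, 10)); // → 11
console.log(computeInstanceBacklog(50, 0, 10)); // → 50
console.log(computeInstanceBacklog(30, 3, 10)); // → 10
```

The second call shows the jitter mentioned in the code comments: with 5 messages and no tasks, the reported backlog jumps to 11 even though the real queue is below the acceptable level.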
@alexbaileyuk I am more comfortable using the above; it works well for me. Were there any issues you encountered with scaling in using the custom resource?
@zubairzahoor we're going to production later in the week with a more refined version of the code. We've not found any major issues so far.
Very interesting discussion. I managed to fix it with CDK-only syntax. I hope it helps 😉
const resourceId = `endpoint/${this.endpointName}/variant/${variant.name}`;

// To define min/max values
const target = new ScalableTarget(this, 'ScalableTarget', {
  serviceNamespace: ServiceNamespace.SAGEMAKER,
  minCapacity: variant.autoScale.minCapacity,
  maxCapacity: variant.autoScale.maxCapacity,
  scalableDimension: 'sagemaker:variant:DesiredInstanceCount',
  resourceId,
});

// We need the endpoint before creating the autoscaling policy
target.node.addDependency(endpoint);

const scalingPolicy = new CfnScalingPolicy(this, 'ScalingPolicy', {
  policyName: resourceId,
  scalingTargetId: target.scalableTargetId,
  policyType: 'TargetTrackingScaling',
  targetTrackingScalingPolicyConfiguration: {
    targetValue: variant.autoScale.targetProcessingTime,
  },
});

// CDK doesn't support math expression in target tracking, adding it in cloudformation manually
scalingPolicy.addPropertyOverride(
  'TargetTrackingScalingPolicyConfiguration.CustomizedMetricSpecification',
  {
    Metrics: [
      {
        Id: 'm1',
        ReturnData: false,
        MetricStat: {
          Stat: 'Average',
          Metric: {
            MetricName: 'TotalProcessingTime',
            Namespace: 'AWS/SageMaker',
            Dimensions: [
              {
                Name: 'EndpointName',
                Value: this.endpointName,
              },
              {
                Name: 'VariantName',
                Value: variant.name,
              },
            ],
          },
        },
      },
      {
        Id: 'm2',
        ReturnData: true,
        Expression: 'FILL(m1, 0)',
      },
    ],
  }
);
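A note on the FILL(m1, 0) expression in the override above: in CloudWatch metric math, FILL replaces missing datapoints with the given value, so presumably the intent here is for the policy to see 0 when the endpoint receives no traffic, rather than having no data at all. The snippet below only imitates that behaviour on a plain array to show the idea; it is not how CloudWatch evaluates the expression, and fillMissing is a made-up name.

```typescript
// Imitates FILL(series, value): gaps (undefined) are replaced, real datapoints are kept.
function fillMissing(series: Array<number | undefined>, fillValue: number): number[] {
  return series.map((point) => (point === undefined ? fillValue : point));
}

// Three periods of TotalProcessingTime with no traffic in the middle one.
const filledSeries = fillMissing([1.5, undefined, 2], 0);
console.log(filledSeries); // → [ 1.5, 0, 2 ]
```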
Given that CloudFormation now officially supports target tracking scaling with metric math, we can use an L1 construct. This solution may require a newer aws-cdk-lib version. For example, to scale an ECS service with aws_applicationautoscaling:
// Register the ECS Fargate Service as a scalable target for Application AutoScaling
const serviceScalableTarget = new aws_applicationautoscaling.ScalableTarget(this,
  "serviceScalableTarget",
  {
    serviceNamespace: aws_applicationautoscaling.ServiceNamespace.ECS,
    scalableDimension: "ecs:service:DesiredCount",
    resourceId: `service/${clusterName}/${serviceName}`,
    minCapacity: ecsMinCapacity,
    maxCapacity: ecsMaxCapacity,
  }
)

// Documentation: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-autoscaling-scalingpolicy-targettrackingmetricdataquery.html
const mathExpressionSpecification: CfnScalingPolicy.CustomizedMetricSpecificationProperty = {
  metrics: [
    {
      expression: "approximateNumberOfMessagesVisible / desiredTaskCount",
      id: "sqsBacklogPerECSTask",
      label: "SQSBacklogPerECSTask",
      returnData: true,
    },
    {
      id: "desiredTaskCount",
      label: "DesiredTaskCount",
      metricStat: {
        metric: {
          namespace: CONTAINER_INSIGHTS_NAMESPACE,
          metricName: "DesiredTaskCount",
          dimensions: [{
            name: "ClusterName",
            value: clusterName
          }, {
            name: "ServiceName",
            value: serviceName
          }],
        },
        stat: "Average",
      },
      returnData: false,
    },
    {
      id: "approximateNumberOfMessagesVisible",
      label: "ApproximateNumberOfMessagesVisible",
      metricStat: {
        metric: {
          namespace: SQS_NAMESPACE,
          metricName: "ApproximateNumberOfMessagesVisible",
          dimensions: [{
            name: "QueueName",
            value: sqsQueueName
          }],
        },
        stat: "Average",
      },
      returnData: false,
    }
  ],
};

// Documentation: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_applicationautoscaling.CfnScalingPolicy.TargetTrackingScalingPolicyConfigurationProperty.html
const serviceScalingPolicy = new aws_applicationautoscaling.CfnScalingPolicy(this,
  "serviceScalingPolicy",
  {
    policyName: "serviceScalingPolicy",
    policyType: "TargetTrackingScaling",
    scalingTargetId: serviceScalableTarget.scalableTargetId,
    targetTrackingScalingPolicyConfiguration: {
      targetValue: targetValueForSQSBacklogPerECSTask,
      scaleInCooldown: scaleInCooldownForTargetTrackingScaling,
      scaleOutCooldown: scaleOutCooldownForTargetTrackingScaling,
      customizedMetricSpecification: mathExpressionSpecification,
    }
  }
)
This solution is equivalent to "Create a target tracking scaling policy for Application Auto Scaling using metric math".
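Since mistakes in the metrics list only surface at deploy time as a validation error, a small pre-flight check can save a round trip. The sketch below checks three rules as I understand them from the CloudWatch and Application Auto Scaling docs: ids start with a lowercase letter and contain only letters, digits and underscores; each entry is either an expression or a metricStat; and exactly one entry returns data. The QueryLike type and validateMetricQueries are hypothetical names, and the check assumes every entry sets returnData explicitly, as in the example above.

```typescript
// Minimal stand-in for the camelCase query entries used with CfnScalingPolicy (an assumption).
interface QueryLike {
  id: string;
  expression?: string;
  metricStat?: unknown;
  returnData?: boolean;
}

// Returns a list of human-readable problems; an empty list means the checks passed.
function validateMetricQueries(queries: QueryLike[]): string[] {
  const problems: string[] = [];
  const seenIds = new Set<string>();

  for (const query of queries) {
    if (!/^[a-z][a-zA-Z0-9_]*$/.test(query.id)) {
      problems.push(`id "${query.id}" must start with a lowercase letter and use only letters, digits, or underscores`);
    }
    if (seenIds.has(query.id)) {
      problems.push(`id "${query.id}" is used more than once`);
    }
    seenIds.add(query.id);

    const hasExpression = query.expression !== undefined;
    const hasMetricStat = query.metricStat !== undefined;
    if (hasExpression === hasMetricStat) {
      problems.push(`query "${query.id}" needs exactly one of expression or metricStat`);
    }
  }

  const returning = queries.filter((query) => query.returnData === true);
  if (returning.length !== 1) {
    problems.push(`expected exactly one query with returnData: true, found ${returning.length}`);
  }
  return problems;
}

// The three entries from the example above, with the metricStat bodies elided.
const backlogQueryProblems = validateMetricQueries([
  { id: 'sqsBacklogPerECSTask', expression: 'approximateNumberOfMessagesVisible / desiredTaskCount', returnData: true },
  { id: 'desiredTaskCount', metricStat: {}, returnData: false },
  { id: 'approximateNumberOfMessagesVisible', metricStat: {}, returnData: false },
]);
console.log(backlogQueryProblems); // → []
```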
Describe the bug
I would like to use a MathExpression for a custom metric in TargetTrackingScalingPolicy, but I got this error:
Checking the code here, it only checks for metricStat, not mathExpression.
Expected Behavior
Should allow to define math expression for Target Tracking.
Current Behavior
Only direct metrics are allowed
Reproduction Steps
Create Target Tracking using math expression
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.27.0
Framework Version
No response
Node.js Version
16.15.0
OS
Debian 10
Language
Python
Language Version
No response
Other information
No response