aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

apigateway: caching remains enabled after removal, potentially returning wrong data #32342

Closed apparentorder closed 1 day ago

apparentorder commented 1 week ago

Describe the bug

Removing the caching_enabled and cache_cluster_enabled settings from the stage deployment options does not disable caching, even though disabled is the default and the cdk diff output suggests the settings will be removed.

Side note: When trying to disable caching entirely, the cache key configuration might be removed at the same time. In that case, because of this bug, caching remains enabled but the cache keys are no longer respected, which leads to caching and delivery of seemingly random wrong data to clients, causing havoc and mayhem.

Expected Behavior

Caching is disabled and the cache cluster is removed.

Current Behavior

Caching remains enabled and the cache cluster still exists, generating costs.

Reproduction Steps

CDK script:

from aws_cdk import (
    CfnOutput,  # used for the ApiGatewayId output at the end of the stack
    Duration,
    Stack,
    aws_apigateway,
)
from constructs import Construct
import os

class ApiGwCacheStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        if os.environ.get("CACHE") == "yes":
            api = aws_apigateway.RestApi(self, "cdk-bug-api",
                rest_api_name="cdk-bug-api",
                description="This is my sample API for demonstration",
                deploy_options = {
                    "caching_enabled": True,
                    "cache_cluster_enabled": True,
                    "cache_ttl": Duration.minutes(60),
                    "throttling_burst_limit": 5,
                    "throttling_rate_limit": 10,

                }
            )
        else:
            api = aws_apigateway.RestApi(self, "cdk-bug-api",
                rest_api_name="cdk-bug-api",
                description="This is my sample API for demonstration",
                deploy_options = {
                    "throttling_burst_limit": 5,
                    "throttling_rate_limit": 10,

                }
            )

        api.root.add_resource("hello").add_method("GET") # dummy
        CfnOutput(self, "ApiGatewayId", value=api.rest_api_id)

Deploy with caching enabled:

$ export CACHE="yes"; cdk diff && cdk deploy --progress events
[...] Outputs:
ApiGwCacheStack.ApiGatewayId = 4hllumf8bk

Verify caching is enabled (as expected):

$ aws apigateway get-stage --rest-api-id 4hllumf8bk --stage-name prod --query 'methodSettings.*.cachingEnabled' --output text
True

Remove caching config from CDK definition:

$ export CACHE="no"; cdk diff && cdk deploy --progress events
[...]
[~] AWS::ApiGateway::Stage cdk-bug-api/DeploymentStage.prod cdkbugapiDeploymentStageprod3E594040
 ├─ [-] CacheClusterEnabled
 │   └─ true
 ├─ [-] CacheClusterSize
 │   └─ 0.5
 └─ [~] MethodSettings
     └─ @@ -1,7 +1,5 @@
        [ ] [
        [ ]   {
        [-]     "CacheTtlInSeconds": 3600,
        [-]     "CachingEnabled": true,
        [ ]     "DataTraceEnabled": false,
        [ ]     "HttpMethod": "*",
[...]
ApiGwCacheStack | 1/3 | 2:19:40 PM | UPDATE_COMPLETE      | AWS::ApiGateway::Stage      | cdk-bug-api/DeploymentStage.prod (cdkbugapiDeploymentStageprod3E594040) 

Verify that caching is still enabled (bad):

$ aws apigateway get-stage --rest-api-id 4hllumf8bk --stage-name prod --query '[cacheClusterEnabled, methodSettings.*.cachingEnabled]' --output text
True
True

Additional Information/Context

This behavior may be affected by the other deployment options. For example, I was not able to reproduce the bug for caching_enabled until I added cache_ttl to the options. For cache_cluster_enabled, however, it was reproducible even with empty deployment options.

The bug doesn't seem to be related to the project language: this test case is Python, but the bug originally hit in production with TypeScript.

Workaround: Explicitly setting those values to False works as expected.
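
A minimal sketch of this workaround, assuming the deployment options are passed as a plain dict as in the reproduction script above. The helper name stage_deploy_options is hypothetical and not part of aws-cdk-lib; the point is only that both caching keys are always present, set to False rather than omitted.

```python
from typing import Any, Dict


def stage_deploy_options(cache: bool) -> Dict[str, Any]:
    """Build deploy options that always state the caching flags explicitly.

    Hypothetical helper for illustration. The keys are the ones used in the
    reproduction script; the throttling values are copied from it as well.
    """
    return {
        "throttling_burst_limit": 5,
        "throttling_rate_limit": 10,
        # Workaround: never omit these two keys. When caching is not wanted,
        # pass False so the stage update carries an explicit value.
        "caching_enabled": cache,
        "cache_cluster_enabled": cache,
    }


# Inside the stack (needs aws_cdk, as in the reproduction script):
#   api = aws_apigateway.RestApi(self, "cdk-bug-api",
#       rest_api_name="cdk-bug-api",
#       deploy_options=stage_deploy_options(os.environ.get("CACHE") == "yes"),
#   )
```

With this shape, switching CACHE from "yes" to "no" changes the two flags from True to False instead of dropping them from the template. A cache_ttl entry can be added by the caller when caching is on; it is left out here so the sketch runs without aws_cdk installed.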

CDK CLI Version

2.171.1 (build a95560c)

Node.js Version

Node.js v20.11.0

OS

AL2023

Language

Python

Language Version

3.9.16

ashishdhingra commented 1 week ago

@apparentorder Good morning. Thanks for reporting the issue. I noticed the note below in the documentation page "Cache settings for REST APIs in API Gateway":

Note
Creating or deleting a cache takes about 4 minutes for API Gateway to complete.

When a cache is created, the Cache cluster value changes from Create in progress to Active. When cache deletion is completed, the Cache cluster value changes from Delete in progress to Inactive.

When you turn on method-level caching for all methods on your stage, the Default method-level caching value changes to Active. If you turn off method-level caching for all methods on your stage, the Default method-level caching value changes to Inactive. If you have an existing setting for a method-level cache, changing the status of the cache doesn't affect that setting.

Notice the below note:

Could you wait for 4 minutes (probably more) after the CDK deployment and then verify via the AWS CLI whether caching is still enabled? Also, check whether you have an existing setting for a method-level cache.
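
The wait-and-check step could be scripted roughly as follows. This is a sketch, not an official tool: the function names and the timeout and interval defaults are assumptions, and fetch_stage simply shells out to the same aws apigateway get-stage call used in the report, reading the cacheClusterEnabled and methodSettings.*.cachingEnabled fields that the report queries.

```python
import json
import subprocess
import time
from typing import Any, Callable, Dict


def fetch_stage(rest_api_id: str, stage_name: str) -> Dict[str, Any]:
    """Return the stage description as a dict, using the AWS CLI."""
    out = subprocess.run(
        ["aws", "apigateway", "get-stage",
         "--rest-api-id", rest_api_id,
         "--stage-name", stage_name,
         "--output", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def caching_active(stage: Dict[str, Any]) -> bool:
    """True if the cache cluster or any method-level cache is still enabled."""
    method_settings = stage.get("methodSettings") or {}
    return bool(stage.get("cacheClusterEnabled")) or any(
        setting.get("cachingEnabled") for setting in method_settings.values()
    )


def wait_until_cache_disabled(
    fetch: Callable[[], Dict[str, Any]],
    timeout_s: float = 600.0,   # assumed: comfortably above the ~4 minutes
    interval_s: float = 30.0,   # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until caching is off; return False if it is still on at timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not caching_active(fetch()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

For the stack in the report this would be called as wait_until_cache_disabled(lambda: fetch_stage("4hllumf8bk", "prod")). The fetch and sleep parameters are injectable so the loop can be exercised without AWS credentials.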

I tried to reproduce the issue using the below CDK code (in TypeScript), first enabling cache and then disabling it with re-deployment for stack:

import * as cdk from 'aws-cdk-lib';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';

export class CdktestStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const api = new apigateway.RestApi(this, 'cdk-bug-api', {
      restApiName: "cdk-bug-api",
      description: "This is my sample API for demonstration",
      deployOptions: {
        //cachingEnabled: true,
        //cacheClusterEnabled: true,
        //cacheTtl: cdk.Duration.minutes(60),
        throttlingBurstLimit: 5,
        throttlingRateLimit: 10
      }
    });

    api.root.addResource("hello").addMethod("GET");
    new cdk.CfnOutput(this, "ApiGatewayId", {
      value: api.restApiId
    });
  }
}

Below are some observations:

From the CDK perspective, it generated the correct CloudFormation template and submitted the change set to CloudFormation. From there on, CDK is out of the picture: the changes are deployed by CloudFormation to the AWS API Gateway service. This appears to be a CloudFormation limitation; please open a new issue at https://github.com/aws-cloudformation/cloudformation-coverage-roadmap with all the details.

Thanks, Ashish

github-actions[bot] commented 6 days ago

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.