aws / chalice

Python Serverless Microframework for AWS
Apache License 2.0
10.61k stars 1.01k forks source link

Using AWS CDK & Chalice construct received duplicated lambdas in synthed sam template #2030

Open Wtevia opened 1 year ago

Wtevia commented 1 year ago

Hi, on project we had single Chalice project for the whole API + pure lambdas (cloudwatch events, triggers etc) and as the project had grown, lambdas generated by project became bigger and bigger. We decided to split it into several chalice project that will be combined to single API under Custom domain name. For easy management&deploy we decided to put them in CDK project. As a result I created next stack that contains declaration of all chalice constructs (component_with_stack_declaration.py):

class RestApi(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        ...
        code_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "src")

        app1 = Chalice(self, "App1", source_dir=os.path.join(code_dir, "app1"))
        app2= Chalice(self, "App2", source_dir=os.path.join(code_dir, "app2"))
        ...

When I ran cdk synth I got next traceback:

Creating deployment package.
Creating deployment package.
jsii.errors.JavaScriptError: 
  @jsii/kernel.RuntimeError: Error: section 'Resources' already contains 'Lambda1LogicalID'
      at Kernel._ensureSync (C:\....\AppData\Local\Temp\tmpgg97w2wj\lib\program.js:10364:27)
      at Kernel.invoke (C:\....\AppData\Local\Temp\tmpgg97w2wj\lib\program.js:9764:34)
      at KernelHost.processRequest (C:\....\AppData\Local\Temp\tmpgg97w2wj\lib\program.js:11539:36)
      at KernelHost.run (C:\Users\....\Local\Temp\tmpgg97w2wj\lib\program.js:11499:22)
      at Immediate._onImmediate (C:\....\AppData\Local\Temp\tmpgg97w2wj\lib\program.js:11500:46)
      at processImmediate (node:internal/timers:466:21)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "B:\...\cdk_app\app.py", line 15, in <module>
    app.synth()
  File "B:\....\cdk_app\venv\lib\site-packages\aws_cdk\__init__.py", line 20043, in synth
    return typing.cast(_CloudAssembly_c693643e, jsii.invoke(self, "synth", [options]))
  File "B:\....\cdk_app\venv\lib\site-packages\jsii\_kernel\__init__.py", line 149, in wrapped
    return _recursize_dereference(kernel, fn(kernel, *args, **kwargs))
  File "B:\....\cdk_app\venv\lib\site-packages\jsii\_kernel\__init__.py", line 399, in invoke
    response = self.provider.invoke(
  File "B:\....\cdk_app\venv\lib\site-packages\jsii\_kernel\providers\process.py", line 377, in invoke
    return self._process.send(request, InvokeResponse)
  File "B:\....\cdk_app\venv\lib\site-packages\jsii\_kernel\providers\process.py", line 339, in send
    raise RuntimeError(resp.error) from JavaScriptError(resp.stack)
RuntimeError: Error: section 'Resources' already contains 'Lambda1LogicalID'

When I looked in the generated in cdk.out and chalice.out templates i saw that this lambda is actually declared twice. Example from .sam_with_assets.json files in generated chalice.out folder:

Chalice app1
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Transform": "AWS::Serverless-2016-10-31",
  "Outputs": {},
  "Resources": {
    "Lambda1LogicalID": {
      "Type": "AWS::Serverless::Function",
      "Properties": {
        "Runtime": "python3.9",
        "Handler": "chalicelib.app1.handler.lambda1",
        "CodeUri": {
          "Bucket": "cdk-hnb659fds-assets-XXXXXXXXXXXX-us-east-1",
          "Key": "963f64d496d61b07c0f89b0c476f7ff9a0ebdc17e4f8b80484a4f957eecca97e.zip"
        },
        "Tags": {
          "aws-chalice": "version=1.27.3:stage=dev/RestApi:app=app1"
        },
        ...

Chalice app2
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Transform": "AWS::Serverless-2016-10-31",
  "Outputs": {},
  "Resources": {
    "Lambda1LogicalID": {
      "Type": "AWS::Serverless::Function",
      "Properties": {
        "Runtime": "python3.9",
        "Handler": "chalicelib.app1.handler.lambda1",
        "CodeUri": {
          "Bucket": "cdk-hnb659fds-assets-XXXXXXXXXXXX-us-east-1",
          "Key": "972986e72385367de3787cc5004baa8efe91c8c98335a070ee636b9f7519afd7.zip"
        },
        "Tags": {
          "aws-chalice": "version=1.27.3:stage=dev/RestApi:app=app2"
        },
        ...

As you can see they are identical apart aws-chalice tag and key in bucket. I checked several times within generated .zip assets, they contained correct 2 different chalice project code, but for some reason template of second chalice project (app2) contains all the same lambdas as in the first project (app1) and doesnt contain any lambdas that are declared in code of the .zip archive that is created for it. I tried to place Chalice constructs in separate inner stacks of RestApi stack, but got the same result - differect assets with code but same template with same lambdas' resources (apart tags). Code is stored in the next order:

cdk_app
|  - app.py
|  - restapi_stack
      |  - component_with_stack_declaration.py
      |  - src
           | - app1
           |      | - .chalice
           |           chalicelib
           |           app.py
           | - app2
                  | - .chalice
                       chalicelib
                       app.py

Maybe someone had similar issue? I would be happy to receive your feedback

Wtevia commented 1 year ago

I discovered that the actual issue was that i wanted to keep the same chalice stage name for all Chalice constructs, so i set as scope for them the same stack. It seems chalice construct requires to be single per stack. I tried to do so because now in chalice construct you cant set what chalice stage name from config file to use. It would be very nice to have such possibility, because now you have to set stage name based on stack full name, what is not very friedly if you migrate the existing app to cdk. For ex. in my case i have next path dev/RestApi/App1 as chalice stage name.

Wtevia commented 1 year ago

It would be very nice to add this info (single chalice construct per stack) to documentation of cdk or fix this if this isn't planned behavior

Matvii-Boichenko commented 1 year ago

I ran some tests and discovered strage behavior, even though i succeeded to deploy different lambdas, it seems that CDK creates different APIs but with the same path&methods structure as in the first Chalice construct it happens when I deploy all stack at once (--all flag). Strange is that it maps these identical APIGateway (different ids but same path&methods definition) to different lambdas. It seems for API endpoints (for APIGateway) definition in all chalice projects is used always the first chalice project. Is that correct behaviour for chalice construct?

Wtevia commented 1 year ago

I map a bit through the code of chalice and finally this time defilitely I find out where the problem is. Chalice builds template based on code in imported modules, but importing in python works with such logic that if you already imported some module you shoud reimport it, just import wont work for update. So what actually happens is that the first time chalice correctly import project modules it loads app.py module and chalicelib package with inner modules. And then when it checkout to different directory with same structure (and chalice requires the same structure with app.py and chalicelib) it tries to import again app.py but as we already have it in imported it doesnt update it. The block of code where issue happens is in chalice/cli/factory.py :

class CLIFactory(object):
   ...
      def load_chalice_app(self, environment_variables=None,
                         validate_feature_flags=True):
        ...
        try:
            app = importlib.import_module('app')

            chalice_app = getattr(app, 'app')

            ## BEGIN Added code
            module_names = list(sys.modules.keys()).copy()
            for module_name in module_names:
                if "app" == module_name or "chalicelib" in module_name:
                    del sys.modules[module_name]
            ## END

        except SyntaxError as e:
            ...
        return chalice_app
   ...

This actualy can be solved very easily deleting app and chalicelib from imported sys.modules, but this works well in my case because i don't have modules/packages with same name in 2 chalice projects (appart app and chalicelib). So i found the way to fix it in more generic way in 4 lines to call after calling Chalice cdk construct declaration:

module_names = list(sys.modules.keys()).copy()
for module_name in module_names:
  if "app" == module_name or "chalicelib" in module_name:
      del sys.modules[module_name]

It would be nice to see some kind of such update in Chalice itself. Maybe even a bit more complicate variation with tracking of all imported packages during calling importlib.import_module('app') including third party libraries with next deletion of them from system imported modules after building app package, so that we wont have collision bettween different apps.

lymmurrain commented 1 year ago

Chalice causes all Chalice projects in stages after the first stage to import the last Chalice project in the first stage. This is because Chalice's load_chalice_app function adds the path to the Chalice project at the beginning of the sys path. Even if the app package is removed from sys.modules, it still imports the wrong app package. The problematic code exists in chalice/cli/factory.py.

def load_chalice_app(
        self,
        environment_variables: Optional[MutableMapping] = None,
        validate_feature_flags: Optional[bool] = True,
) -> Chalice:
    # validate_features indicates that we should validate that
    # any expiremental features used have the appropriate feature flags.
    if self.project_dir not in sys.path:
        sys.path.insert(0, self.project_dir)
    ...

Additionally, there are other packages with the same name in my Chalice projects, so simply removing app and chalicelib is not enough. My solution is as follows (modifying the load_chalice_app function):

def load_chalice_app(
        self,
        environment_variables: Optional[MutableMapping] = None,
        validate_feature_flags: Optional[bool] = True,
) -> Chalice:
    # validate_features indicates that we should validate that
    # any expiremental features used have the appropriate feature flags.
    if self.project_dir not in sys.path:
        sys.path.insert(0, self.project_dir)
    # The vendor directory has its contents copied up to the top level of
    # the deployment package. This means that imports will work in the
    # lambda function as if the vendor directory is on the python path.
    # For loading the config locally we must add the vendor directory to
    # the path so it will be treated the same as if it were running on
    # lambda.
    vendor_dir = os.path.join(self.project_dir, 'vendor')
    if os.path.isdir(vendor_dir) and vendor_dir not in sys.path:
        # This is a tradeoff we have to make for local use.
        # The common use case of vendor/ is to include
        # extension modules built for AWS Lambda.  If you're
        # running on a non-linux dev machine, then attempting
        # to import these files will raise exceptions.  As
        # a workaround, the vendor is added to the end of
        # sys.path so it's after `./lib/site-packages`.
        # This gives you a change to install the correct
        # version locally and still keep the lambda
        # specific one in vendor/
        sys.path.append(vendor_dir)
    if environment_variables is not None:
        self._environ.update(environment_variables)
    try:
        ## BEGIN Added code
        PRELOADED_MODULES = set()

        def init():
            # local imports to keep things neat
            from sys import modules
            import importlib

            nonlocal PRELOADED_MODULES

            # sys and importlib are ignored here too
            PRELOADED_MODULES = set(modules.values())

        def delete_imported():
            from sys import modules
            import importlib
            import gc
            deleted = set()
            for module in set(modules.values()) - PRELOADED_MODULES:
                try:
                    del sys.modules[module.__name__]
                    deleted.add(module.__name__)
                except:
                    # there are some problems that are swept under the rug here
                    pass
            gc.collect()
            print("Deleted modules: ", deleted)
            if self.project_dir  in sys.path:
                sys.path.remove(self.project_dir)

        init()
        ## END
        print(f'app in sys.modules: {sys.modules.get("app")}')
        app = importlib.import_module('app')
        chalice_app = getattr(app, 'app')
        ## BEGIN Added code
        delete_imported()
        ## END

By the way, if the Chalice project uses the Pydantic module, you should skip packages that use Pydantic when deleting imported modules in delete_imported. Otherwise, it may report errors such as duplicate validator functions.

Wtevia commented 1 year ago

@lymmurrain, you are right, as i said in my comment what i did was a very quick fix that worked in my case. Your solution is definitely more correct, but as a developer that uses the library i would apply it in a bit different in a form of custom construct or decorator that tracks all new imports in Chalice construct. Modifying library directly creates some problems to share it with whole team (create another lib), then you need to track and update it as new updates of original lib appears, load it in your lambda layers etc. However your example can be copied to library by contributors and i would be happy to see this ASAP