DataDog / datadog-serverless-functions

Repo of AWS Lambda and Azure Functions functions that process streams and send data to Datadog
Apache License 2.0

Upgrading to 3.77.0 & Python 3.9 Results in Import Error #684

Open kgochenour opened 1 year ago

kgochenour commented 1 year ago

Describe what happened: We use the Datadog-provided CloudFormation stack to deploy the Lambda Log Forwarder. It was on version 3.59.0, and we upgraded to 3.77.0 using the latest.yaml template. The update went through, but the Lambda then failed all invocations with the following:

[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': cannot import name '_rand' from 'ddtrace.internal' (/var/task/ddtrace/internal/__init__.py)
Traceback (most recent call last):

Manually changing the runtime from Python 3.9 to 3.8 fixed this issue.

As a result we rolled back to 3.73.0 which is the last Python 3.8 deployment, and everything is working OK.

Describe what you expected: Applying the latest.yaml CloudFormation template should update the Log Forwarder and leave it working.

Steps to reproduce the issue: Deployed latest.yaml as an update to our CloudFormation stack.

RaphaelAllier commented 1 year ago

Hello,

I've opened a ticket to track this internally. In the meantime, 3.73 should be stable.

RaphaelAllier commented 1 year ago

Hello @kgochenour, when updating, did the Layer version get updated as well? I redeployed the latest version from scratch using the CF template in our sandbox and didn't encounter this import issue.

EDIT: Installation from scratch worked, but updating through CF with latest.yaml reproduces your issue. We're looking into it.

kgochenour commented 1 year ago

@RaphaelAllier I opened a ticket with DD (internal ticket 1289207) on this, and we found that the CloudFormation template was not actually updating the Lambda code. The stack changed the tag for the forwarder version and did update the runtime. However, in the logs and in DD itself, the Lambda still reported being on an old version.

I found that running the CloudFormation template update and then manually updating the Lambda code with the Datadog Forwarder code made it work. But this meant that all future updates would need to be done this way, which is odd.

My internal ticket was closed because I did not respond while I was away. But basically we weren't able to figure out why this happened.
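One way to confirm the mismatch described above (tag and runtime updated, code left behind) is to snapshot the function before and after the stack update and compare the two. This is a minimal sketch, not an official Datadog tool. It assumes a client that behaves like boto3.client("lambda"). The helper names and the tag key dd_forwarder_version are assumptions, so adjust the tag key to whatever your stack actually sets.

```python
def summarize_forwarder(lambda_client, function_name, version_tag="dd_forwarder_version"):
    """Collect the fields needed to spot a stale forwarder deployment.

    lambda_client is expected to behave like boto3.client("lambda").
    version_tag is an assumed tag key, not confirmed in this thread.
    """
    resp = lambda_client.get_function(FunctionName=function_name)
    cfg = resp["Configuration"]
    return {
        "runtime": cfg.get("Runtime"),
        # Hash and size of the function's own deployment package (layers excluded).
        "code_sha256": cfg.get("CodeSha256"),
        "code_size": cfg.get("CodeSize"),
        "layers": [layer["Arn"] for layer in cfg.get("Layers", [])],
        "tagged_version": resp.get("Tags", {}).get(version_tag),
    }


def code_changed(before, after):
    """True if the function's own deployment package differs between two summaries."""
    return before["code_sha256"] != after["code_sha256"]
```

If tagged_version and runtime change across an update while code_sha256 and layers both stay the same, the update only touched configuration. Note that on a layer-based install an unchanged code_sha256 is expected, and the layers list is what should change.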

RaphaelAllier commented 1 year ago

Hello @kgochenour, following up now that I'm back from PTO. We identified internally that there was indeed an issue with the CloudFormation update path. We're trying to pinpoint the issue and will post updates when we have a fix.

The issue seems to be located in the CF update path. Newer installations work fine with InstallAsLayer set to true, as the update only changes the layer (which contains the whole Forwarder code). However, older installations may see a conflict of dependencies, which we're investigating. Thanks for the ticket number. I'll also have a look there.

RaphaelAllier commented 1 year ago

After some troubleshooting, I think we've narrowed down the issue here.

To recap:

We tested various installations/setups and CloudFormation upgrade patterns.

When the Forwarder is deployed with its code packaged in the layer, CloudFormation updates will only make a call to UpdateFunctionConfiguration, which won't update the source code of the Lambda. If the function code has been manually updated, CF won't take that into account when the stack is re-applied, which may lead to errors.

Additionally, a workaround to sanitise the installation is to reapply the CF template twice, once with InstallAsLayer: false and the second time with InstallAsLayer: true, which should clean up the Lambda's source code.

Alternatively, fresh installations (not CloudFormation upgrades) of the latest version of the template are functional.
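The two-pass workaround above can be sketched as a pair of CloudFormation UpdateStack requests. This is an illustrative sketch, not an official procedure. InstallAsLayer comes from this thread, but the helper name, the keep_parameters argument, and the capability list are assumptions, so check what your stack actually requires. The function only builds the request arguments and does not contact AWS.

```python
def two_pass_plan(stack_name, template_url, keep_parameters=()):
    """Build kwargs for two successive update_stack calls.

    Pass 1 sets InstallAsLayer to "false", pass 2 sets it back to "true".
    keep_parameters lists the stack's other parameter keys so that their
    current values are reused instead of falling back to template defaults.
    """
    def build(install_as_layer):
        params = [{"ParameterKey": "InstallAsLayer", "ParameterValue": install_as_layer}]
        params += [{"ParameterKey": key, "UsePreviousValue": True} for key in keep_parameters]
        return {
            "StackName": stack_name,
            "TemplateURL": template_url,
            "Parameters": params,
            # Assumed capability set; trim or extend to match your stack.
            "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"],
        }

    return [build("false"), build("true")]


# Applying the plan (requires boto3 and AWS credentials, so left commented out):
# import boto3
# cfn = boto3.client("cloudformation")
# for kwargs in two_pass_plan("my-forwarder-stack", "<template URL>", ["SomeOtherParam"]):
#     cfn.update_stack(**kwargs)
#     cfn.get_waiter("stack_update_complete").wait(StackName=kwargs["StackName"])
```

Waiting for each update to complete before starting the next one matters here, because CloudFormation rejects an update on a stack that is still in progress.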

anthonyangel commented 1 year ago

Hi @RaphaelAllier, do you have an upgrade guide? From reading the docs, it looks like InstallAsLayer defaults to true, but my reading of the post above suggests that it should be possible to upgrade with the following steps:

  1. deploy 3.73, InstallAsLayer:true (ie default)
  2. deploy 3.73, InstallAsLayer:false
  3. deploy 3.73, InstallAsLayer:true
  4. deploy 3.91, InstallAsLayer:true

Please can you confirm, or provide alternate instructions?

RaphaelAllier commented 1 year ago

Hello @anthonyangel. Regular CF upgrades should work fine by default. If your installation is currently on version 3.73.0 and may have been manually upgraded in the past, the steps to "sanitise" it (only if needed) should be:

Additional steps to make sure the forwarder code is completely packaged in the layer only, avoiding dependency conflicts

template URL: https://datadog-cloudformation-template.s3.amazonaws.com/aws/forwarder/3.73.0.yaml

From this step, your forwarder shouldn't have any code except in the Lambda layer attached to it (which is the standard way of deploying the Datadog Forwarder)

Regular upgrade process

template URL: https://datadog-cloudformation-template.s3.amazonaws.com/aws/forwarder/latest.yaml

To clarify, regular CF upgrades should always work fine. We think the errors listed in this issue resulted from a mix of CloudFormation upgrades and manual upgrades (such as downloading the zip from the latest release and installing it manually in the AWS console).