elastic / apm-agent-nodejs

https://www.elastic.co/guide/en/apm/agent/nodejs/current/index.html
BSD 2-Clause "Simplified" License

Lambda: Delay Metadata Fetching/Populating Until First Function Invocation #2404

Closed: astorm closed this issue 2 years ago.

astorm commented 2 years ago

From the Lambda Spec

Some metadata fields are not available at startup (e.g. invokedFunctionArn, which is needed for service.id and cloud.region). Therefore, retrieval of metadata fields in a Lambda context needs to be delayed until the first execution of the Lambda function, so that information provided in the context object can be used to set metadata fields properly.
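Because service.id and cloud.region both hinge on context.invokedFunctionArn, it may help to see what that string contains. Below is a minimal sketch, assuming the standard Lambda function ARN layout shown in the comment; the helper name parseLambdaArn is hypothetical and not part of the agent.

```javascript
'use strict'

// Split a Lambda function ARN of the form
//   arn:aws:lambda:<region>:<account-id>:function:<name>[:<qualifier>]
// into the pieces needed for cloud.region, cloud.account.id and service.id.
function parseLambdaArn (arn) {
  const parts = arn.split(':')
  if (parts.length < 7 || parts[0] !== 'arn' || parts[2] !== 'lambda' || parts[5] !== 'function') {
    throw new Error(`not a Lambda function ARN: ${arn}`)
  }
  return {
    partition: parts[1], // e.g. 'aws'
    region: parts[3],
    accountId: parts[4],
    functionName: parts[6],
    // Present only when the function is invoked via a version or alias.
    qualifier: parts.length > 7 ? parts[7] : null
  }
}

const info = parseLambdaArn('arn:aws:lambda:us-west-2:612345678904:function:trentm-play-fn1')
console.log(info.region, info.accountId) // us-west-2 612345678904
```

Whether service.id should keep or drop a version/alias qualifier is not settled here; the sketch simply exposes it separately.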

This metadata includes both cloud metadata fields and standard metadata values. Both elastic/apm-agent-nodejs and elastic/apm-nodejs-http-client presume this metadata is set once during agent startup and never again. We'll need to ensure that this metadata isn't set until the first function invocation, and set up a mechanism to get data from the context argument of the Lambda handler into the encoded metadata in the client.
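One way to picture "get data from the context argument into the metadata, but only once" is a handler wrapper that runs a callback on the first invocation only. This is an illustrative sketch: wrapHandler and onFirstInvocation are hypothetical names, not agent APIs.

```javascript
'use strict'

// Hypothetical helper: wrap a Lambda handler so that `onFirstInvocation`
// runs exactly once, with the first invocation's context object, before
// the user's handler is called.
function wrapHandler (handler, onFirstInvocation) {
  let metadataCaptured = false
  return async function wrapped (event, context) {
    if (!metadataCaptured) {
      // Flip the flag first so a throwing callback is not retried on
      // every later invocation.
      metadataCaptured = true
      onFirstInvocation(context)
    }
    return handler(event, context)
  }
}

const seenArns = []
const handler = wrapHandler(
  async (event) => ({ statusCode: 200, body: String(event.n * 2) }),
  (context) => seenArns.push(context.invokedFunctionArn)
)
const ctx = { invokedFunctionArn: 'arn:aws:lambda:us-west-2:612345678904:function:trentm-play-fn1' }
handler({ n: 1 }, ctx)
handler({ n: 2 }, ctx)
console.log(seenArns.length) // 1
```

Setting the flag before the callback is a design choice; an implementation that wants to retry a failed metadata capture would set it only after the callback succeeds.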

trentm commented 2 years ago

tl;dr I intend to implement "Option 2" described below.

How metadata works in the current Agent and http Client

In a Lambda environment there is special handling in the cloudMetadataFetcher that immediately returns a subset of the required Lambda metadata. Only a subset, because some fields are derived from data passed to the first Lambda function invocation.
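To make "only a subset" concrete, here is a sketch of what can be assembled at startup versus what must wait. It assumes the static fields come from the standard Lambda runtime environment variables AWS_LAMBDA_FUNCTION_NAME, AWS_LAMBDA_FUNCTION_VERSION and AWS_EXECUTION_ENV; this thread does not say which variables cloudMetadataFetcher actually reads, and staticLambdaMetadata is a hypothetical name.

```javascript
'use strict'

// Hypothetical sketch: the Lambda metadata that can be assembled at startup,
// before any invocation, from the runtime's environment variables.
// The env var -> field mapping is an assumption for illustration.
function staticLambdaMetadata (env) {
  return {
    service: {
      name: env.AWS_LAMBDA_FUNCTION_NAME,
      version: env.AWS_LAMBDA_FUNCTION_VERSION,
      runtime: { name: env.AWS_EXECUTION_ENV }
    },
    cloud: {
      provider: 'aws',
      service: { name: 'lambda' }
      // Missing until the first invocation: cloud.region, cloud.account.id
      // and service.id, which come from context.invokedFunctionArn.
    }
  }
}

const md = staticLambdaMetadata({
  AWS_LAMBDA_FUNCTION_NAME: 'trentm-play-fn1',
  AWS_LAMBDA_FUNCTION_VERSION: '$LATEST',
  AWS_EXECUTION_ENV: 'AWS_Lambda_nodejs14.x'
})
console.log(md.cloud.account) // undefined
```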

Option 1: the simplest thing

The simplest change I see is to add the following to the "How metadata works" description above:

Possible issues with this: in all expected usage of a Lambda function, no transactions, spans, or metricsets will be sent to the Client before apm.lambda() starts and can call setExtraMetadata() first. However, it is possible for this assumption to be broken: (a) if the top-level code in the JS file containing the function manually starts and ends a transaction; or (b) if the agent eventually creates metrics in Lambda and the initial metricset is sent before the Lambda handler is called.
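The ordering hazard can be reproduced with a toy model. ToyClient and lambdaWrap are hypothetical stand-ins: only the method name setExtraMetadata() comes from this thread, and the client is simplified to snapshot whatever metadata it knows at the moment each event is sent.

```javascript
'use strict'

// Toy stand-in for the HTTP client: each sent event records a snapshot of
// whatever metadata is known at that moment.
class ToyClient {
  constructor (metadata) {
    this._metadata = metadata
    this.sent = []
  }

  setExtraMetadata (extra) {
    this._metadata = { ...this._metadata, ...extra } // shallow merge
  }

  sendTransaction (trans) {
    const snapshot = JSON.parse(JSON.stringify(this._metadata))
    this.sent.push({ metadata: snapshot, transaction: trans })
  }
}

// Toy stand-in for apm.lambda(): on the first invocation, derive cloud
// metadata from the context, then record a transaction for the invocation.
function lambdaWrap (client, handler) {
  let first = true
  return async function wrapped (event, context) {
    if (first) {
      first = false
      const [, , , region, accountId] = context.invokedFunctionArn.split(':')
      client.setExtraMetadata({ cloud: { provider: 'aws', region, account: { id: accountId } } })
    }
    client.sendTransaction({ name: 'invocation' })
    return handler(event, context)
  }
}

const client = new ToyClient({ service: { name: 'trentm-play-fn1' } })
// Edge case (a): top-level module code ends a transaction before the handler runs.
client.sendTransaction({ name: 'top-level' })
const fn = lambdaWrap(client, async () => 'ok')
fn({}, { invokedFunctionArn: 'arn:aws:lambda:us-west-2:612345678904:function:trentm-play-fn1' })

// The first event went out without any cloud metadata.
console.log(client.sent.map(s => (s.metadata.cloud ? s.metadata.cloud.region : 'no cloud metadata')))
// [ 'no cloud metadata', 'us-west-2' ]
```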

So if we want to handle those odd cases, we need some kind of corking similar to what is done for cloudMetadataFetcher above. One way:

Currently the internal coordination doesn't know how to wait for both setExtraMetadata() and the callback from cloudMetadataFetcher. It could be made to know how, but instead ...
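A cork that waits on several independent conditions is one way such "wait for both" coordination could look. MultiCork and the condition names are hypothetical; this is not how the HTTP client is structured, just a sketch of the idea that events are buffered until every precondition has been released, then flushed in order.

```javascript
'use strict'

// Buffer outgoing events until every named precondition has been released,
// then flush the buffer in order. Later writes pass straight through.
class MultiCork {
  constructor (conditions, flush) {
    this._pending = new Set(conditions)
    this._buffer = []
    this._flush = flush
  }

  get corked () {
    return this._pending.size > 0
  }

  write (event) {
    if (this.corked) {
      this._buffer.push(event)
    } else {
      this._flush([event])
    }
  }

  release (condition) {
    if (!this._pending.delete(condition)) return // unknown or already released
    if (!this.corked && this._buffer.length > 0) {
      const buffered = this._buffer
      this._buffer = []
      this._flush(buffered)
    }
  }
}

const flushed = []
const cork = new MultiCork(['cloudMetadata', 'extraMetadata'], (events) => flushed.push(...events))
cork.write('transaction-1')
cork.release('cloudMetadata') // still waiting on extraMetadata
cork.write('span-1')
cork.release('extraMetadata') // both released: the buffer flushes
cork.write('transaction-2') // written straight through
console.log(flushed) // [ 'transaction-1', 'span-1', 'transaction-2' ]
```

The extra bookkeeping is exactly the complexity that dropping cloudMetadataFetcher for Lambda avoids: with a single metadata source, one condition suffices.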

Option 2: no cloudMetadataFetcher for lambda

With Option 1 we would be splitting the metadata gathering for Lambda across two places: (a) the static Lambda metadata in CloudMetadataFetcher and (b) the metadata that needs the invocation context in apm.lambda(). Let's move it all to the latter.
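As a rough illustration of gathering everything in one place, the function below builds the Lambda-specific service and cloud fields from the handler's context object at the first invocation. The field choices mirror the example payload later in this thread; functionName, functionVersion, invokedFunctionArn and logStreamName are standard properties of the Node.js Lambda context object, while reading the runtime name from AWS_EXECUTION_ENV and the name lambdaMetadataFromContext are assumptions.

```javascript
'use strict'

// Hypothetical sketch of Option 2: all Lambda-specific metadata is built in
// a single place, from the first invocation's `context` plus the environment.
function lambdaMetadataFromContext (context, env) {
  // arn:aws:lambda:<region>:<account-id>:function:<name>[:<qualifier>]
  const arnParts = context.invokedFunctionArn.split(':')
  return {
    service: {
      name: context.functionName,
      version: context.functionVersion,
      id: context.invokedFunctionArn, // used as-is in this sketch
      framework: { name: 'AWS Lambda' },
      runtime: { name: env.AWS_EXECUTION_ENV },
      node: { configured_name: context.logStreamName }
    },
    cloud: {
      provider: 'aws',
      region: arnParts[3],
      service: { name: 'lambda' },
      account: { id: arnParts[4] }
    }
  }
}

const md = lambdaMetadataFromContext({
  functionName: 'trentm-play-fn1',
  functionVersion: '$LATEST',
  invokedFunctionArn: 'arn:aws:lambda:us-west-2:612345678904:function:trentm-play-fn1',
  logStreamName: '2021/11/01/[$LATEST]e7b05091b39b4aa2aef19efe4d262e79'
}, { AWS_EXECUTION_ENV: 'AWS_Lambda_nodejs14.x' })
console.log(md.cloud.region, md.cloud.account.id) // us-west-2 612345678904
```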

You can stop reading here, if you like. My intent is to implement Option 2.

Option 3: overhaul metadata handling between Agent and Client

This option is only described here as a possible longer-term refactoring, and to explain why Option 2 is shaped the way it is.

This would be a lot more code churn right now, so I think it is best left to separate future work.

trentm commented 2 years ago

An example of the metadata being sent in a Lambda with my in-progress patches:

{
  "metadata": {
    "service": {
      "name": "trentm-play-fn1",
      "environment": "development",
      "runtime": {
        "name": "AWS_Lambda_nodejs14.x",
        "version": "14.17.4"
      },
      "language": {
        "name": "javascript"
      },
      "agent": {
        "name": "nodejs",
        "version": "3.23.0"
      },
      "version": "$LATEST",
      "id": "arn:aws:lambda:us-west-2:612345678904:function:trentm-play-fn1",
      "framework": {
        "name": "AWS Lambda"
      },
      "node": {
        "configured_name": "2021/11/01/[$LATEST]e7b05091b39b4aa2aef19efe4d262e79"
      }
    },
    "process": {
      "pid": 17,
      "ppid": 1,
      "title": "/var/lang/bin/node",
      "argv": [
        "/var/lang/bin/node",
        "/var/runtime/index.js"
      ]
    },
    "system": {
      "hostname": "169.254.154.197",
      "architecture": "x64",
      "platform": "linux"
    },
    "cloud": {
      "provider": "aws",
      "region": "us-west-2",
      "service": {
        "name": "lambda"
      },
      "account": {
        "id": "612345678904"
      }
    }
  }
}