Bug: powertools Logger Excessive memory usage with large messages and log rate

Dingway98 commented 1 month ago

Expected Behavior

logger.info or any logger calls should not be using excessive memory usage with

Max Memory Used: 3339 MB

This is opposed to straight console.log, where we get

Max Memory Used: 83 MB

Current Behavior

See the executions logs.

When we run the simple code snippet with powertools logger logger.info, high memory usage is recorded.

Max Memory Used: 3339 MB

Code snippet

var serviceName = 'exampleService';
var logger = new Logger({ serviceName });
var handler = async (event, context) => {
  let largeObject = {
    prop1: 'prop',
    prop2: 'prop',
    prop3: 'prop',
    prop4: 'prop',
    prop5: 'prop',
    prop6: 'prop',
    prop7: 'prop',
    prop8: 'prop',
    prop9: 'prop',
    prop10: 'prop',
    prop11: 'prop',
    prop12: 'prop',
    prop13: 'prop',
    prop14: 'prop',
    prop15: 'prop',
    prop16: 'prop',
    prop17: 'prop',
    prop18: 'prop',
    prop19: 'prop',
    prop20: 'prop',
    prop21: 'prop',
    prop22: 'prop',
    prop23: 'prop',
    prop24: 'prop',
    prop25: 'prop',
    prop26: 'prop',
    prop27: 'prop',
    prop28: 'prop',
  };
  logger.info(JSON.stringify(event));
  logger.info(JSON.stringify(context));
  logger.info('hello world');

  for (let i = 0; i < 100000; i++) {
    logger.info(`hello world`, { largeObject });
    logger.info(`hello world`, { largeObject });
    logger.info(`hello world`, { largeObject });
    logger.info(`hello world`, { largeObject });
    logger.info(`hello world`, { largeObject });
    logger.info(`hello world`, { largeObject });
    logger.info(`hello world`, { largeObject });

    // console.log(`hello world`, { largeObject });
    // console.log(`hello world`, { largeObject });
    // console.log(`hello world`, { largeObject });
    // console.log(`hello world`, { largeObject });
    // console.log(`hello world`, { largeObject });
    // console.log(`hello world`, { largeObject });
    // console.log(`hello world`, { largeObject });

  }
  return 'done';
};

Steps to Reproduce

Entire handler code posted in Code snippet
Create lambda with runtime environment nodejs-18.x
install latest @aws-lambda-powertools/logger, and import
run code snippet with lambda

Possible Solution

Could this be related to the internal buffer powertools logger is setting? There is a buffer before the logger instance is initialized, but I am unsure what is happening during initialization.

Powertools for AWS Lambda (TypeScript) version

2.7.0

AWS Lambda function runtime

18.x

Packaging format used

npm

Execution logs

with powertools logger:

REPORT RequestId: <redacted> Duration: 29762.11 ms  Billed Duration: 29763 ms   Memory Size: 10240 MB   Max Memory Used: 3339 MB

with console.log:

REPORT RequestId: <redacted> Duration: 21149.55 ms  Billed Duration: 21150 ms   Memory Size: 10240 MB   Max Memory Used: 83 MB  Init Duration: 200.98 ms

boring-cyborg[bot] commented 1 month ago

Thanks for opening your first issue here! We'll come back to you as soon as we can. In the meantime, check out the #typescript channel on our Powertools for AWS Lambda Discord: Invite link

dreamorosi commented 1 month ago

Hi @Dingway98, thank you for opening the issue.

Please allow us some time to triage the issue and try reproducing the behavior.

I'll get back to you here as soon as I have news to share.

ckifer commented 1 month ago

To help narrow things down for you I went back to version 0.2.0 (way back) and this for the same function produced a normal amount of memory usage: Max Memory Used: 125 MB.

But this happens in both the latest v1 and latest v2, so something was added in v1/pre-v1 that is causing this.

Edit: it is somewhere between 0.7.2 (good) and 0.8.1 (bad) Edit: From my testing 0.8.1 is the culprit with PR https://github.com/aws-powertools/powertools-lambda-typescript/pull/748/files

H4ad commented 1 month ago

This does not look an issue on powertools, but rather on the AWS Image for Node, if we replace the logger to:

import { Console } from 'console';

const logger = new Console({
  stdout: process.stdout,
  stderr: process.stderr,
})

We still have the memory issue.

A quick fix for powertools is just to remove the reference for Console and use directly the global console object.

Edit: I tried to test that sample of code on node locally but I didn't see any memory issue, I also tried to take a look at the nodejs runtime and they only do a simple log patch but didn't modify/use the Console object.

Edit2: On node 20.x we also have the error with memory.

H4ad commented 1 month ago

I did some investigation and looks like the issue is related to the stdout and stderr not being synchronous and the code not giving the event loop time necessary to flush the logs.

A heap snapshot showed the WritableState was holding all the memory:

This data is being buffered because the logs when using Console and stdout are not blocking, and they are being buffered until the next tick when they are flushed.

If you add something like await new Promise(resolve => setTimeout(resolve, 2)); just to give to Node.js some time to flush the logs, you will not see the memory issue.

Also, you can make the stdout and stderr synchronous by putting the following code:

if (process.stdout._handle) process.stdout._handle.setBlocking(true);
if (process.stderr._handle) process.stderr._handle.setBlocking(true);

Anyway, feel free to choose between one of these solutions, I can open I PR when you choose how do you want to proceed.

dreamorosi commented 1 month ago

Hi @H4ad, thank you SO much for this investigation and for also providing recommendations!

I've been trying to understand the implications of setting the process.stdout._handle to blocking and from the little info I could find on it in other discussions the consensus seems to be fairly split and in many cases leaning towards it having a detrimental impact on performance:

While I can verify that setting these two lines drops the memory usage to similar levels to the regular console.log, I'd like to spend a bit more time investigating and understanding the root cause & solution a bit better.

To try isolating the issue, I created two new test cases, neither of which uses the Powertools for AWS Logger and that still reproduce the issue. The main difference in both these tests is that we are stringifying the log similar to how Logger would.

In sample 1, I am using the patched console.log as it appears in the Node.js RIC here:

const serviceName = 'exampleService';

export const handler = (event: unknown, context: unknown) => {
  const largeObject = {
    prop1: 'prop',
    prop2: 'prop',
    prop3: 'prop',
    prop4: 'prop',
    prop5: 'prop',
    prop6: 'prop',
    prop7: 'prop',
    prop8: 'prop',
    prop9: 'prop',
    prop10: 'prop',
    prop11: 'prop',
    prop12: 'prop',
    prop13: 'prop',
    prop14: 'prop',
    prop15: 'prop',
    prop16: 'prop',
    prop17: 'prop',
    prop18: 'prop',
    prop19: 'prop',
    prop20: 'prop',
    prop21: 'prop',
    prop22: 'prop',
    prop23: 'prop',
    prop24: 'prop',
    prop25: 'prop',
    prop26: 'prop',
    prop27: 'prop',
    prop28: 'prop',
  };
  console.log(JSON.stringify(event));
  console.log(JSON.stringify(context));
  console.log('hello world');

  for (let i = 0; i < 100000; i++) {
    const line = JSON.stringify({
      timestamp: new Date().toISOString(),
      level: 'INFO',
      service: serviceName,
      message: 'hello world',
      ...largeObject,
    });
    console.log(line);
  }
  return 'done';
};

In sample 2, I am instead printing directly to process.stdout:

const serviceName = 'exampleService';

export const handler = (event: unknown, context: unknown) => {
  const largeObject = {
    prop1: 'prop',
    prop2: 'prop',
    prop3: 'prop',
    prop4: 'prop',
    prop5: 'prop',
    prop6: 'prop',
    prop7: 'prop',
    prop8: 'prop',
    prop9: 'prop',
    prop10: 'prop',
    prop11: 'prop',
    prop12: 'prop',
    prop13: 'prop',
    prop14: 'prop',
    prop15: 'prop',
    prop16: 'prop',
    prop17: 'prop',
    prop18: 'prop',
    prop19: 'prop',
    prop20: 'prop',
    prop21: 'prop',
    prop22: 'prop',
    prop23: 'prop',
    prop24: 'prop',
    prop25: 'prop',
    prop26: 'prop',
    prop27: 'prop',
    prop28: 'prop',
  };

  for (let i = 0; i < 100000; i++) {
    let line = JSON.stringify({
      timestamp: new Date().toISOString(),
      level: 'INFO',
      service: serviceName,
      message: 'hello world',
      ...largeObject,
    });
    line = line.replace(/\n/g, '\r');
    process.stdout.write(`${line}\n`);
  }
  return 'done';
};

I then tested both:

sample 1 (console.log) - Duration: 2308.49 ms Billed Duration: 2309 ms Memory Size: 10240 MB Max Memory Used: 77 MB Init Duration: 203.04 ms
sample 2 (process.stdout.write) - Duration: 1585.19 ms Billed Duration: 1586 ms Memory Size: 10240 MB Max Memory Used: 374 MB Init Duration: 213.40 ms

Both these tests have shorter memory usage and duration from the ones in the OP because I have reduced the amount of logs being printed by 1/7th, however we can still see that when using process.stdout.write the memory usage is several orders of magnitude higher than when using console.log.

Setting the process.stdout._handle to blocking in the second sample removes the memory usage and brings it to a similar level to the console.log sample.

This data is being buffered because the logs when using Console and stdout are not blocking, and they are being buffered until the next tick when they are flushed.

This makes sense, but then why is this line not affected since the patched console.log also writes to stdout??

I'm going to reach out to the Lambda team to see if they can look into this and have an explanation on why we're seeing this behavior.

H4ad commented 1 month ago

This makes sense, but then why is this line not affected since the patched console.log also writes to stdout??

Good question, I read the source code of Node and I didn't found a difference between using Console or using process.stdout.write or console.log.

This difference seems to be something particular only to AWS, maybe the implementation is patched again in another part of the code.

leandrodamascena commented 1 month ago

Hi @H4ad! Thanks a lot for looking into this issue and giving us some direction to fix it. We are working with Lambda engineers to understand the root cause of this issue because as you mentioned, it seems like something very specific to the node embedded in the Lambda node image.

Você é o cara! ❤️

dreamorosi commented 1 month ago

Hello,

We're still waiting on the internal team from AWS Lambda to provide a path forward within the runtime. They are aware of the issue and have already started looking into it.

We are in close contact with them and will share more info here as soon as we can.

aws-powertools / powertools-lambda-typescript