ACloudGuru / serverless-plugin-cloudwatch-sumologic

Plugin which auto-subscribes a log delivery lambda function to lambda log groups created by serverless

Default filterPattern often causes setup problems #16

Open blimmer opened 6 years ago

blimmer commented 6 years ago

By default, this filterPattern often doesn't work:

https://github.com/ACloudGuru/serverless-plugin-cloudwatch-sumologic/blob/51fb8333f59ef1abdfdd7ed49039919c32a73ac5/src/index.js#L78-L80

The reason is that most of my logs have the log level before the first timestamp:

e.g.

[WARNING] 2018-07-10T22:34:59.702Z 78c487eb-8491-11e8-a9be-abc1234241 blocklist: set() 

This causes no logs to be delivered to the Lambda.
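As a toy illustration of the mismatch (this is not CloudWatch's real matcher, and I'm assuming the linked default is the space-delimited pattern [timestamp=*Z, request_id="*-*", event]): the pattern expects the first space-delimited field to be a timestamp ending in Z, but with the log-level prefix that field is [WARNING] instead.

// Toy illustration only -- not CloudWatch's real pattern matcher.
// The (assumed) default pattern [timestamp=*Z, request_id="*-*", event]
// expects the first space-delimited field to end in "Z".
const line = '[WARNING] 2018-07-10T22:34:59.702Z 78c487eb-8491-11e8-a9be-abc1234241 blocklist: set()';
const firstField = line.split(/\s+/)[0];
console.log(firstField.endsWith('Z')); // false -> the event never matches, so nothing is delivered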

I'd like to propose two options instead of this default.

No filterPattern by default

This is probably the easiest and most broadly effective option. It would change the default behavior so that filterPattern is a blank string unless you explicitly set one (a minimal sketch of what that could look like follows the pros/cons below).

Pros

Cons
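A minimal sketch of what that default could look like (the function and option names here are just illustrative, not the plugin's actual code):

// Illustrative sketch only, not the plugin's actual code: fall back to a
// blank filter pattern (which matches every log event) when none is set.
function resolveFilterPattern(userConfig) {
    return typeof userConfig.filterPattern === 'string'
        ? userConfig.filterPattern
        : ''; // blank pattern => CloudWatch delivers everything
}

console.log(resolveFilterPattern({}));                             // ''
console.log(resolveFilterPattern({ filterPattern: '[ip, ...]' })); // user-supplied pattern wins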

filterPattern using ... instead

The AWS docs on the topic state that you can use an ellipsis when you don't know the number of fields present:

In cases where you don't know the number of fields, you can use shorthand notation using an ellipsis (…). For example:

[..., status_code, bytes]
[ip, user, ..., status_code, bytes]
[ip, user, ...]

We could instead use something like this to try to extract just the logs.
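For example, prefixing the existing default pattern with an ellipsis so that leading fields (like a log level) are allowed for:

[..., timestamp=*Z, request_id="*-*", event]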

Pros

Cons

Thoughts?

What do you think? Either way, I think this should be changed, since the default filter has stopped reporting all of my logs now that the log level appears as the first field.

GeneralistDev commented 6 years ago

Is that log level prefix a new default for CloudWatch, or just something specific to logs from your application? If it's not the default functionality for CloudWatch, then I'm inclined to leave the filterPattern as is, because it should match CloudWatch's default (or at least it did when I first built it).

Note: I no longer maintain this repo and haven't got ownership rights, but I can get in touch with people who do so you can discuss changes with them.

blimmer commented 6 years ago

I was trying to find documentation on it, but it does seem to show the log level first. All of our functions stopped reporting recently, and their logs are now prefixed with the log level, so I'm at least seeing that behavior from Node & Python.

GeneralistDev commented 6 years ago

I just created a new Node 8.10 function with the following code:

exports.handler = async (event) => {
    console.log('This is a normal log');
    console.error('This is an error log');
    console.warn('This is a warning');
    // TODO implement
    return 'Hello from Lambda!';
};

And there weren't any log levels in the CloudWatch output.

I'm not sure why we're getting different results.

GeneralistDev commented 6 years ago

I just tested it using the logging library in Python and I can see the log levels being output there. I'm not sure what the behaviour is if you just use print, but I can see how this would be annoying for Python users. Perhaps if you put your suggested changes into a PR, I can forward that on to the maintainers?

blimmer commented 6 years ago

Heh, yep, I was just doing the same for Python. Here it is:

import logging

logger = logging.getLogger('parser_function')

def lambda_handler(event, context):
    logger.warn("Warn level log")
    logger.error("Error level log")
    logger.critical("Critical level log")
    # TODO implement
    return 'Hello from Lambda'

Logs:

START RequestId: c9b0e93a-855c-11e8-8148-4d2dd817ded7 Version: $LATEST
[WARNING]   2018-07-11T22:50:20.438Z    c9b0e93a-855c-11e8-8148-4d2dd817ded7    Warn level log
[ERROR] 2018-07-11T22:50:20.439Z    c9b0e93a-855c-11e8-8148-4d2dd817ded7    Error level log
[CRITICAL]  2018-07-11T22:50:20.439Z    c9b0e93a-855c-11e8-8148-4d2dd817ded7    Critical level log
END RequestId: c9b0e93a-855c-11e8-8148-4d2dd817ded7
REPORT RequestId: c9b0e93a-855c-11e8-8148-4d2dd817ded7  Duration: 0.66 ms   Billed Duration: 100 ms     Memory Size: 128 MB Max Memory Used: 21 MB  

I'm happy to put together a PR, but would like some feedback from them on which of the two options they'd prefer from the original post.

GeneralistDev commented 6 years ago

My preference would be option 2 with the ..., to avoid breaking existing deployments, so long as we test it well.

I'll pass the link to this issue onto the maintainers and see if they can take a look soon.

blimmer commented 6 years ago

I originally agreed on the 2nd option. I was struggling, however, to get the pattern just right. For example, if we did:

[..., timestamp=*Z, request_id="*-*", event]

That would break when there's no log level, since it says "1 or more fields before timestamp", which fails for the vanilla Node example above.

So I'm not sure what the correct syntax would be. That's why I think having no filter would be the safer default option. We could bump the version to 0.1.0 to indicate that there's a breaking change (minor versions before 1.0.0 can include breaking changes according to semver).

A third option would be adding more documentation around these problems.

GeneralistDev commented 6 years ago

Another possibility is basing the default filter on the provider.runtime value. We could just run some regex tests on it: if it contains python, we use the ellipsis version, etc. Something like the sketch below.
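Purely as a rough sketch (the function name and the exact patterns here are placeholders, not the plugin's actual code):

// Rough sketch only -- not the plugin's actual code; names and patterns are placeholders.
// Choose a default filterPattern based on the service's provider.runtime value.
function defaultFilterPattern(runtime) {
    if (/python/i.test(runtime || '')) {
        // Python runtimes prefix a log level, so allow leading fields.
        return '[..., timestamp=*Z, request_id="*-*", event]';
    }
    // Node-style lines start with the timestamp.
    return '[timestamp=*Z, request_id="*-*", event]';
}

console.log(defaultFilterPattern('python3.6'));  // ellipsis variant
console.log(defaultFilterPattern('nodejs8.10')); // existing default-style pattern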

blimmer commented 6 years ago

That's a possibility, but it adds more complexity that's likely to break more often and require ongoing attention.

My vote would be to "keep it simple" and remove the filterPattern entirely vs adding more logic.

blimmer commented 6 years ago

I've created #17 to remove the default filterPattern if that's the route the owners want to go.

GeneralistDev commented 6 years ago

I get where you're coming from; however, removing it altogether will break the library for everyone using it when they upgrade, and we'll have to write a guide on how to configure their own default filter pattern, mark it as a breaking change, etc.

I think we're kinda stuck between a rock and a hard place here.

My gut instinct is to build in the smarts to distinguish between Node, Python, etc., because in my time using Lambda the Node log format hasn't changed, and I'd rather go back and fix the pattern if and when it does than have users struggle through that on their own.

I think putting that burden on the users will somewhat erode the value of the library for them.

GeneralistDev commented 6 years ago

@blimmer #17 looks good, but let me do some investigating into the feasibility of maintaining a few distinct log styles for different Lambda platforms before merging it in. I'll aim to get back to you this week.

blimmer commented 6 years ago

Sounds good - I've also opened #18, which is just a README update so people at least have a warning about the behavior.

I appreciate your help with this. This is about all the time I have to dedicate to contributing to this issue, so please feel free to go whichever way you'd like from here. Thanks!

GeneralistDev commented 6 years ago

Awesome. Thanks so much for your contribution. I'll aim to get #18 in soon.