aws / amazon-managed-service-for-prometheus-roadmap

Amazon Managed Service for Prometheus Public Roadmap

improved logging in workspace #34

Open elasticdotventures opened 9 months ago

elasticdotventures commented 9 months ago

I've reported this to AWS support as well.

As near as I can tell, 100% of the log messages in AWS Cortex are useless. A log message should give a hint about the context of the error and help in diagnosing issues or unexpected behavior. If log messages fail to do that, for whatever reason, then they don't need to exist; they are just noise.

Logs should provide clarity and not require the administrator to guess. We have literally dozens of routes and hundreds of rules, and troubleshooting them is a huge issue. We use Prometheus pint (a linter) to catch most types of errors.

I feel compelled to remind everybody that any problem on an infrastructure monitoring platform is a P1, because it means alarms can get missed.
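With dozens of routes and hundreds of rules, a first triage step could be to count which messages dominate. The sketch below is a minimal illustration, assuming the events have been exported as one JSON object per line in the same envelope as the samples below (`workspaceId`, `message.log`, `message.level`, `component`). The function name `tally_events` and the export format are assumptions, not part of any AWS tooling.

```python
import json
from collections import Counter


def tally_events(lines):
    """Count (component, level, log text) triples across JSON log lines.

    Assumes one JSON object per line, shaped like the samples in this report.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        message = event.get("message", {})
        key = (
            event.get("component", "unknown"),
            message.get("level", "unknown"),
            message.get("log", ""),
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical export; the workspace ID is a placeholder.
    sample = [
        '{"workspaceId": "ws-example", "message": {"log": "Subject has been modified because it is empty.", "level": "WARN"}, "component": "alertmanager"}',
        '{"workspaceId": "ws-example", "message": {"log": "Subject has been modified because it is empty.", "level": "WARN"}, "component": "alertmanager"}',
        '{"workspaceId": "ws-example", "message": {"log": "Notify for alerts failed, Invalid parameter: TopicArn", "level": "ERROR"}, "component": "alertmanager"}',
    ]
    for (component, level, log), n in tally_events(sample).most_common():
        print(f"{n:>4}  {component}  {level}  {log}")
```

A tally like this only shows how often each message occurs. It cannot recover the route, receiver, or alert that triggered it, which is exactly the context the messages are missing.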

{
    "workspaceId": "ws-69b34717-4546-4e1d-a367-9f4a286a91ab",
    "message": {
        "log": "MessageAttributes has been removed because of invalid key/value, numberOfRemovedAttributes=1",
        "level": "WARN"
    },
    "component": "alertmanager"
}

Suggestions:


{
    "workspaceId": "ws-69b34717-4546-4e1d-a367-9f4a286a91ab",
    "message": {
        "log": "Subject has been modified because it is empty.",
        "level": "WARN"
    },
    "component": "alertmanager"
}

Suggestions:

{
    "workspaceId": "ws-69b34717-4546-4e1d-a367-9f4a286a91ab",
    "message": {
        "log": "Message has been modified because the content was empty.",
        "level": "WARN"
    },
    "component": "alertmanager"
}

Suggestions:

{
    "workspaceId": "ws-69b34717-4546-4e1d-a367-9f4a286a91ab",
    "message": {
        "log": "Notify for alerts failed, Invalid parameter: TopicArn",
        "level": "ERROR"
    },
    "component": "alertmanager"
}

Suggestions:
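
For the last event, the log names the parameter but not what is wrong with it or which receiver it came from. As a stopgap, the topic ARNs in the alertmanager configuration can be checked locally before upload. This is a minimal sketch, assuming the usual SNS topic ARN shape `arn:<partition>:sns:<region>:<account-id>:<topic-name>`. The function `check_topic_arn` and its `expected_region` parameter are hypothetical helpers for illustration, and the region comparison rests on the assumption that a mismatch between the receiver's region and the topic's region is one possible cause of this error.

```python
import re

# Assumed shape: arn:<partition>:sns:<region>:<12-digit account>:<topic name>.
# Topic names here are letters, digits, hyphens, and underscores, with an
# optional ".fifo" suffix for FIFO topics.
SNS_TOPIC_ARN = re.compile(
    r"arn:(?:aws|aws-cn|aws-us-gov):sns:"
    r"(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):"
    r"(?P<topic>[A-Za-z0-9_-]{1,256}(?:\.fifo)?)"
)


def check_topic_arn(arn, expected_region=None):
    """Return a list of human-readable problems found in an SNS topic ARN."""
    problems = []
    stripped = arn.strip()
    if stripped != arn:
        problems.append("leading or trailing whitespace")
    match = SNS_TOPIC_ARN.fullmatch(stripped)
    if match is None:
        problems.append("not shaped like an SNS topic ARN")
        return problems
    if expected_region and match.group("region") != expected_region:
        problems.append(
            f"topic region {match.group('region')!r} differs from "
            f"expected region {expected_region!r}"
        )
    return problems


if __name__ == "__main__":
    # Placeholder account ID and topic name.
    print(check_topic_arn("arn:aws:sns:us-east-1:123456789012:alerts "))
```

A check like this only covers the ARN's format. A well-formed ARN that points at a deleted topic or one in another account would still pass, so it narrows the search rather than replacing a log message that names the failing receiver.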