aws-observability / observability-best-practices

Observability best practices on AWS
https://aws-observability.github.io/observability-best-practices/
MIT No Attribution

Grafana unable to show RDS Postgres metrics #107

Open vitorguidi opened 11 months ago

vitorguidi commented 11 months ago

I am running Grafana 10.2.0 and trying to collect metrics from CloudWatch. The database is RDS Postgres 12.14, and I am using the Lambda from the monitoring-aurora-with-grafana folder.

The first issue I ran into was that this call failed because it sent more than 1000 data items in a single request:

    if metric_data:
        logger.info('## sending data to cloduwatch...')
        try:
            cw_client.put_metric_data(
            Namespace= targetMetricNamespace,
            MetricData= metric_data)
        except ClientError as error:
            raise ValueError('The parameters you provided are incorrect: {}'.format(error))

I then changed the code to split the put call into batches of 500, and the error stopped:

    # Split metric_data into slices of at most max_elements items each
    result = []
    max_elements = 500
    for i in range(0, len(metric_data), max_elements):
        result.append(metric_data[i:i + max_elements])

    if metric_data:
        # One put_metric_data call per slice
        for entry in result:
            logger.info('## sending data to cloduwatch...')
            try:
                cw_client.put_metric_data(
                    Namespace=targetMetricNamespace,
                    MetricData=entry)
            except ClientError as error:
                raise ValueError('The parameters you provided are incorrect: {}'.format(error))
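
The batching above can also be written as a small reusable helper. This is only a sketch, not the repo's code: `chunked` and `put_in_batches` are hypothetical names, the client is passed in so the logic can be exercised without AWS, and the 1000-item ceiling is taken from the failure described above.

```python
from typing import Any, Dict, Iterator, List

# Assumption: a single put_metric_data request accepts at most 1000 metrics,
# which matches the failure described in this thread.
MAX_METRICS_PER_CALL = 1000


def chunked(items: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_in_batches(cw_client: Any, namespace: str,
                   metric_data: List[Dict[str, Any]], batch_size: int = 500) -> int:
    """Send metric_data in several put_metric_data calls; return the number of calls."""
    calls = 0
    # Never exceed the per-request ceiling, even if a larger batch_size is passed
    for batch in chunked(metric_data, min(batch_size, MAX_METRICS_PER_CALL)):
        cw_client.put_metric_data(Namespace=namespace, MetricData=batch)
        calls += 1
    return calls
```

In the Lambda this would be called as `put_in_batches(cw_client, targetMetricNamespace, metric_data)`, with `cw_client` being the existing boto3 CloudWatch client. An empty `metric_data` list results in no calls at all.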

After this, the Lambda started failing at the db_slices line of code:

dbSliceGroup = { "db.sql_tokenized", "db.application", "db.wait_event", "db.user", "db.session_type", "db.host", "db", "db.application" }

The Lambda raised several exceptions. The first one reported db.session_type as invalid, and then a couple of other groups, which I did not note down, failed as well. I then commented out everything except db.sql_tokenized:

dbSliceGroup = { 
    "db.sql_tokenized", 
    #"db.application", 
    #"db.wait_event", 
    #"db.user", 
    #"db.session_type", 
    #"db.host", 
    #"db", 
    #"db.application" 
}
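
Instead of commenting groups out by hand, one option is to query each group separately and record the ones the engine rejects. The sketch below is an assumption about how that could look, not the Lambda's actual logic: `collect_by_group` is a hypothetical helper, and `fetch` stands in for whatever Performance Insights call the Lambda makes for a single group.

```python
from typing import Callable, Dict, Iterable, Tuple


def collect_by_group(
    groups: Iterable[str],
    fetch: Callable[[str], list],
) -> Tuple[Dict[str, list], Dict[str, str]]:
    """Call fetch(group) once per distinct group and keep going when one fails.

    Returns (results, skipped): results maps group -> fetched data,
    skipped maps group -> the error message that was raised for it.
    """
    results: Dict[str, list] = {}
    skipped: Dict[str, str] = {}
    # set() drops duplicates such as a repeated "db.application"
    for group in sorted(set(groups)):
        try:
            results[group] = fetch(group)
        except Exception as error:  # in the Lambda, narrow this to botocore's ClientError
            skipped[group] = str(error)
    return results, skipped
```

Logging the `skipped` mapping would show exactly which groups are invalid for the engine in use, so the failing names do not have to be rediscovered one exception at a time.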

With that change the Lambda ran, and I could see the metrics in the AuroraMonitoringGrafana/PerformanceInsightMetrics namespace. Unfortunately, the metrics do not show up in Grafana, even though I added the custom namespaces to the configuration. I am using the dashboard included in this repo for the Aurora use case.

Is there any subtlety involved in making these metrics work? I do not know where to look next, so some assistance would be appreciated.

vitorguidi commented 11 months ago

Some extra information: whenever I query CloudWatch via boto3, the metrics in the custom namespace are not returned, even though they exist (as shown previously in the CloudWatch UI). However, when I query an AWS namespace (e.g., AWS/Athena), I get a non-empty response.

This points to a possible issue in the way the CloudWatch API is being queried for custom namespaces.
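
One way to check what the API reports for a namespace is to list its metrics directly. The following is a minimal sketch: `list_namespace_metrics` is a hypothetical helper, and the client is passed in so the function can be tried with a stub; in practice it would be `boto3.client("cloudwatch")`.

```python
from typing import Any, Dict, List


def list_namespace_metrics(cw_client: Any, namespace: str) -> List[Dict[str, Any]]:
    """Return every metric CloudWatch lists under `namespace`, following pagination."""
    metrics: List[Dict[str, Any]] = []
    paginator = cw_client.get_paginator("list_metrics")
    for page in paginator.paginate(Namespace=namespace):
        metrics.extend(page.get("Metrics", []))
    return metrics
```

Each returned entry carries the metric name and its `Dimensions`, so the output also shows which dimension names and values a query has to supply.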

vitorguidi commented 11 months ago

Figured that one out: I was unable to query the CloudWatch API (through boto3) with the AuroraMonitoringGrafana/PerformanceInsightMetrics namespace. I switched to grafana/rds and it worked just fine.

Lastly, I added the dimensions to the tokenized_sql query (db.tokenized_sql.id and db.tokenized_sql.db_id) and it did the trick.
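
CloudWatch treats each unique combination of dimensions as a separate metric, which is probably why the query returned data only once the dimensions were included. The sketch below builds one entry for `get_metric_data` with explicit dimensions. `build_query` is a hypothetical helper, and the metric name and dimension values in the usage note are placeholders, not values from this thread.

```python
from typing import Any, Dict


def build_query(namespace: str, metric_name: str, dimensions: Dict[str, str],
                period: int = 60, stat: str = "Average",
                query_id: str = "q1") -> Dict[str, Any]:
    """Build one MetricDataQueries entry that names every dimension explicitly."""
    return {
        "Id": query_id,  # must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": namespace,
                "MetricName": metric_name,
                "Dimensions": [
                    {"Name": name, "Value": value}
                    for name, value in sorted(dimensions.items())
                ],
            },
            "Period": period,
            "Stat": stat,
        },
        "ReturnData": True,
    }
```

A call would then look like `cw.get_metric_data(MetricDataQueries=[build_query("grafana/rds", "<metric name>", {"db.tokenized_sql.id": "<id>", "db.tokenized_sql.db_id": "<db id>"})], StartTime=start, EndTime=end)`, where `cw` is a boto3 CloudWatch client and `start` and `end` are datetimes.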
