aws-observability / observability-best-practices

Observability best practices on AWS
https://aws-observability.github.io/observability-best-practices/
MIT No Attribution

InvalidArgumentException in monitor-aurora-with-grafana Lambda #108

Open syndic8-joe opened 10 months ago

syndic8-joe commented 10 months ago

I implemented the monitor-aurora-with-grafana script as described without issue (all 4 steps in the CloudFormation stack were successful); however, each Lambda invocation logs this error:

[ERROR] InvalidArgumentException: An error occurred (InvalidArgumentException) when calling the GetResourceMetrics operation: This group is not a known group: db.application is not valid for current resource
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 33, in lambda_handler
    pi_response = get_db_resource_metrics(instance)
  File "/var/task/lambda_function.py", line 78, in get_db_resource_metrics
    response = pi_client.get_resource_metrics(
  File "/var/runtime/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)

and no metrics are published to CloudWatch.

I have confirmed the region is correct, and we have multiple databases in this region with Performance Insights enabled.

kshammai commented 10 months ago

I managed to resolve it by removing the invalid groups from dbSliceGroup (line 25 - sandbox/monitor-aurora-with-grafana/function/lambda_function.py)

In my scenario, I made the following change:

Originally:

dbSliceGroup = { "db.sql_tokenized", "db.application", "db.wait_event", "db.user", "db.session_type", "db.host", "db", "db.application" }

Changed to:

dbSliceGroup = { "db.wait_event", "db.user", "db.host", "db" }

LorenzoRogai commented 6 months ago

This happens because the AWS guide is written for Aurora PostgreSQL. We have a MySQL cluster and hit the same error; the workaround above works fine. You can, however, add the "db.sql_tokenized" group back, since it is recognized correctly.
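
Since the set of valid groups seems to depend on the database engine, one option is to probe each group once and keep only the ones the instance accepts, instead of hard-coding the list. The following is a minimal sketch, not the repository's code. The helper names `supported_groups` and `_error_code`, the `db.load.avg` metric, the 5-minute window, and the `db-EXAMPLE` resource ID in the usage note are all assumptions made for illustration. `pi_client` is expected to be a boto3 Performance Insights client (`boto3.client("pi")`), and the error code is read from the `response` attribute that botocore client errors carry.

```python
import datetime

# Candidate groups from the original dbSliceGroup (duplicate removed).
CANDIDATE_GROUPS = [
    "db.sql_tokenized",
    "db.application",
    "db.wait_event",
    "db.user",
    "db.session_type",
    "db.host",
    "db",
]


def _error_code(exc):
    """Return the AWS error code carried by a botocore-style exception, if any."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def supported_groups(pi_client, resource_id, candidates=CANDIDATE_GROUPS):
    """Hypothetical helper: return the groups that GetResourceMetrics accepts.

    Each candidate group is probed with one small query; groups rejected with
    InvalidArgumentException are skipped, any other error is re-raised.
    """
    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(minutes=5)  # assumed probe window
    supported = []
    for group in candidates:
        try:
            pi_client.get_resource_metrics(
                ServiceType="RDS",
                Identifier=resource_id,
                MetricQueries=[
                    # Assumed metric; the Lambda may query a different one.
                    {"Metric": "db.load.avg", "GroupBy": {"Group": group}}
                ],
                StartTime=start,
                EndTime=end,
                PeriodInSeconds=60,
            )
        except Exception as exc:
            if _error_code(exc) == "InvalidArgumentException":
                continue  # group is not valid for this engine
            raise
        supported.append(group)
    return supported
```

Usage would look something like `dbSliceGroup = supported_groups(boto3.client("pi"), "db-EXAMPLE")`, where `db-EXAMPLE` stands in for the instance's Performance Insights resource ID. The probe costs one extra API call per group, so it probably makes sense to run it once per instance and cache the result rather than on every invocation.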