aws-observability / observability-best-practices

Observability best practices on AWS
https://aws-observability.github.io/observability-best-practices/
MIT No Attribution

Supporting RDS MS SQL Performance Insights metrics #105

Closed eranmos closed 10 months ago

eranmos commented 11 months ago

Hi Team, I would like to know whether the Performance Insights metrics can support AWS RDS MS SQL in: https://github.com/aws-observability/observability-best-practices/tree/main/sandbox/monitor-aurora-with-grafana

I deployed your solution and it's amazing (thanks for the hard work), but I can see that it's not working on MS SQL RDS. Is there any roadmap to support it?

lewinkedrs commented 11 months ago

Hi @eranmos, the Lambda function should grab all RDS instances with Performance Insights enabled, not only Aurora.

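import boto3

# Assumed client setup; the original snippet uses rds_client without showing its definition
rds_client = boto3.client('rds')

# Get DB instances for which Performance Insights has been enabled
def get_pi_instances():
    dbInstancesResponse = rds_client.describe_db_instances()

    if dbInstancesResponse:
        # Keep only the instances that report PerformanceInsightsEnabled = True.
        # A list comprehension avoids testing a filter object, which is always truthy.
        return [
            item['DbiResourceId']
            for item in dbInstancesResponse['DBInstances']
            if item.get('PerformanceInsightsEnabled', False)
        ]
    return None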

So as long as you have Performance Insights enabled on your MS SQL RDS instance, the function should grab those metrics. For now they are published into a CloudWatch namespace called "/AuroraMonitoringGrafana/PerformanceInsightMetrics", but you can change this when you deploy the CloudFormation stack by editing the "TargetMetricNamespace" parameter. For example, you could name it "/DatabaseMonitoringGrafana/PerformanceInsightMetrics" to be more inclusive.

If the Lambda function is not grabbing the proper metrics for MS SQL, can you let us know here what error or behavior you are getting?

eranmos commented 11 months ago

Hi @lewinkedrs, thanks for your fast answer and detailed explanation. I deployed the code and left all the default configurations for now. The error that I am getting is:

[ERROR] InvalidArgumentException: An error occurred (InvalidArgumentException) when calling the GetResourceMetrics operation: This group is not a known group: db.session_type is not valid for current resource
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 33, in lambda_handler
    pi_response = get_db_resource_metrics(instance)
  File "/var/task/lambda_function.py", line 78, in get_db_resource_metrics
    response = pi_client.get_resource_metrics(
  File "/var/runtime/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)

eranmos commented 11 months ago

Hi @lewinkedrs, can you please help me with the above issue?

lewinkedrs commented 11 months ago

Hey @eranmos, this happens because some of the dimension groups that are being added as custom metrics do not exist for MS SQL. In the case of your error, it is db.session_type. You could probably edit the groups captured in the Lambda function so that it only includes what Performance Insights provides for MS SQL.

dbSliceGroup = { "db.sql_tokenized", "db.application", "db.wait_event", "db.user", "db.session_type", "db.host", "db" }
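
One way to apply this without hard-coding an engine-specific list is to probe each group once and keep only those the instance accepts. The following is a minimal sketch, not code from the repository: `probe_supported_groups` and `CANDIDATE_GROUPS` are hypothetical names, the `db.load.avg` metric and the 5-minute window are assumptions, and `pi_client` is assumed to be a boto3 Performance Insights client (`boto3.client('pi')`).

```python
from datetime import datetime, timedelta, timezone

# Candidate groups copied from the dbSliceGroup quoted in this thread.
CANDIDATE_GROUPS = [
    "db.sql_tokenized", "db.application", "db.wait_event",
    "db.user", "db.session_type", "db.host", "db",
]


def probe_supported_groups(pi_client, resource_id, groups=CANDIDATE_GROUPS):
    """Return the subset of `groups` that GetResourceMetrics accepts.

    pi_client   -- assumed to be boto3.client('pi') or a compatible object
    resource_id -- the DbiResourceId of the instance to probe
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=5)  # short, arbitrary probe window
    supported = []
    for group in groups:
        try:
            pi_client.get_resource_metrics(
                ServiceType="RDS",
                Identifier=resource_id,
                MetricQueries=[
                    {"Metric": "db.load.avg", "GroupBy": {"Group": group}}
                ],
                StartTime=start,
                EndTime=end,
                PeriodInSeconds=60,
            )
        except pi_client.exceptions.InvalidArgumentException:
            # The engine rejected this group (for example db.session_type
            # on MS SQL in the error above), so leave it out.
            continue
        supported.append(group)
    return supported
```

The returned list could then replace the fixed dbSliceGroup for that instance, for example `groups = probe_supported_groups(boto3.client('pi'), instance)`. Note that InvalidArgumentException can also signal other bad arguments, so this sketch treats any such error as "group not supported".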

It is not likely that we will be continuing to add features to this example, but we would welcome a contribution if you wanted to try and add this support and contribute back to the repo.

eranmos commented 10 months ago

Hi @lewinkedrs, thank you very much for your help. I can confirm that the db.session_type parameter is not supported for MS SQL.