Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License
4.63k stars 2.84k forks source link

Azure monitor's list metrics for cpu_percent always returns 0.0 for sql database resources #38252

Open ben-vega opened 2 weeks ago

ben-vega commented 2 weeks ago

Describe the bug Issue getting cpu_percent metric for Sql database resource types. The API always returns a value of 0.0 regardless of resource, aggregation. or timeframe. If others are unable to reproduce or there is something I missed, any insight would be greatly appreciated,

To Reproduce Steps to reproduce the behavior:

  1. Authenticate with Azure in python, for example with DefaultAzureCredential() object
  2. Create a MonitorManagementClient() using those credentials from step 1
  3. Call monitor_client.metrics.list() passing in resource id, timespan, interval, metricnames, and aggregation parameters. For my use case I have resource_id="/subscriptions//resourceGroups//providers/Microsoft.Sql/servers//databases/", timespan="2024-10-29 00:00:00.000001-07:00/2024-10-29 23:59:59.999999-07:00", interval="PT1H", metricnames="cpu_percent", aggregation="Average"
  4. Parse results and inspect the cpu percents returned at each timeframe

Expected behavior An API response with hourly metric data on cpu_percent of the database. Instead, I always get "0.0" back for the cpu_percent at every interval. I have tried different databases, different subscription, different aggregations, different intervals, the only aggregation that gives me a number back is "count" for obvious reasons. I've had a couple others look at my code as well and can't find anything wrong either.

Screenshots n/a

Additional context This problem seems to be larger than just the python-sdk, the azure CLI also always returns 0.0 for cpu_percent of sql databases. There are several metric types that do give actual numbers when i pass a different metricnames value, for example 'sql_instance_memory_percent' gives real numbers and works as expected. However, there are several that always return 0.0 or null. My database is not off over the interval where I am asking for metrics, This is further supported by the fact that I get "real" data back for other metricnames when I use them in the same call as cpu_percent (for example using metricnames="cpu_percent,sql_instance_memory_percent". I understand that depending on the resource type there might be different metrics available, however according to docs here there aren't any limits mentioned for that metric type. Also, the fact that using unsupported metric types in the list call throws an error further leads me to believe that the issue is not due to passing the wrong value for 'metricnames' parameter.

github-actions[bot] commented 2 weeks ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

github-actions[bot] commented 1 week ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @gulopesd @Haiying-MSFT @jairmyree @joshfree @KarishmaGhiya @KevinBlasko @kurtzeborn @nisha-bhatia @pvaneck @sarangan12 @scottaddie @srnagar @ToddKingMSFT.

msyyc commented 1 week ago

@ChenxiJiang333 Please help on this issue.

ChenxiJiang333 commented 1 week ago

got it

ChenxiJiang333 commented 1 week ago

Hi @ben-vega, somehow it works with me by passing your parameters. I guess one of the possible reasons is the timespan was not set correctly, maybe you could try not set timespan then the service would return the data in last hour. If it still doesn't work, please share the api-version you were calling, which you could get from the log by following https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/debug_guide.md. Image

github-actions[bot] commented 1 week ago

Hi @ben-vega. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

ben-vega commented 1 week ago

Hi @ChenxiJiang333 thanks for getting back to me. I tried it without providing a timeframe, and it responded with the most recent hour but I still got 0's for all databases in my resource group. I wonder if there is something else at play here outside of code, like is there a "setting" I need to enable to allow azure to collect this information? Is there restrictions on what "type" of sql database has this information? Maybe there needs to be a permission added to the client I am authorizing with? I've looked around the internet a fair bit with no luck so far, if you could point me in the right direction I would appreciate it. Here is the debug logs with the version from the request (please let me know if this is not what you were looking for) and here is my python code so we are working with the same codebase. "...Microsoft.Insights/metrics?interval=PT1H&metricnames=cpu_percent&aggregation=Average&api-version=2024-02-01"

from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.monitor.query import MetricsQueryClient
from datetime import timedelta
# import sys, logging

# logging.basicConfig(level=logging.DEBUG,
#                     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
#                     stream=sys.stdout)

def get_sql_databases(subscription_id, resource_group_name):
    # Authenticate with Azure
    credential = DefaultAzureCredential()

    # Initialize the SQL Management client
    sql_client = SqlManagementClient(credential, subscription_id)

    # Get list of SQL servers in the resource group
    sql_servers = sql_client.servers.list_by_resource_group(resource_group_name)

    # Dictionary to store SQL servers and their respective databases
    sql_servers_databases = {}

    for server in sql_servers:
        server_name = server.name
        print(f"Found SQL Server: {server_name}")

        # Get all databases for this server
        databases = sql_client.databases.list_by_server(resource_group_name, server_name)
        sql_servers_databases[server_name] = [db.name for db in databases]

    return sql_servers_databases

def query_cpu_metrics(subscription_id, resource_group_name, server_name, database_name):
    credential = DefaultAzureCredential()

    # Initialize the Metrics Query client
    metrics_client = MetricsQueryClient(credential)

    # Define the resource URI for the specific SQL database
    resource_uri = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}/providers/Microsoft.Sql/servers/{server_name}/databases/{database_name}"

    # Query for CPU metrics
    response = metrics_client.query_resource(
        resource_uri,
        metric_names=["cpu_percent"],
        # timespan=timedelta(days=1),
        granularity=timedelta(hours=1),
        aggregations=["Average"]
    )

    for metric in response.metrics:
        if metric.name == "cpu_percent":
            for time_series in metric.timeseries:
                for data in time_series.data:
                    print(f"Timestamp: {data.timestamp}, Average CPU Percent: {data.average}")

if __name__ == "__main__":
    subscription_id = "your-subscription-id"
    resource_group_name = "your-resource-group-name"

    # Get all SQL servers and databases
    sql_servers_databases = get_sql_databases(subscription_id, resource_group_name)

    # Retrieve CPU metrics for each database
    for server, databases in sql_servers_databases.items():
        for db in databases:
            print(f"\nQuerying CPU metrics for database: {db} on server: {server}")
            query_cpu_metrics(subscription_id, resource_group_name, server, db)
ChenxiJiang333 commented 5 days ago

Hi @ben-vega, have you tried to update your databases like adding some new tables or new columns before you list the cpu percent?

github-actions[bot] commented 2 days ago

Hi @ben-vega. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

ben-vega commented 17 hours ago

@ChenxiJiang333 I added 10000 rows to my table and I finally got non 0's back. This seems like something I should have figured out myself, thank you for your replies. I guess I assumed there would be some number returned even if it wasn't very much usage. Is there a threshold where the api only reports back a cpu percent if there is enough activity to report? For example if the actual cpu percent is something really small like 0.0005% the api will respond with 0.0 until it reaches 0.01% actual utilization then the api will report back there is 0.01% utilization