Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.

[ARM] Metric not found after using list metric definition #41874

Open jsarrelli opened 1 month ago

jsarrelli commented 1 month ago

We consume metrics for different resources. We first list the metric definitions for each resource:


  def listMetricsV2(resourceId: String): Future[ResourceMetrics] =
    Source
      .fromPublisher[MetricDefinition](
        underlying
          .listByResourceAsync(resourceId)
          .timeout(Duration.ofSeconds(5))
          .retryWhen(AzureClient.retrySpec)
      )
      .runFold(List.empty[MetricDefinition])(_ :+ _)
      .map(metricDefinitions =>
        ResourceMetrics(
          resourceId,
          metricDefinitions.map { metric =>
            metric.name.value -> metric
          }.toMap,
        )
      )

Then, using those definitions, we query specific metrics:

....
    val datapointsF: Future[Map[Instant, Double]] = metrics
      .get(metricName)
      .map { metric =>
        val query = metric
          .defineQuery()
          .startingFrom(OffsetDateTime.ofInstant(start, ZoneId.systemDefault()))
          .endsBefore(OffsetDateTime.ofInstant(end, ZoneId.systemDefault()))
          .withAggregation(aggregation.entryName)
          .withInterval(Duration.ofSeconds(period))
      }.getOrElse {
        log.debug("Requested metric was not under Azure available ones")
      }

...

Even though the metric is present in the listMetric response, the query fails with the following error:

com.azure.core.management.exception.ManagementException: Status code 400, "{"code":"BadRequest","message":"Failed to find metric configuration for provider: Microsoft.Web, resource Type: serverfarms, metric: DiskQueueLength, Valid metrics: "}"

SDK Version: "com.azure.resourcemanager" % "azure-resourcemanager" % "2.42.0"

joshfree commented 1 month ago

@weidongxu-microsoft could you please take a look?

weidongxu-microsoft commented 1 month ago

@v-hongli1 Please see if you can reproduce. The resource for the monitor metrics should be AppServicePlan.

jsarrelli commented 1 month ago

I think it happens for different services as well:

com.azure.core.management.exception.ManagementException: Status code 400, "{"code":"BadRequest","message":"Failed to find metric configuration for provider: Microsoft.Sql, resource Type: servers/databases, metric: dtu_consumption_percent, Valid metrics: cpu_percent,physical_data_read_percent,connection_successful,connection_failed,connection_failed_user_error,blocked_by_firewall,availability,ledger_digest_upload_success,ledger_digest_upload_failed"}"

Here is an example showing that dtu_consumption_percent is present in the listMetrics response but fails when we try to query its values:

Screenshot 2024-09-23 at 9 17 32 PM

v-hongli1 commented 1 month ago

The available aggregations for each metric in Microsoft.Sql/servers/databases are not exactly the same, so it is necessary to filter the metrics by the available aggregations.
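A minimal sketch of that filtering step, not tied to the SDK: `MetricInfo` and `AggregationFilter` are hypothetical names introduced only for illustration, and the sample values are copied from the per-metric listing later in this thread. With real metric definitions, the supported aggregations would have to come from whatever the definition object reports.

```scala
// Hypothetical, SDK-free stand-in for a metric definition: only the name and
// the aggregation types it advertises. Not a type from the Azure SDK.
final case class MetricInfo(name: String, aggregations: Set[String])

object AggregationFilter {
  // Sample values copied from the per-metric listing in this thread.
  val sample: List[MetricInfo] = List(
    MetricInfo("cpu_percent", Set("Average", "Maximum", "Minimum")),
    MetricInfo("connection_successful", Set("Total", "Count")),
    MetricInfo("app_cpu_billed", Set("Total"))
  )

  // Keep only the metrics that advertise the requested aggregation.
  def supporting(metrics: List[MetricInfo], aggregation: String): List[MetricInfo] =
    metrics.filter(_.aggregations.contains(aggregation))
}
```

With this sample, `AggregationFilter.supporting(AggregationFilter.sample, "Average")` keeps only cpu_percent, so a query for connection_successful with Average would be skipped before it reaches the service.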

v-hongli1 commented 1 month ago

  1. Microsoft.Web/serverfarms supports only the aggregations "None", "Average", "Minimum", "Maximum", "Total", and "Count".

  2. The interval supports only PT1M, PT5M, PT15M, PT30M, PT1H, PT6H, PT12H, and P1D.

Please confirm that the Aggregation and Interval values in the above code are correct.
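One way to check the interval before sending a query is to compare the `Duration` built from the period in seconds against the listed time grains. The sketch below assumes the list of grains given above is complete. `TimeGrainCheck` and `isSupported` are hypothetical names, not part of the SDK.

```scala
import java.time.Duration

object TimeGrainCheck {
  // Time grains listed above for Microsoft.Web/serverfarms, parsed from ISO-8601.
  val supported: Set[Duration] =
    Set("PT1M", "PT5M", "PT15M", "PT30M", "PT1H", "PT6H", "PT12H", "P1D")
      .map(s => Duration.parse(s))

  // Mirrors Duration.ofSeconds(period) as used in the query code in the report.
  def isSupported(periodSeconds: Long): Boolean =
    supported.contains(Duration.ofSeconds(periodSeconds))
}
```

For example, a period of 60 seconds corresponds to PT1M and passes the check, while a period of 90 seconds matches none of the listed grains.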

jsarrelli commented 1 month ago

In both cases, I'm aggregating by Average using a PT1M time grain.

v-hongli1 commented 1 month ago

@jsarrelli

  1. ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = cpu_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = physical_data_read_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = log_write_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = storage
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = connection_successful
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Total,Count
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = connection_failed
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Total,Count
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = connection_failed_user_error
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Total,Count
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = blocked_by_firewall
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Total,Count
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = availability
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = deadlock
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Total,Count
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = storage_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = xtp_storage_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = workers_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = sessions_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = sessions_count
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = cpu_limit
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = cpu_used
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = sqlserver_process_core_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = sqlserver_process_memory_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = sql_instance_cpu_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = sql_instance_memory_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = tempdb_data_size
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = tempdb_log_size
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = tempdb_log_used_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = app_cpu_billed
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Total
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = app_cpu_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = app_memory_percent
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = allocated_data_storage
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = full_backup_size_bytes
    The intervals of metric = PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = diff_backup_size_bytes
    The intervals of metric = PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = log_backup_size_bytes
    The intervals of metric = PT24H
    The supported aggregation types = Average,Maximum,Minimum
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = ledger_digest_upload_success
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Count
    ********************************************************************
    The namespace of metric = Microsoft.Sql/servers/databases
    The name of metric = ledger_digest_upload_failed
    The intervals of metric = PT1M,PT5M,PT15M,PT30M,PT1H,PT6H,PT12H,PT24H
    The supported aggregation types = Count
    ********************************************************************

    The list above shows the valid time grains and aggregation types for each metric of Microsoft.Sql/servers/databases. For this service, the combination of a PT1M time grain and the Average aggregation is therefore not valid for every metric; for example, connection_successful supports only Total and Count.

  2. The combination of a PT1M time grain and the Average aggregation applies to all metrics in Microsoft.Web/serverfarms. Can you share more detailed exception information?