aws-observability / aws-otel-test-framework

AWS Distro for OpenTelemetry Test Framework
https://aws-otel.github.io/
Apache License 2.0

containerinsight_eks_prometheus test is failing #334

Open straussb opened 3 years ago

straussb commented 3 years ago

https://github.com/aws-observability/aws-otel-collector/runs/3416656988?check_suite_focus=true#logs

For some reason the nginx_ingress_controller_nginx_process_connections_total metric is not being picked up by the Collector, even though all the other nginx metrics are.

validator_1  | com.amazon.aoc.exception.BaseException: [ContainerInsight] metric nginx_ingress_controller_nginx_process_connections_total not found with dimension [ClusterName: aws-otel-testing-framework-eks, Namespace: nginx-349dacac375861a3, Service: nginx-349dacac375861a3-ingress-nginx-controller-metrics]

I reproduced the issue and scraped the nginx server's Prometheus endpoint myself, and did see the nginx_ingress_controller_nginx_process_connections_total metric reported:

# HELP nginx_ingress_controller_nginx_process_connections_total total number of connections with state {accepted, handled}
# TYPE nginx_ingress_controller_nginx_process_connections_total counter
nginx_ingress_controller_nginx_process_connections_total{controller_class="nginx",controller_namespace="nginx-aacb4125dbe93c21",controller_pod="nginx-aacb4125dbe93c21-ingress-nginx-controller-64fcfb4cc8zkd4l",state="accepted"} 27333
nginx_ingress_controller_nginx_process_connections_total{controller_class="nginx",controller_namespace="nginx-aacb4125dbe93c21",controller_pod="nginx-aacb4125dbe93c21-ingress-nginx-controller-64fcfb4cc8zkd4l",state="handled"} 27333
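
The manual check above could be scripted roughly as follows. This is only a sketch, not part of the test framework: `find_samples` and `SAMPLE` are hypothetical names, the parser is a simplified reading of the Prometheus text exposition format (it assumes label values contain no `}`), and the sample text is copied from the scrape output above.

```python
import re

# Simplified patterns for the Prometheus text exposition format.
# Assumption: label values never contain a literal '}' character.
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def find_samples(exposition_text, metric_name):
    """Return a list of (labels, value) pairs for every sample of metric_name."""
    samples = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None or match.group(1) != metric_name:
            continue
        labels = dict(_LABEL_RE.findall(match.group(2) or ""))
        samples.append((labels, float(match.group(3))))
    return samples


# Scrape output copied from this issue.
SAMPLE = """\
# HELP nginx_ingress_controller_nginx_process_connections_total total number of connections with state {accepted, handled}
# TYPE nginx_ingress_controller_nginx_process_connections_total counter
nginx_ingress_controller_nginx_process_connections_total{controller_class="nginx",controller_namespace="nginx-aacb4125dbe93c21",controller_pod="nginx-aacb4125dbe93c21-ingress-nginx-controller-64fcfb4cc8zkd4l",state="accepted"} 27333
nginx_ingress_controller_nginx_process_connections_total{controller_class="nginx",controller_namespace="nginx-aacb4125dbe93c21",controller_pod="nginx-aacb4125dbe93c21-ingress-nginx-controller-64fcfb4cc8zkd4l",state="handled"} 27333
"""

if __name__ == "__main__":
    name = "nginx_ingress_controller_nginx_process_connections_total"
    for labels, value in find_samples(SAMPLE, name):
        print(labels["state"], value)
```

Against a live cluster, the text would come from the controller's metrics endpoint (for example via `urllib.request.urlopen`) instead of the hard-coded `SAMPLE`. An empty result would mean the metric is missing at the source, while a non-empty one, as here, points at the Collector side.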

I added some logging around here in the Prometheus receiver and saw that the metric was not present even at that point (other nginx metrics were).

For now, we will comment out that metric from the verification.

vasireddy99 commented 2 years ago

Closing this issue as the tests are currently passing. Please reopen it if you have any questions or concerns.

straussb commented 2 years ago

That's because this test case is still commented out: https://github.com/aws-observability/aws-otel-test-framework/blob/terraform/validator/src/main/resources/expected-data-template/container-insight/eks/prometheus/nginx_metrics.mustache#L46

Please leave the issue open until the test is uncommented and passing.