Apicurio / apicurio-registry-operator

The Kubernetes Operator for Apicurio Registry.

Metrics not accessible with disableHttp=true #247

Open pantaoran opened 6 months ago

pantaoran commented 6 months ago

In my ApicurioRegistry custom resource I want to set

    spec:
      configuration:
        security:
          https:
            disableHttp: true

for security hygiene; my enterprise environment forbids plain HTTP. When I set that, port 8080 is no longer exposed on the registry pod, but my ServiceMonitor still points Prometheus at port 8080 for metrics, which apparently fails, since I no longer see those metrics in Prometheus. Only when I set disableHttp back to false do the metrics reappear.

What's the idea here? Is this a bug? How can I disable the plain HTTP endpoint for API clients but still expose metrics so I can operate the registry?

jsenko commented 6 months ago

Hi, have you tried port 8443? This works for me:

sh-4.4$ curl -k https://172.30.102.121:8443/metrics
# TYPE jvm_gc_memory_allocated_bytes counter
# HELP jvm_gc_memory_allocated_bytes Incremented for an increase in the size of the (young) heap memory pool after one GC to before the next
jvm_gc_memory_allocated_bytes_total 9.0329816E7
# TYPE jvm_buffer_memory_used_bytes gauge
# HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
jvm_buffer_memory_used_bytes{id="mapped - 'non-volatile memory'"} 0.0
jvm_buffer_memory_used_bytes{id="mapped"} 0.0
jvm_buffer_memory_used_bytes{id="direct"} 828616.0
# TYPE agroal_blocking_time_total_milliseconds gauge
# HELP agroal_blocking_time_total_milliseconds Total time applications waited to acquire a connection.
agroal_blocking_time_total_milliseconds{datasource="default"} 7.0
# TYPE jvm_memory_used_bytes gauge
# HELP jvm_memory_used_bytes The amount of used memory
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'"} 9099776.0
jvm_memory_used_bytes{area="heap",id="PS Old Gen"} 2.4009872E7
jvm_memory_used_bytes{area="heap",id="PS Survivor Space"} 529016.0
jvm_memory_used_bytes{area="heap",id="PS Eden Space"} 1711904.0
jvm_memory_used_bytes{area="nonheap",id="Metaspace"} 5.4044872E7
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-nmethods'"} 1379200.0
jvm_memory_used_bytes{area="nonheap",id="Compressed Class Space"} 6506072.0
jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'"} 2202752.0
# TYPE worker_pool_active gauge
# HELP worker_pool_active The number of resources from the pool currently used

pantaoran commented 6 months ago

Huh, you are correct, this seems to work when querying localhost in the pod itself. Thanks for that; somehow I didn't think of trying it.
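
For reference, a minimal way to run the same check from outside a pod shell, assuming kubectl access; the namespace and deployment names below are placeholders, not values from this thread:

    # Query the in-pod HTTPS metrics endpoint; -s silences progress output, -k skips cert verification.
    # <namespace> and <registry-deployment> are placeholders for your own resources.
    kubectl exec -n <namespace> deploy/<registry-deployment> -- \
      curl -sk https://localhost:8443/metrics | head

kubectl exec on a deploy/ target resolves one of the deployment's pods, which is enough for a spot check.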

EDIT: I was using a PodMonitor as follows. At first I had the wrong label in the selector, but now it works (a quick way to check the label match is shown after the manifest).

spec:
  podMetricsEndpoints:
    - path: /metrics
      scheme: https
      targetPort: 8443
      tlsConfig:
        insecureSkipVerify: true
  selector:
    matchLabels:
      apicur.io/name: service-registry
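
Since the earlier failure was a mismatched label, one quick sanity check (a sketch, using the label value from the manifest above and a placeholder namespace) is to confirm the selector actually matches the registry pods:

    # List the pods that carry the label the PodMonitor selects on
    kubectl get pods -n <namespace> -l apicur.io/name=service-registry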

pantaoran commented 6 months ago

Actually, I just realized that this operator generates a ServiceMonitor automatically. But it doesn't seem to work for the case described above: it still points at port 8080 when I have disableHttp: true set.
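
To confirm what the operator-generated ServiceMonitor currently targets, something like the following can be used (the namespace is a placeholder; this assumes the Prometheus Operator CRDs are installed, which the thread already implies). Until the generated resource accounts for disableHttp, the PodMonitor above pointed at port 8443 works as a workaround:

    # Dump the auto-generated ServiceMonitor and inspect its endpoints/port
    kubectl get servicemonitors -n <namespace> -o yaml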