buchgr / bazel-remote

A remote cache for Bazel
https://bazel.build
Apache License 2.0

Unable to use Metrics endpoint with HTTPS #686

Closed JonasScharpf closed 1 year ago

JonasScharpf commented 1 year ago

Summary

TL;DR: Requesting the enabled /metrics endpoint via HTTPS does not seem to work as expected.

curl --user "klaus:password"  https://IP-of-remote-server:9090/metrics

resource name must be a SHA256 hash in hex. got '/metrics'

Details

Local

I've created a very simple local Bazel Remote Cache server with

docker run -u 1000:1000 -v $(pwd)/bzl-cache:/data -p 9090:8080 -p 9092:9092 buchgr/bazel-remote-cache --max_size 5 --enable_endpoint_metrics

and can request the data of the /status and /metrics endpoints with

curl 172.17.0.2:8080/status
curl 172.17.0.2:8080/metrics
{
 "CurrSize": 127442944,
 "UncompressedSize": 312782848,
 "ReservedSize": 0,
 "MaxSize": 5368709120,
 "NumFiles": 74,
 "ServerTime": 1691573233,
 "GitCommit": "dc4aeace0af5b893c96bd994a816dfbaba9b18c2",
 "NumGoroutines": 10
}
# HELP bazel_remote_azblob_cache_hits The total number of azblob backend cache hits
# TYPE bazel_remote_azblob_cache_hits counter
bazel_remote_azblob_cache_hits 0
...
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0

nice, as expected :partying_face:

Remote

After that, I used a real server with HTTPS (of course) and a Docker Compose file.

version: '3.6'

services:
  bazel-cache:
    image: buchgr/bazel-remote-cache:latest
    container_name: bazel-remote-cache
    restart: always
    volumes:
    - /volumes/bazel/cache:/data
    - /volumes/bazel/config/htpasswd:/etc/bazel-remote/htpasswd
    - /etc/letsencrypt/fullchain.cer:/etc/bazel-remote/server_cert # need all.pem because it doesn't have an option for intermediates
    - /etc/letsencrypt/*.xxx.key:/etc/bazel-remote/server_key
    ports:
    - 9090:8080
    - 9092:9092
    # - 6060:6060 # for pprof
    environment:
      BAZEL_REMOTE_TLS_CERT_FILE: /etc/bazel-remote/server_cert
      BAZEL_REMOTE_TLS_KEY_FILE: /etc/bazel-remote/server_key
      BAZEL_REMOTE_GRPC_PORT: 9092
      BAZEL_REMOTE_HTPASSWD_FILE: /etc/bazel-remote/htpasswd
      BAZEL_REMOTE_DIR: /data
      BAZEL_REMOTE_MAX_SIZE: 480
      BAZEL_REMOTE_ENABLE_ENDPOINT_METRICS: 1
      # BAZEL_REMOTE_PROFILE_ADDRESS: :6060
    logging:
      driver: "json-file"
      options:
        max-size: "500m"

With that up and running, I requested the data of the two endpoints again

curl https://IP-of-remote-server:9090/status
curl https://IP-of-remote-server:9090/metrics
{
 "CurrSize": 5295534080,
 "UncompressedSize": 13178114048,
 "ReservedSize": 0,
 "MaxSize": 5368709120,
 "NumFiles": 271670,
 "ServerTime": 1691570602,
 "GitCommit": "4855aff6edf290d75fb54cc7a3f0c9656f5075fa",
 "NumGoroutines": 10
}

resource name must be a SHA256 hash in hex. got '/metrics'

It looks as if the cache server interpreted this endpoint as a request for its content.

From Bazel's point of view everything is working as expected: the remote cache is used, filled with new data, and so on.

Expectation

I would expect either both endpoints, /status and /metrics, to work or neither of them, but not a mix.

mostynb commented 1 year ago

Hi, thanks for the bug report.

I think this might actually turn out to be a misconfiguration, but it's not great that bazel-remote doesn't give a good error message.

To help debug this, I added a few changes and uploaded a new docker image: log on startup whether or not endpoint metrics are enabled, and provide a 404 error when clients request /metrics and it is disabled. Could you try this out, and report back the Endpoint metrics: log line when running your test case?

JonasScharpf commented 1 year ago

I pulled your recently published latest image, commented out BAZEL_REMOTE_ENABLE_ENDPOINT_METRICS: 1, and restarted bazel-remote. Requesting /metrics then returned Endpoint metrics are not enabled on this server., as expected :+1:, while the /status endpoint still worked (also as expected). The log confirms this: bazel-remote-cache | 2023/08/10 09:34:29 Endpoint metrics: disabled

I then uncommented the BAZEL_REMOTE_ENABLE_ENDPOINT_METRICS: 1 line to restore the configuration shown above and restarted bazel-remote again. The log showed bazel-remote-cache | 2023/08/10 09:35:39 Endpoint metrics: enabled, and the /metrics endpoint returned the metrics.
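The three kinds of /metrics responses quoted in this thread can be told apart mechanically. The following is only a minimal sketch: the function name and the returned labels are made up for illustration, and it matches only on the message strings quoted in this issue, which may differ in other bazel-remote versions.

```python
def classify_metrics_response(status_code: int, body: str) -> str:
    """Classify a GET /metrics response using strings quoted in this issue.

    The labels returned here are hypothetical, not part of bazel-remote.
    """
    # Server treated "/metrics" as a cache resource name.
    if "resource name must be a SHA256 hash in hex" in body:
        return "treated-as-cache-key"
    # Message reported in this thread when metrics are disabled.
    if "Endpoint metrics are not enabled" in body:
        return "metrics-disabled"
    # Prometheus text output contains "# HELP" / "# TYPE" comment lines.
    if status_code == 200 and any(
        line.startswith(("# HELP ", "# TYPE ")) for line in body.splitlines()
    ):
        return "metrics-ok"
    return "unknown"


if __name__ == "__main__":
    print(classify_metrics_response(
        200,
        "# TYPE bazel_remote_azblob_cache_hits counter\n"
        "bazel_remote_azblob_cache_hits 0\n",
    ))
```

Feeding it the status code and body of a curl or HTTP client call against /metrics would show at a glance which of the three situations a server is in.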

I then reverted everything to the previous state and noticed that we had not pinned a version but were using buchgr/bazel-remote-cache:latest. After pinning the image to v2.4.1 and starting everything again, it worked :joy: I believe we were somehow running a version with a glitch or, more likely, one so old that /metrics simply did not exist yet.
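The two /status outputs earlier in this issue already report different GitCommit values, which hints that the local and remote containers were not running the same build. A small sketch of that comparison follows; the helper name is hypothetical, and the payloads are trimmed to the one field that matters. A differing commit only shows that the builds differ, not which one is older.

```python
import json


def same_build(status_a: str, status_b: str) -> bool:
    """Return True if two /status JSON payloads report the same GitCommit."""
    return json.loads(status_a)["GitCommit"] == json.loads(status_b)["GitCommit"]


# GitCommit values copied from the local and remote /status outputs above.
local_status = '{"GitCommit": "dc4aeace0af5b893c96bd994a816dfbaba9b18c2"}'
remote_status = '{"GitCommit": "4855aff6edf290d75fb54cc7a3f0c9656f5075fa"}'

if __name__ == "__main__":
    print(same_build(local_status, remote_status))  # prints False
```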

Sorry for the inconvenience. Nevertheless, the improved log output and error reporting are a real benefit, thanks for that!

mostynb commented 1 year ago

No problem :)