buchgr / bazel-remote

A remote cache for Bazel
https://bazel.build
Apache License 2.0

Remote API: x509: certificate signed by unknown authority #304

Closed. UebelAndre closed this issue 4 years ago

UebelAndre commented 4 years ago

I'm running into the following error when I use experimental_remote_asset_api:

server_1  | 2020/07/06 00:51:42 bazel-remote built with go1.14.3 from git commit 6abc54f50d40b0d5dcc3eb902d9cbd46e892215b.
server_1  | 2020/07/06 00:51:42 Initial RLIMIT_NOFILE cur: 1048576 max: 1048576
server_1  | 2020/07/06 00:51:42 Setting RLIMIT_NOFILE cur: 1048576 max: 1048576
server_1  | 2020/07/06 00:51:42 Migrating files (if any) to new directory structure: /data/ac
server_1  | 2020/07/06 00:51:42 Migrating files (if any) to new directory structure: /data/cas
server_1  | 2020/07/06 00:51:42 Loading existing files in /data.
server_1  | 2020/07/06 00:51:42 Sorting cache files by atime.
server_1  | 2020/07/06 00:51:42 Building LRU index.
server_1  | 2020/07/06 00:51:42 Finished loading disk cache files.
server_1  | 2020/07/06 00:51:42 Loaded 0 existing disk cache items.
server_1  | 2020/07/06 00:51:42 Starting HTTPS server on address :8080
server_1  | 2020/07/06 00:51:42 HTTP AC validation: enabled
server_1  | 2020/07/06 00:51:42 Starting gRPC server on address :9090
server_1  | 2020/07/06 00:51:42 gRPC AC dependency checks: enabled
server_1  | 2020/07/06 00:51:42 experimental gRPC remote asset API: enabled
server_1  | 2020/07/06 00:52:16 GRPC GETCAPABILITIES
server_1  | 2020/07/06 00:52:25 GRPC GETCAPABILITIES
server_1  | 2020/07/06 00:52:40 GRPC GETCAPABILITIES
server_1  | 2020/07/06 00:52:46 GRPC GETCAPABILITIES
server_1  | 2020/07/06 00:52:48 failed to get URI: https://static.rust-lang.org/dist/rust-1.44.1-x86_64-apple-darwin.tar.gz err: Get "https://static.rust-lang.org/dist/rust-1.44.1-x86_64-apple-darwin.tar.gz": x509: certificate signed by unknown authority

My .bazelrc

build --remote_cache=grpcs://bazelcache.local:9090
build --experimental_remote_downloader=grpcs://bazelcache.local:9090
build --keep_going

My bazel-remote config file

dir: /data
max_size: 256

port: 8080
grpc_port: 9090

tls_cert_file: /etc/ssl/cert.pem
tls_key_file:  /etc/ssl/key.pem

experimental_remote_asset_api: true

My docker-compose file.

---
services:
  server:
    environment:
      - BAZEL_REMOTE_CONFIG_FILE=/etc/bazel-remote/config.conf
    image: bazel/buchgr:bazel-remote
    ports:
      - 8080:8080/tcp
      - 9090:9090/tcp
    restart: unless-stopped
    volumes:
      - /home/user/data:/data:Z
      - /home/user/bazel-remote.conf:/etc/bazel-remote/config.conf:Z
      - /etc/ssl/cert.pem:/etc/ssl/cert.pem:ro
      - /etc/ssl/key.pem:/etc/ssl/key.pem:ro

version: "2.4"

A repo that reproduces this issue: repro.zip

I also want to note that I built a new image off of 6abc54f50d40b0d5dcc3eb902d9cbd46e892215b

Is anyone else able to reproduce this with a similar (if not the same) setup?

mostynb commented 4 years ago

Thanks for the bug report. Could you test this on the previous bazel-remote commit (710020df649309b93d359ad892bc5f2ea874bca9)?

UebelAndre commented 4 years ago

@mostynb Hey, thanks for the quick reply! Yes, I still run into this on 710020df649309b93d359ad892bc5f2ea874bca9:

server_1  | 2020/07/06 15:40:46 bazel-remote built with go1.14.3 from git commit 710020df649309b93d359ad892bc5f2ea874bca9.
server_1  | 2020/07/06 15:40:46 Initial RLIMIT_NOFILE cur: 1048576 max: 1048576
server_1  | 2020/07/06 15:40:46 Setting RLIMIT_NOFILE cur: 1048576 max: 1048576
server_1  | 2020/07/06 15:40:46 Migrating files (if any) to new directory structure: /data/ac
server_1  | 2020/07/06 15:40:46 Migrating files (if any) to new directory structure: /data/cas
server_1  | 2020/07/06 15:40:46 Loading existing files in /data.
server_1  | 2020/07/06 15:40:46 Sorting cache files by atime.
server_1  | 2020/07/06 15:40:46 Building LRU index.
server_1  | 2020/07/06 15:40:46 Finished loading disk cache files.
server_1  | 2020/07/06 15:40:46 Loaded 11056 existing disk cache items.
server_1  | 2020/07/06 15:40:46 Starting HTTPS server on address :8080
server_1  | 2020/07/06 15:40:46 HTTP AC validation: enabled
server_1  | 2020/07/06 15:40:46 Starting gRPC server on address :9090
server_1  | 2020/07/06 15:40:46 gRPC AC dependency checks: enabled
server_1  | 2020/07/06 15:40:46 experimental gRPC remote asset API: enabled
server_1  | 2020/07/06 15:41:03 GRPC GETCAPABILITIES
server_1  | 2020/07/06 15:41:58 GRPC GETCAPABILITIES
server_1  | 2020/07/06 15:42:00 failed to get URI: https://static.rust-lang.org/dist/rust-1.44.1-x86_64-apple-darwin.tar.gz err: Get "https://static.rust-lang.org/dist/rust-1.44.1-x86_64-apple-darwin.tar.gz": x509: certificate signed by unknown authority

mostynb commented 4 years ago

I suspect that Go's HTTP client relies on a system certificate store, which does not exist inside the container. As a quick workaround, you can try using plain http URLs. That should be safe, since you can specify the sha256 hash in the WORKSPACE file and the download is checked against it, though the traffic would not be kept secret.
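
To illustrate why a pinned sha256 protects integrity even over plain http, here is a minimal Go sketch of the kind of check involved. It is not Bazel's or bazel-remote's actual code; the function name checkSHA256 and the sample payload are hypothetical and exist only for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// checkSHA256 is a hypothetical helper: it hashes data and returns an
// error if the hex digest does not match wantHex (case-insensitive).
func checkSHA256(data []byte, wantHex string) error {
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	if !strings.EqualFold(got, wantHex) {
		return fmt.Errorf("sha256 mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

func main() {
	// The pinned digest plays the role of the sha256 attribute in WORKSPACE.
	// This value is the SHA-256 of the ASCII string "hello".
	const pinned = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"

	// An untampered payload passes the check.
	fmt.Println("original:", checkSHA256([]byte("hello"), pinned))

	// A payload modified in transit is rejected, whatever the transport.
	fmt.Println("modified:", checkSHA256([]byte("hello!"), pinned))
}
```

A payload altered in transit fails the comparison, so an attacker on the network can read the download but cannot substitute a different archive without being detected.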

I will do some research and see what can be done (I'm no docker expert).

mostynb commented 4 years ago

I have not been able to reproduce this error locally, though I did catch a different error with your testcase (fixed here: #307).

Inspecting the Docker images with dive proved my initial guess wrong: we do already include CA certs.
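
For anyone who wants to check trust from inside a running container rather than by inspecting image layers, a small probe along these lines might help. This is only a sketch, not part of bazel-remote: the helper names isUnknownAuthority and clientWithRoots and the command-line shape are assumptions made for this example. It fetches a URL with Go's default trust store, or with an explicit PEM bundle if one is given, and reports whether the failure is the "unknown authority" error from the logs.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
)

// isUnknownAuthority reports whether err wraps the x509 error behind
// "certificate signed by unknown authority".
func isUnknownAuthority(err error) bool {
	var uae x509.UnknownAuthorityError
	return errors.As(err, &uae)
}

// clientWithRoots (hypothetical helper) builds an HTTP client that trusts
// only the CA certificates found in pemBytes.
func clientWithRoots(pemBytes []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, errors.New("no valid PEM certificates found")
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: probe URL [ca-bundle.pem]")
		return
	}

	// Default client: uses whatever trust store Go finds on this system.
	client := http.DefaultClient
	if len(os.Args) > 2 {
		pemBytes, err := os.ReadFile(os.Args[2])
		if err != nil {
			fmt.Fprintln(os.Stderr, "reading bundle:", err)
			os.Exit(1)
		}
		client, err = clientWithRoots(pemBytes)
		if err != nil {
			fmt.Fprintln(os.Stderr, "loading bundle:", err)
			os.Exit(1)
		}
	}

	resp, err := client.Get(os.Args[1])
	switch {
	case err == nil:
		resp.Body.Close()
		fmt.Println("TLS verification succeeded:", resp.Status)
	case isUnknownAuthority(err):
		fmt.Println("server certificate is not trusted by the roots in use:", err)
	default:
		fmt.Println("request failed for another reason:", err)
	}
}
```

Running it once without a bundle and once with a known-good bundle can help separate a missing or replaced trust store from a network-level problem such as an intercepting proxy.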

Are you testing this on a network with some sort of transparent network proxy?

UebelAndre commented 4 years ago

No, I'm not behind a transparent proxy. The cache is hosted on another machine running CentOS, but otherwise I just ran docker-compose up -d on what I sent you.

UebelAndre commented 4 years ago

Interesting, I also was able to seemingly resolve the issue by doing the following:

  1. update the image to d8e6748f54ab271cb10e0c5ae6330fb45fe31fa5
  2. change the data mount point to /home/bazel-remote/data

I also suspect that I may have built the container where I saw the problem with a different version of Go (1.12) via a GitLab Autodev pipeline. I updated this, made the changes above, and everything works now.

I feel that since 6abc54f50d40b0d5dcc3eb902d9cbd46e892215b the Docker documentation might need to be updated, since it seems you'll now get permission errors because the container no longer runs as root.
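
A quick way to confirm this kind of permission problem is to try creating a file in the mounted cache directory as the container's user. The following Go sketch does that; the helper name checkWritable is hypothetical and this is not how bazel-remote itself validates its directory.

```go
package main

import (
	"fmt"
	"os"
)

// checkWritable (hypothetical helper) tries to create and then remove a
// temporary file in dir, returning an error if the current user cannot.
func checkWritable(dir string) error {
	f, err := os.CreateTemp(dir, ".write-probe-*")
	if err != nil {
		return fmt.Errorf("cannot create a file in %s as uid %d: %w", dir, os.Getuid(), err)
	}
	name := f.Name()
	if err := f.Close(); err != nil {
		return err
	}
	return os.Remove(name)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: writable DIR")
		return
	}
	if err := checkWritable(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(os.Args[1], "is writable")
}
```

If the probe fails for a bind-mounted host directory, the usual remedies are to change the ownership of the host directory or to run the container with a matching user ID.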

But I think this issue was just my own stupidity. Happy it led to something productive though. Thanks for the help! 😄

mostynb commented 4 years ago

I'm not sure what the root cause was here. We always build with the default Go toolchain for the version of rules_go that is specified in WORKSPACE, and the Go version is logged at startup (1.14.3, as you can see in your logs). Feel free to reopen this issue if the problem reappears.

Re documentation, I have a proposed update in #309.