concourse / concourse-chart

Helm chart to install Concourse
Apache License 2.0

P2P Volume Streaming #267

Closed jdziat closed 2 years ago

jdziat commented 2 years ago

I see the options to enable P2P volume streaming, but every time I enable it, the workers are unable to communicate with one another. Which options need to be configured, and is there any documentation for this?

{"timestamp":"2021-08-15T17:13:04.318918169Z","level":"error","source":"baggageclaim","message":"baggageclaim.api.volume-server.stream-p2p-out.stream-p2p-out.failed-to-streaming-to-peer","data":{"encoding":"zstd","error":"Put \"http://172.23.136.212:7788/volumes/31c060a7-970e-45a3-7e22-d9321a059c88/stream-in?path=.\": dial tcp 172.23.136.212:7788: connect: connection refused","full-path":"/concourse-work-dir/volumes/live/a56ce783-7fe2-4f55-6e70-56b5d1a8d47c/volume","session":"3.1.10.1","sub-path":".","volume":"a56ce783-7fe2-4f55-6e70-56b5d1a8d47c"}}
taylorsilva commented 2 years ago

No docs yet :( You probably need to configure the interface name pattern: https://github.com/concourse/concourse-chart/blob/016fb3ea9fcda36f141f2a17316736ca78d01e07/values.yaml#L1786-L1789

You should set it so that it selects the interface that provides the worker with its LAN IP.
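
For readers following along, a minimal sketch of what such a values override might look like. The key names below (`enableP2pVolumeStreaming`, `p2pInterfaceNamePattern`, `p2pInterfaceFamily`) are assumptions recalled from the chart and are not quoted in this thread, so check them against the `values.yaml` lines linked above before relying on them.

```yaml
# Sketch only: key names are assumptions, verify against the chart's values.yaml.
concourse:
  web:
    # Assumed key for turning on P2P volume streaming.
    enableP2pVolumeStreaming: true
  worker:
    baggageclaim:
      # Assumed key: pattern matching the interface that carries the worker's LAN IP.
      p2pInterfaceNamePattern: eth0
      # Assumed key: 4 for IPv4, 6 for IPv6.
      p2pInterfaceFamily: 4
```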

jdziat commented 2 years ago

@taylorsilva It is picking eth0, which should be the correct interface, and this is all within the same k8s cluster, so networking should be a given. Is there a port or service that needs to be exposed?

I'll try to do some testing tonight to see what's causing the issue and update this accordingly.

taylorsilva commented 2 years ago

Hmm good question. We don't actually use this feature ourselves because our workers are not on the same LAN. First guess is that maybe the baggageclaim port needs to be exposed now? That's port 7788 by default.

jdziat commented 2 years ago

Tried exposing the port with no luck. Going to see what else I can try this evening.

jdziat commented 2 years ago

@taylorsilva I think it's resolved now. I updated the statefulset to expose port 7788, created a headless service for the port, and updated the concourse-worker environment variable to:

        - name: CONCOURSE_BAGGAGECLAIM_BIND_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP

It hasn't errored out yet.
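
The headless service mentioned above is not shown in the thread. A rough sketch of what it could look like follows; the service name and the selector label are hypothetical and would need to match the labels on your own worker pods, and the worker container would also need port 7788 listed under its `ports`.

```yaml
# Sketch only: the name and selector are hypothetical, not taken from the chart.
apiVersion: v1
kind: Service
metadata:
  name: concourse-worker-baggageclaim
spec:
  # clusterIP: None makes the service headless.
  clusterIP: None
  selector:
    # Assumption: replace with the labels actually set on the worker pods.
    app: concourse-worker
  ports:
    - name: baggageclaim
      # 7788 is the default baggageclaim port mentioned earlier in the thread.
      port: 7788
      targetPort: 7788
      protocol: TCP
```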

jdziat commented 2 years ago

The final fix for the Helm chart requires setting concourse.worker.baggageclaim.bindIp to nil and setting an env var in worker.env[]:

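concourse:
  worker:
    baggageclaim:
      # Left empty (nil) so the bind IP comes from the env var below.
      bindIp:
worker:
  env:
    # Bind baggageclaim to the pod's own IP via the Kubernetes downward API.
    - name: CONCOURSE_BAGGAGECLAIM_BIND_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP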