brigadecore / brigade

Event-driven scripting for Kubernetes
https://brigade.sh/
Apache License 2.0

Docker for Mac - K8s brig run - sample not working #408

Closed JCzz closed 5 years ago

JCzz commented 6 years ago

Hi

brigade.js

const { events } = require('brigadier')

events.on("exec", (brigadeEvent, project) => {
  console.log("Hello world!")
})

I generated myvalues.yaml with helm inspect values brigade/brigade-project > myvalues.yaml and made no changes to it.
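
For completeness, the project setup flow with that unchanged values file looks like this (the install command is the same one I use later in this thread):

helm inspect values brigade/brigade-project > myvalues.yaml
helm install --name my-project brigade/brigade-project -f myvalues.yaml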

I have had this working on Google Kubernetes Engine, now I am trying to run this on Docker For Mac with Kubernetes.

I did install Brigade with helm install -n brigade brigade/brigade --set rbac.enabled=true

Now when I run brig run -f brigade.js deis/empty-testbed it gives me the following error:

Error: pod failed to schedule: 
Usage:
  brig run PROJECT [flags]

Flags:
  -c, --commit string    A VCS (git) commit version, tag, or branch (default "master")
  -e, --event string     The name of the event to fire (default "exec")
  -f, --file string      The JavaScript file to execute
  -p, --payload string   The path to a payload file

Global Flags:
      --kubeconfig string   The path to a KUBECONFIG file, overrides $KUBECONFIG.
  -n, --namespace string    The Kubernetes namespace for Brigade (default "default")
  -v, --verbose             Turn on verbose output

pod failed to schedule: 

I tried getting the logs from the failing brigade-worker pod (brigade-worker-01c9w9q8zxj6zg8v4c39yx3d7m   0/1   Error   0   9m), but it just returns an empty line.

kubectl describe pod brigade-worker-01c9w9q8zxj6zg8v4c39yx3d7m

Events:
  Type    Reason                 Age   From                         Message
  ----    ------                 ----  ----                         -------
  Normal  Scheduled              13m   default-scheduler            Successfully assigned brigade-worker-01c9w9q8zxj6zg8v4c39yx3d7m to docker-for-desktop
  Normal  SuccessfulMountVolume  13m   kubelet, docker-for-desktop  MountVolume.SetUp succeeded for volume "vcs-sidecar"
  Normal  SuccessfulMountVolume  13m   kubelet, docker-for-desktop  MountVolume.SetUp succeeded for volume "brigade-worker-token-bmsck"
  Normal  SuccessfulMountVolume  13m   kubelet, docker-for-desktop  MountVolume.SetUp succeeded for volume "brigade-build"
  Normal  Pulled                 13m   kubelet, docker-for-desktop  Container image "deis/brigade-worker:v0.11.0" already present on machine
  Normal  Created                13m   kubelet, docker-for-desktop  Created container
  Normal  Started                13m   kubelet, docker-for-desktop  Started container

Do you know why?

Thanks

technosophos commented 6 years ago

Hrm. That's unexpected. Can you tell me what version of Docker for Mac you're running and I'll try to reproduce locally?

JCzz commented 6 years ago

Hi @technosophos

My setup is:

kubectl version

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:34:11Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Running on Docker for Mac (Edge with Kubernetes)

Docker version with Kubernetes enabled in the Preferences:

Client:
 Version:   18.03.0-ce
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    0520e24
 Built: Wed Mar 21 23:06:22 2018
 OS/Arch:   darwin/amd64
 Experimental:  true
 Orchestrator:  kubernetes

Server:
 Engine:
  Version:  18.03.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:    Wed Mar 21 23:14:32 2018
  OS/Arch:  linux/amd64
  Experimental: true

technosophos commented 6 years ago

I'm testing this now.

technosophos commented 6 years ago

Did you turn RBAC on in the chart (--set rbac.enabled=true)? I'm still fighting my own way through Docker for Mac setup, but that is one thing I had to do.
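
A quick way to check whether the install actually created the RBAC objects (exact names depend on your release name, so treat this as a sketch):

kubectl get role,rolebinding,serviceaccount | grep brigade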

technosophos commented 6 years ago

I'm having major problems with Docker for Mac, largely due to the way it generates TLS certificates. It sets an illegal FQDN for the cluster, which generates errors like this:


Get https://localhost:6443/api/v1/namespaces/default/secrets?labelSelector=app%3Dbrigade%2Ccomponent%3Dproject: tls: failed to parse certificate from server: x509: cannot parse dnsName "kubernetes.default.svc."

(Note the trailing dot at the end of kubernetes.default.svc.)

Apparently a recent version of Go became much stricter about checking this field, so I may try to revert to a version of Brigade compiled with Go 1.8 or earlier and see if I can get further.
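
If you want to see exactly what the API server certificate contains, something like this should print the SANs (assuming the cluster is reachable at localhost:6443, as in the error above):

echo | openssl s_client -connect localhost:6443 2>/dev/null | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"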

JCzz commented 6 years ago

Hey @technosophos

I now have it running.

I hope I am able to explain myself clearly in the following:

The differences now are:

  1. An upgrade to the latest Docker for Mac: Version 18.03.0-ce-mac58 (23607). kubectl version returns the same information, and docker version is also the same, BUT it did come up with an update to Docker for Mac - note the build number 23607, which I got from Docker for Mac Preferences.

  2. I also installed Brigade with --set rbac.enabled=true, but did not use it for the sample app: helm install --name my-project brigade/brigade-project -f myvalues.yaml

I had thought it was only necessary when installing Brigade itself?

So I also tried installing the sample with --set rbac.enabled=true: helm install --name my-project brigade/brigade-project -f myvalues.yaml --set rbac.enabled=true

And this is where I found that it was actually working again with: brig run -f brigade.js deis/empty-testbed

Then I tried deleting Brigade and the sample app, reinstalling Brigade (as I learned previously in another issue - with RBAC enabled 👍), and then installing the sample app without RBAC - just to verify that it would not work. But it did work, even without RBAC.

Now I don't know whether Kubernetes keeps some secrets around that let the sample app run even after deleting the previous installation of the sample app (the one installed with --set rbac.enabled=true).

Conclusion: I suspect that it was the Docker for Mac update that did the trick, but you might know whether it was the --set rbac.enabled=true on the sample app?
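
One thing that might settle the RBAC question: if the brigade-project chart does not expose an rbac value at all, then --set rbac.enabled=true on the project install would simply be ignored. A rough way to check (just my guess at how to look):

helm inspect values brigade/brigade-project | grep -i -A 2 rbac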

Thanks again Matt

technosophos commented 6 years ago

Oh, man... I am so glad you figured that out. I'm totally having zero luck getting Docker for Mac working at all. I keep running into all these little issues with the way I have my local machine set up. I should have waited longer to upgrade to Go 1.10, I think.

Sounds like I can mark this as resolved. Thanks for the details.

JCzz commented 6 years ago

Hi @technosophos

I am back with the same problem, even with rbac.enabled=true set both when installing Brigade and for the project.

helm template install/kubernetes/helm/istio --name istio --namespace istio-system > $HOME/istio.yaml
brig run -f brigade.js deis/empty-testbed
Event created. Waiting for worker pod named "brigade-worker-01ch3gf2zmh249yjk1vy5p7b9t".
Error: pod failed to schedule: 

I don't know how to debug this. There is no vacuum or sidecar pod I can run kubectl logs on.

If I do a kubectl logs on the worker I get:

kubectl logs brigade-worker-01ch3fzfmc07vfm52bmam1ecf5

Error from server (BadRequest): container "brigade-runner" in pod "brigade-worker-01ch3fzfmc07vfm52bmam1ecf5" is waiting to start: PodInitializing

Do you have any idea? Thanks

This time I am running on Google Kubernetes Engine.

Here is my setup (Helm, Brigade, and kubectl versions below).

My brigade.js:

const { events } = require('brigadier')

events.on("exec", (brigadeEvent, project) => {
  console.log("Hello world!")
})

I installed Brigade with:

helm install -n brigade brigade/brigade

And I built brig from the latest source, not using the prebuilt binaries.

helm version
Client: &version.Version{SemVer:"v2.10.0-rc.1", GitCommit:"aa98e7e3dd2356bce72e8e367e8c87e8085c692b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.10.0-rc.1", GitCommit:"aa98e7e3dd2356bce72e8e367e8c87e8085c692b", GitTreeState:"clean"}
kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.7", GitCommit:"dd5e1a2978fd0b97d9b78e1564398aeea7e7fe92", GitTreeState:"clean", BuildDate:"2018-04-19T00:05:56Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.7-gke.3", GitCommit:"9b5b719c5f295c99de68ffb5b63101b0e0175376", GitTreeState:"clean", BuildDate:"2018-05-31T18:32:23Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

technosophos commented 6 years ago

Can you do a kubectl logs brigade-worker-01ch3fzfmc07vfm52bmam1ecf5 vcs-sidecar? Or, alternatively, a kubectl describe pod brigade-worker-01ch3fzfmc07vfm52bmam1ecf5?
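
Spelled out (the pod name is the one from your output; if I remember the worker spec correctly, vcs-sidecar is the init container that clones the repo, which is why the pod sits in PodInitializing when it fails):

kubectl logs brigade-worker-01ch3fzfmc07vfm52bmam1ecf5 -c vcs-sidecar
kubectl describe pod brigade-worker-01ch3fzfmc07vfm52bmam1ecf5
kubectl get pod brigade-worker-01ch3fzfmc07vfm52bmam1ecf5 -o jsonpath='{.status.initContainerStatuses[*].state}'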

technosophos commented 6 years ago

Re-opening

emmayang commented 6 years ago

Hi @technosophos, I ran into the same error when running on a plain Kubernetes cluster (installed with kubeadm). I got exactly the same error as @JCzz, and it reports Event created. Waiting for worker pod named "brigade-worker-01cjcsnqz2wd1vxa131jzsends". Error: timeout waiting for build 01cjcsnqz2wd1vxa131jzsends to start.

I did try @JCzz's suggestion of adding --set rbac.enabled=true to both the Brigade install and the project install, but no luck.

And I found it is not possible to do kubectl logs on the worker pod, since it is never even created and doesn't exist at all.
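
The only other thing I can think of checking is the controller, since as far as I understand it is the component that creates the worker pods. Something along these lines (the deployment name is a guess; adjust to whatever the first command shows):

kubectl get deployments | grep ctrl
kubectl logs deploy/brigade-brigade-ctrl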

Could you please shed some light on this?

emmayang commented 6 years ago

@technosophos I think I figured out my problem... I installed Brigade in a namespace other than default, and it seems the RBAC setup is not working with non-default namespaces.

My initial installation command was helm install -n brigade brigade/brigade --namespace=utils --set rbac.enabled=true --set vacuum.age=10h, and the worker pod could not be created; but when I removed the namespace setting --namespace=utils, it worked fine.

technosophos commented 6 years ago

I will try to reproduce that here. Maybe there is an error in the Helm chart having to do with namespaces.
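
One way to check for that would be to render the chart for the default and a custom namespace and diff the results, something like this (chart path as in this repo; Helm 2 flags):

helm template charts/brigade --namespace default --set rbac.enabled=true > default.yaml
helm template charts/brigade --namespace utils --set rbac.enabled=true > utils.yaml
diff default.yaml utils.yaml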

technosophos commented 6 years ago

I'm having some trouble reproducing, but here's what I did to attempt it:

I started a fresh minikube instance with RBAC enabled, then installed Helm.

I installed Brigade like this:

BRIG=$(pwd)
BRIGADE_NS="brigade"
BRIGADE_PROJECT=$HOME/Code/deis-empty-testbed

helm install -n brigade $BRIG/charts/brigade -f $BRIG/_scratch/local-brigade.yaml --namespace $BRIGADE_NS
helm install $BRIG/charts/brigade-project -n empty-testbed -f $BRIGADE_PROJECT/values.yaml --namespace $BRIGADE_NS

Then I executed this:

$ brig run deis/empty-testbed -n brigade
Event created. Waiting for worker pod named "brigade-worker-01cjmx0shgwj3xchb862a7ceft".
Build: 01cjmx0shgwj3xchb862a7ceft, Worker: brigade-worker-01cjmx0shgwj3xchb862a7ceft
prestart: no dependencies file found
prestart: src/brigade.js written
[brigade] brigade-worker version: 0.15.0
[brigade:k8s] Creating PVC named brigade-worker-01cjmx0shgwj3xchb862a7ceft
[brigade:app] no-after: fired
[brigade:app] beforeExit(2): destroying storage
[brigade:k8s] Destroying PVC named brigade-worker-01cjmx0shgwj3xchb862a7ceft

Here's a list of all the roles/rolebindings I have:

$ kubectl get rolebinding -n brigade
NAME                        AGE
brigade-brigade-api         39m
brigade-brigade-cr-gw       39m
brigade-brigade-ctrl        39m
brigade-brigade-github-gw   39m
brigade-brigade-vacuum      39m
brigade-brigade-wrk         39m
cron-brigade-cron           39m
trello-brigade-trello       39m
$ kubectl get role -n brigade
NAME                        AGE
brigade-brigade-api         39m
brigade-brigade-cr-gw       39m
brigade-brigade-ctrl        39m
brigade-brigade-github-gw   39m
brigade-brigade-vacuum      39m
brigade-brigade-wrk         39m
cron-brigade-cron           39m
trello-brigade-trello       39m

Is there anything else you can tell me that might help me reproduce this?

emmayang commented 6 years ago

@technosophos Thanks for working on this! The only difference I spotted is that I used kubeadm to install my Kubernetes cluster, not minikube. According to its documentation, kubeadm enables RBAC by default. Not sure this matters, but that's all I found.

And btw, I noticed you didn't install Brigade with --set rbac.enabled=true on the command line; last time I installed without RBAC, I got a lot of erroring vacuum pods. I searched around and enabled RBAC to fix it.

technosophos commented 6 years ago

I turned it on in the values file. Here's the file:

rbac:
  enabled: true
controller:
  registry: deis
  name: brigade-controller
  tag: latest
api:
  enabled: true
  registry: deis
  name: brigade-api
  tag: latest
  service:
    name: brigade-api
    type: ClusterIP
    externalPort: 7745
    internalPort: 7745

# worker is the JavaScript worker. These are created on demand by the controller.
worker:
  registry: deis
  name: brigade-worker
  tag: latest
  #pullPolicy: IfNotPresent

# gw is the GitHub gateway.
gw:
  enabled: true
  registry: deis
  name: brigade-gateway
  tag: latest
  #pullPolicy: IfNotPresent
  #buildForkedPullRequests: true

cr:
  enabled: true
  registry: deis
  name: brigade-cr-gateway
  tag: latest
  service:
    name: brigade-cr-service
    type: ClusterIP  # Change to LoadBalancer if you want this externally available.
    externalPort: 80
    internalPort: 8000

# The vacuum periodically cleans up old builds.
# Brigade does not delete builds on completion. Instead, it leaves builds around
# for a period of time, providing you with an opportunity to inspect builds for
# data.
# The vacuum will sweep the system at intervals and clear out old builds.
#
# To globally turn off the vacuum, set enabled=false
vacuum:
  enabled: true
  # Set a schedule for how frequently this check is run.
  # Note that a run of the vacuum typically takes at least a minute. Finer-level
  # granularity than that may result in multiple vacuums running at once.
  # Format follows accepted Cron formats: https://en.wikipedia.org/wiki/Cron
  schedule: "@hourly"
  registry: "deis"
  name: "brigade-vacuum"
  # tag: latest
  # Age tells the vacuum how old a thing may be before it is considered ready to
  # be vacuumed. The format is an integer followed by the suffix h (hours), m (minutes)
  # or s (seconds).
  # The default is 30 days (720 hours)
  age: "720h"
  # maxBuilds tells the vacuum the absolute maximum number of builds that may be stored
  # at a time. Where possible, we recommend using age rather than maxBuilds.
  # 0 means no limit is imposed.
  #
  # If both age and maxBuilds are provided, age is applied first, then maxBuilds.
  maxBuilds: 0
  tag: latest

# The service is for the Brigade gateway. If you do not want to have Brigade
# listening for incoming GitHub requests, disable this.
service:
  name: brigade-service
  type: LoadBalancer
  externalPort: 7744
  internalPort: 7744
# By default, this is off. If you enable it, you might want to change the
# service.type to ClusterIP
ingress:
  enabled: false
  hosts: []
  # Add TLS configuration
  # tls: <TLS_CONFIG>
  # Add custom annotations
  # annotations:
  #   name: value

# DEVELOPMENT ONLY: Use this for off-ACS development
# Before enabling this, log into the acr registry with Docker and then
# run `scripts/generate-acr-secret.sh`
#privateRegistry: brigade-registry

emmayang commented 6 years ago

@technosophos Got it, so both cases have RBAC enabled; no difference there then.

technosophos commented 6 years ago

@emmayang We're still hunting this down, and one thing we discovered is that older RBAC roles might be causing problems with newer versions of the Controller. Can you check whether the role/role binding for the controller allows create for pods? It should look something like this in the role:

kind: Role
apiVersion: {{ template "brigade.rbac.version" }}
metadata:
  name: ...
rules:
- apiGroups: [""]
  resources: ["pods", "secrets", "configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

That assumes that the Controller is running in the same namespace that is configured in the project's namespace field, which you may also wish to check.
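
A couple of commands that should answer both questions directly (the role and service account names are guesses based on your earlier install command, so adjust them to whatever kubectl get roles shows; the impersonation in the second command needs admin rights):

kubectl get role brigade-brigade-ctrl -n utils -o yaml
kubectl auth can-i create pods -n utils --as=system:serviceaccount:utils:brigade-brigade-ctrl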

technosophos commented 6 years ago

Some of the issues that @JCzz reports look similar to #408

emmayang commented 6 years ago

@technosophos I worked around this problem by installing in the default namespace, then tried to get the information you wanted from a problematic install. So I removed it and installed again into the customized namespace, and this time the error is gone... it works as expected. Anyway, here are the current values of my controller role & rolebinding:

root@qcloud:~# k get roles -n utils
NAME                               AGE
brigade-server-brigade-api         32m
brigade-server-brigade-ctrl        32m
brigade-server-brigade-github-gw   32m
brigade-server-brigade-vacuum      32m
brigade-server-brigade-wrk         32m
root@qcloud:~# k get roles -n utils  brigade-server-brigade-ctrl -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: 2018-07-23T03:10:13Z
  labels:
    app: brigade-server-brigade
    chart: brigade-0.15.0
    heritage: Tiller
    release: brigade-server
  name: brigade-server-brigade-ctrl
  namespace: utils
  resourceVersion: "22704671"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/utils/roles/brigade-server-brigade-ctrl
  uid: ea6c9188-8e25-11e8-99e5-005056a0fca0
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - secrets
  - configmaps
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
root@qcloud:~# k get rolebindings -n utils brigade-server-brigade-ctrl
NAME                          AGE
brigade-server-brigade-ctrl   33m
root@qcloud:~# k get rolebindings -n utils brigade-server-brigade-ctrl -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  creationTimestamp: 2018-07-23T03:10:13Z
  labels:
    app: brigade-server-brigade
    chart: brigade-0.15.0
    heritage: Tiller
    release: brigade-server
  name: brigade-server-brigade-ctrl
  namespace: utils
  resourceVersion: "22704676"
  selfLink: /apis/rbac.authorization.k8s.io/v1/namespaces/utils/rolebindings/brigade-server-brigade-ctrl
  uid: ea75747b-8e25-11e8-99e5-005056a0fca0
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: brigade-server-brigade-ctrl
subjects:
- kind: ServiceAccount
  name: brigade-server-brigade-ctrl

technosophos commented 6 years ago

Okay. I will leave this issue open for a bit. I reorganized my local minikube cluster so that I am using a non-default namespace for all my dev work. Hopefully I will catch namespace errors earlier now.

Thanks again for following up so diligently.

blimmer commented 5 years ago

I can confirm that I experienced these same issues today with minikube until I recreated brigade with rbac enabled:

helm install -n brigade brigade/brigade --set rbac.enabled=true

radu-matei commented 5 years ago

Tried with later versions of Minikube and Docker Desktop and didn't experience the issue. Closing for now, but feel free to reopen if the issue reappears or if you have additional questions.

Thanks!