deis / builder

Git server and application builder for Deis Workflow
https://deis.com
MIT License

git push doesn't show slug build logs #514

Open gemoya opened 7 years ago

gemoya commented 7 years ago

Hi,

I have a Kubernetes v1.5.0 cluster provided by rancher:v1.4.3, running Deis Workflow 2.14:

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:33:11Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5+", GitVersion:"v1.5.0-115+611cbb22703182", GitCommit:"611cbb22703182611863beda17bf9f3e90afa148", GitTreeState:"clean", BuildDate:"2017-01-13T18:03:00Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

My problem is: when I run 'git push deis', it gets stuck at:

Counting objects: 102, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (58/58), done.
Writing objects: 100% (102/102), 22.81 KiB | 0 bytes/s, done.
Total 102 (delta 39), reused 102 (delta 39)
remote: Resolving deltas: 100% (39/39), done.
Starting build... but first, coffee!

The slugbuilder pod is launched and completes successfully, and you can manually view the logs that were supposed to be streamed (kubectl logs slugbuild-xxxxxx).

$ kubectl -n deis logs slugbuild-rara-e91bdc46-8c9e375b
-----> Restoring cache...
       No cache file found. If this is the first deploy, it will be created now.
-----> Go app detected
-----> Fetching jq... done
-----> Checking Godeps/Godeps.json file.
-----> Installing go1.7.5
-----> Fetching go1.7.5.linux-amd64.tar.gz... done
-----> Running: go install -v -tags heroku .
       github.com/deis/example-go
-----> Discovering process types
       Procfile declares types -> web
-----> Checking for changes inside the cache directory...
       Files inside cache folder changed, uploading new cache...
       Done: Uploaded cache (82M)
-----> Compiled slug size is 1.9M

Finally, the app is deployed and works! But the 'git push' command never ends, so the client has no way to know whether his/her app is ready or not.

On the other hand, the builder logs are:

receiving git repo name: rara.git, operation: git-receive-pack, fingerprint: 82:b4:09:7c:b9:ac:e1:b1:4b:0d:f3:7e:79:3f:ad:bb, user: admin
creating repo directory /home/git/rara.git
writing pre-receive hook under /home/git/rara.git
git-shell -c git-receive-pack 'rara.git'
Waiting for git-receive to run.
Waiting for deploy.
---> ---> ---> ---> ---> ---> ---> ---> [ERROR] Failed git receive: Failed to run git pre-receive hook:  (signal: broken pipe)
Cleaner deleting cache home/rara/cache for app rara
Cleaner deleting slug /home/rara:git-e91bdc46 for app rara

And if I go inside the builder pod and try to debug it, I find these pod processes:

root       300  0.0  0.0  91316  3512 ?        S    04:01   0:00 git receive-pack asdf.git
root       308  0.0  0.0  18104  2872 ?        S    04:01   0:00  \_ /bin/bash hooks/pre-receive
root       309  0.1  0.4 167260 35388 ?        Sl   04:01   0:59      \_ boot git-receive
root       310  0.0  0.0  18108   336 ?        S    04:01   0:00      \_ /bin/bash hooks/pre-receive
root       311  0.0  0.0  15428  1108 ?        S    04:01   0:00          \_ sed s/^/.[1G/
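
For reference, that process tree matches a pre-receive hook shaped roughly like the sketch below. This is reconstructed from the listing, not the verbatim deis/builder hook, so treat the names and plumbing as assumptions.

#!/bin/bash
# Sketch of the hook implied by the process tree above (hypothetical).
# The sed prefix ESC[1G moves the cursor to column 1 so each streamed line
# renders cleanly after the "remote: " prefix git adds on the client side;
# it is the mangled ".[1G" visible in the ps output.
strip_remote_prefix() {
    sed "s/^/$(printf '\033[1G')/"
}

# git supplies "<oldrev> <newrev> <refname>" lines on stdin; the hook hands
# them to the builder's "boot git-receive" process and relays its output to
# the pushing client. If the client connection is gone when boot writes,
# the pipeline dies with SIGPIPE: the "broken pipe" in the builder log.
boot git-receive | strip_remote_prefix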

So my idea is: the builder isn't receiving the log stream from wherever it should come from, and I end up with a broken pipe because the builder keeps listening forever. I don't know exactly which component it uses to get the logs. I think fluentd collects the log output of all containers, but I don't know how the builder requests the slugbuilder logs.
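
For what it's worth, a builder written in Go would typically follow the slugbuilder pod's logs through the Kubernetes API, the same channel kubectl logs -f uses, and relay that stream down the git connection. A minimal sketch with client-go (hypothetical code, not the actual deis/builder implementation):

package main

import (
	"context"
	"io"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// followPodLogs streams a pod's logs to stdout. In the git server, stdout
// would instead be the connection back to the pushing client; if that
// connection drops, the copy ends with a broken pipe.
func followPodLogs(namespace, pod string) error {
	cfg, err := rest.InClusterConfig() // the builder runs inside the cluster
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	req := client.CoreV1().Pods(namespace).GetLogs(pod, &corev1.PodLogOptions{
		Follow: true, // keep the stream open while the slugbuilder pod runs
	})
	stream, err := req.Stream(context.Background())
	if err != nil {
		return err
	}
	defer stream.Close()
	_, err = io.Copy(os.Stdout, stream)
	return err
}

func main() {
	if err := followPodLogs("deis", "slugbuild-rara-e91bdc46-8c9e375b"); err != nil {
		os.Exit(1)
	}
}

Fluentd, as you say, ships every container's output to the logging stack behind deis logs; the live stream during git push would presumably come from this kind of direct API follow rather than from fluentd.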

Deis Workflow is deployed entirely on-cluster; the problem persists with off-cluster redis/object-storage/everything as well.

The output of my Deis Workflow pods:

$ kubectl -n deis get pods
NAME                                     READY     STATUS    RESTARTS   AGE
deis-builder-3550604618-14fq8            1/1       Running   0          16h
deis-controller-3566093518-gs23q         1/1       Running   3          16h
deis-database-223698169-qqwn7            1/1       Running   0          16h
deis-logger-343314728-9jsr7              1/1       Running   2          16h
deis-logger-fluentd-vhhbd                1/1       Running   0          16h
deis-logger-redis-394109792-tj6fv        1/1       Running   0          16h
deis-minio-676004970-144jz               1/1       Running   0          16h
deis-monitor-grafana-740719322-pvd02     1/1       Running   0          16h
deis-monitor-influxdb-2881832136-7xd6c   1/1       Running   0          16h
deis-monitor-telegraf-wgzfv              1/1       Running   1          16h
deis-nsqd-3764030276-rqbs7               1/1       Running   0          16h
deis-registry-245622726-c9p9c            1/1       Running   1          16h
deis-registry-proxy-2c7tv                1/1       Running   0          16h
deis-router-2483473170-c375l             1/1       Running   0          16h
deis-workflow-manager-1893365363-v3rfv   1/1       Running   0          16h

Extra info: the options my kubelet runs with:

kubelet --kubeconfig=/etc/kubernetes/ssl/kubeconfig --api_servers=https://kubernetes.kubernetes.rancher.internal:6443 --allow-privileged=true --register-node=true --cloud-provider=rancher --healthz-bind-address=0.0.0.0 --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --network-plugin=cni --network-plugin-dir=/etc/cni/managed.d --authorization-mode=AlwaysAllow --pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0

The app logs can be viewed using the deis CLI:

$ deis logs -a rara
2017-05-08T18:13:59+00:00 deis[controller]: INFO config rara-1e3114e updated
2017-05-08T18:13:59+00:00 deis[controller]: INFO admin created initial release
2017-05-08T18:13:59+00:00 deis[controller]: INFO appsettings rara-1aaf45d updated
2017-05-08T18:13:59+00:00 deis[controller]: INFO domain rara added
2017-05-08T18:19:02+00:00 deis[controller]: INFO build rara-52988eb created
2017-05-08T18:19:02+00:00 deis[controller]: INFO admin deployed e91bdc4

Any idea how to get this working properly?

bacongobbler commented 7 years ago

I would strongly suggest upgrading to v1.5.7 first and checking whether that fixes things. There have been several high-severity patches since v1.5.0, any of which might be the reason Workflow doesn't work on v1.5.0.

If you're still having issues on v1.5.7 then let us know!

IlyaSemenov commented 7 years ago

I am experiencing exactly the same problem on a newer setup:

By the way, it's not only git push deis that gets stuck; deis pull also hangs on "Creating build" even after the build completes and the app starts:

$ deis pull deis/example-go -a test
Creating build... ..o  <--- this will be spinning forever

The same problem occurs with deis config: it completes on the server, but the client command never exits:

$ deis config:set POWERED_BY="test5" -a test
Creating config... ..o  <-- this will be spinning forever

Kubernetes 1.5.4 is the latest that stable Rancher offers, so it's not easy to upgrade to 1.5.7 or 1.6.x. Is there a suggested way to trace the problem?

gemoya commented 7 years ago

@IlyaSemenov

With all my tests I concluded the problem was Rancher's network stack, so I only got Deis working with certain combinations of Rancher networking-stack versions and Rancher Kubernetes versions. To achieve this you need to add the rancher-community catalog from GitHub with specific branches (you can add all the branches you need, and then in Environments you can take each component from whichever branch you want, e.g. Kubernetes from branch1, ipsec from branch2, networking from branch3, etc.).

It is tedious work...

PS: An additional problem I hit involves certain combinations of Kubernetes and Rancher versions, specifically with autoscaling: Deis queries the Kubernetes version expecting only numbers, but Kubernetes built by Rancher returns versions in a format like 1.5+, and the '+' crashes the autoscaling.
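
A tiny illustration of that failure mode (hypothetical code, not Workflow's actual parser): the Kubernetes /version endpoint on a Rancher-built cluster reports a minor version such as "5+", and a strict numeric parse chokes on the trailing '+'.

package main

import (
	"fmt"
	"strconv"
	"strings"
)

func main() {
	minor := "5+" // what a Rancher-built API server reports as its minor version

	// A strict parse fails outright on the '+' suffix.
	if _, err := strconv.Atoi(minor); err != nil {
		fmt.Println("strict parse fails:", err)
	}

	// A tolerant parse strips the non-numeric suffix first.
	n, _ := strconv.Atoi(strings.TrimRight(minor, "+"))
	fmt.Println("tolerant parse:", n) // prints 5
}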

Now I am using pure Kubernetes, but I am still trying to solve some problems with RBAC :).

IlyaSemenov commented 7 years ago

@gemoya Thank you for sharing your experience. What would you recommend now for provisioning pure k8s? kubeadm?

kamatama41 commented 7 years ago

Just FYI, I'm experiencing the same error with:

Cryptophobia commented 6 years ago

This issue was moved to teamhephy/builder#9