deis / builder

Git server and application builder for Deis Workflow
https://deis.com
MIT License

Builder proceeds if slugrunner pod is evicted #496

Open chexxor opened 7 years ago

chexxor commented 7 years ago

My slugrunner pod is quite often evicted due to low compute resources on the node.

Here is an example log from a git push of a buildpack build.

[chexxor@fedora myapp]$ git push ssh://git@deis-builder.123.456.789.012.nip.io:2222/myapp-master.git master
Counting objects: 4129, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3757/3757), done.
Writing objects: 100% (4129/4129), 5.39 MiB | 4.75 MiB/s, done.
Total 4129 (delta 2821), reused 474 (delta 288)
remote: Resolving deltas: 100% (2821/2821), done.
Starting build... but first, coffee!
-----> Restoring cache...
       Done!
-----> Node.js app detected

-----> Creating runtime environment

       NPM_CONFIG_LOGLEVEL=error
       NPM_CONFIG_PRODUCTION=true
       NODE_ENV=production
       NODE_MODULES_CACHE=false

-----> Installing binaries
       engines.node (package.json):  4.6.0
       engines.npm (package.json):   2.15.x

       Downloading and installing node 4.6.0...
       Resolving npm version 2.15.x via semver.io...
       Downloading and installing npm 2.15.11 (replacing version 2.15.9)...

-----> Restoring cache
       Skipping cache restore (disabled by config)

-----> Building dependencies
       Running heroku-prebuild

       > rentable@1.0.0 heroku-prebuild /tmp/build
       > echo "Prebuild steps running..."

       Installing node modules (package.json)
Build complete.
Launching App...
...
[many more "..." progress lines]
...
Done, myapp-master:v76 deployed to Workflow

Note that the slugbuilder pod was evicted while it was executing the -----> Building dependencies step. I believe this because that step normally produces hundreds of lines of output, and the buildpack steps that should follow it, like "-----> Caching build" and "-----> Build succeeded!", never appear.

Despite the slugbuilder pod failing, the builder process continues and prints "Build complete.", skipping the failed pod check.
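
To illustrate the kind of check that seems to be skipped, here is a minimal sketch. This is not the builder's actual code; it assumes a recent client-go where Pods().Get takes a context, and the buildSucceeded name is made up. The idea is to inspect the build pod's terminal phase instead of treating "no longer running" as success, since a kubelet-evicted pod ends in phase Failed with reason "Evicted".

// Hypothetical sketch, not the builder's actual code.
package podcheck

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// buildSucceeded reports whether the slugbuilder pod finished successfully,
// based on its terminal phase rather than on it merely having stopped running.
func buildSucceeded(ctx context.Context, client kubernetes.Interface, namespace, podName string) (bool, error) {
	pod, err := client.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	switch pod.Status.Phase {
	case corev1.PodSucceeded:
		return true, nil
	case corev1.PodFailed:
		// Evictions land here: Reason is "Evicted" and Message describes the
		// resource pressure that triggered it.
		return false, fmt.Errorf("build pod %s failed: %s (%s)", podName, pod.Status.Reason, pod.Status.Message)
	default:
		return false, fmt.Errorf("build pod %s ended in unexpected phase %q", podName, pod.Status.Phase)
	}
}

A check shaped like this would surface the eviction to the git client instead of reporting "Build complete."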

I upgraded my Workflow installation just a few days ago, so I believe I have the latest versions of these components.

bacongobbler commented 7 years ago

@mboersma were you able to reproduce this issue or find a solid fix for it? If not, I think we should remove it from the milestone, since we can't figure out anything actionable here.

mboersma commented 7 years ago

There may be some issue here, but it's very hard to reproduce. Evicting the pod manually (with kubectl delete) doesn't produce this result, and I've only been able to hit this behavior once. I intend to look at it again before v2.13 ships, so let's leave it here for right now.
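
For what it's worth, kubectl delete removes the pod object outright, while a kubelet (node-pressure) eviction leaves the pod behind in phase Failed with reason "Evicted", which may be why manual deletion doesn't reproduce this. A hedged test sketch against client-go's fake clientset, exercising the hypothetical buildSucceeded helper from the earlier sketch (pod name and namespace are made up), can simulate that terminal state without stressing a real node:

// Hypothetical test sketch, same package as the earlier buildSucceeded sketch.
package podcheck

import (
	"context"
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes/fake"
)

func TestBuildSucceededReportsEviction(t *testing.T) {
	// Simulate the state a kubelet eviction leaves behind: the pod object
	// still exists, with phase Failed and reason "Evicted".
	evicted := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "slugbuild-myapp", Namespace: "deis"},
		Status: corev1.PodStatus{
			Phase:   corev1.PodFailed,
			Reason:  "Evicted",
			Message: "The node was low on resource: memory.",
		},
	}
	client := fake.NewSimpleClientset(evicted)

	ok, err := buildSucceeded(context.Background(), client, "deis", "slugbuild-myapp")
	if ok || err == nil {
		t.Fatalf("expected eviction to be treated as a build failure, got ok=%v err=%v", ok, err)
	}
}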

Cryptophobia commented 6 years ago

This issue was moved to teamhephy/builder#15