deis / builder

Git server and application builder for Deis Workflow
https://deis.com
MIT License

Proposal: send logs directly from builder pods back to builder pod #207

Open arschles opened 8 years ago

arschles commented 8 years ago

Note: I believe others have suggested a similar or identical solution to this problem in the past. Hopefully this issue solidifies those ideas.

Rel https://github.com/deis/builder/pull/185 Rel https://github.com/deis/builder/issues/199 Rel #298

Problem Statement

As of this writing, the builder performs a build as follows:

  1. Launch a builder pod (slugbuilder or dockerbuilder)
  2. Poll the k8s API for the pod's existence
  3. Begin streaming pod logs after the pod exists

We've found issues with this approach, all of which stem from the fact that the pod may not be reported as running at any of the moments when we poll. This is a race condition, and so far we've found the following symptoms:

  1. The pod has started & completed inside of one polling interval
    1. Attempted solution in https://github.com/deis/builder/pull/185. Note that this will not address the problem laid out in (2)
  2. The pod has started, completed and been garbage collected inside of one polling interval
    1. Temporary fix that relies on internal k8s GC implementation at: https://github.com/deis/builder/pull/206
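To make the two symptoms concrete, here is a minimal, self-contained simulation of a poller sampling a pod's lifecycle at a fixed interval. It does not use the k8s API: the `phase` values, `podTimeline` type and all timings are hypothetical, chosen only to illustrate the race.

```go
package main

import "fmt"

// phase is a simplified pod lifecycle state (hypothetical, not the k8s API type).
type phase string

const (
	pending   phase = "Pending"
	running   phase = "Running"
	succeeded phase = "Succeeded"
	gone      phase = "Gone" // the pod has been garbage collected
)

// podTimeline says when (in ms since launch) a simulated pod changes state.
type podTimeline struct {
	startsAt, finishesAt, collectedAt int
}

// phaseAt returns the phase the simulated pod would report at time t.
func (p podTimeline) phaseAt(t int) phase {
	switch {
	case t < p.startsAt:
		return pending
	case t < p.finishesAt:
		return running
	case t < p.collectedAt:
		return succeeded
	default:
		return gone
	}
}

// pollPhases samples the pod every interval ms, from 0 up to and including deadline.
func pollPhases(p podTimeline, interval, deadline int) []phase {
	var seen []phase
	for t := 0; t <= deadline; t += interval {
		seen = append(seen, p.phaseAt(t))
	}
	return seen
}

// sawRunning reports whether any sample caught the pod while it was running.
func sawRunning(seen []phase) bool {
	for _, ph := range seen {
		if ph == running {
			return true
		}
	}
	return false
}

func main() {
	// Symptom 1: the pod starts and completes between two polls.
	fast := podTimeline{startsAt: 100, finishesAt: 400, collectedAt: 5000}
	seen := pollPhases(fast, 500, 1000)
	fmt.Println(seen, "saw running:", sawRunning(seen))

	// Symptom 2: the pod is also garbage collected before the next poll.
	collected := podTimeline{startsAt: 100, finishesAt: 300, collectedAt: 450}
	seen = pollPhases(collected, 500, 1000)
	fmt.Println(seen, "saw running:", sawRunning(seen))
}
```

In both cases the poller never observes the running phase. In the second case it never observes a completed pod either, so there is nothing left to stream logs from.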

Solution Details

Because of this race condition, we can't rely on polling, and even if we successfully use the event stream (#185), k8s GC doesn't guarantee that pod logs will still be available after the pod is done. This proposal calls for the builder pod to stream its logs back to the builder that launched it.

The following changes (as of this writing) would need to happen to make this work:

  1. Each git-receive hook process runs a websocket server (on a unique port, assigned by the builder SSH server) that accepts incoming logs from the builder pod. It uses these logs for the following purposes:
    1. Writes them to STDOUT (for the builder to write back to the SSH connection)
    2. Looks for a FINISHED message that indicates the builder pod is done
  2. Each git-receive process launches builder pods with its "phone-home" IP and port, which is the websocket server that they should write their logs to
  3. The builder pods now include a program that launches the builder logic (a shell script for slugbuilder and a python program for dockerbuilder). This program's purpose is to:
    1. Stream STDOUT & STDERR via a websocket connection to the phone-home address
    2. Send a FINISHED message when the builder logic exits

After the builder's git-receive hook receives the FINISHED message, or after a generous timeout, it can shut down the websocket server and continue with the logic it already has. The builder would no longer need to rely on polling the k8s API if this proposal were implemented.

smothiki commented 8 years ago

We are thinking about implementing Jobs anyway, which might change a lot of this behavior. Also, a pod getting garbage collected immediately without the event type changing is not expected k8s behavior. The intended sequence of events and pod statuses is:

  1. Added: a pod is created
  2. Modified: status changes from pending to running
  3. Deleted: status is Succeeded or something with error code 0 or greater

Because of some mess with labels, we are not observing the pod status change from pending to running. Instead, GC starts collecting the pod, and the event is Deleted directly even though the pod is running, which is not the intended behavior. There is no point in streaming the logs back if the pod is garbage collected in the middle of an execution.

smothiki commented 8 years ago

https://github.com/deis/builder/pull/185 will solve a lot of things. I feel there is no need for a special websocket connection to stream logs back.

arschles commented 8 years ago

@smothiki I'm not sure how #185 would solve this particular problem if we don't launch jobs. However, I am :+1: on using jobs for our builds when they come out of extensions. If I understand http://kubernetes.io/v1.1/docs/user-guide/jobs.html correctly, we'll be able to make an API call to get the logs of the job even if it's complete at the time of calling.

arschles commented 8 years ago

promoting to beta3

arschles commented 8 years ago

Punting to beta4

Cryptophobia commented 6 years ago

This issue was moved to teamhephy/builder#31