fabric8io / jenkins-pipeline-library

a collection of reusable jenkins pipelines and pipeline functions
Apache License 2.0

java.net.UnknownHostException: kubernetes.default.svc: unknown error - KubernetesFacade.getNodeOfPod line 164 #160

Open LarsMilland opened 7 years ago

LarsMilland commented 7 years ago

Hi

I am trying to get the Fabric8 jenkins-pipeline-library to run outside a pure fabric8 configuration. I have created a separate namespace for this on OpenShift Origin 1.2, with a Jenkins server running outside of OpenShift. The Kubernetes Jenkins cloud slave provider plugin works fine and can launch Jenkins slaves on OpenShift for me, but the "kubernetes pipeline library" is not working for me.

I keep getting an "unknown error" in the KubernetesFacade getNodeOfPod function, where I guess the code is trying to look up the node of the first pod launched so that it can launch the second pod directly on the same host as the first one. See the attached log files: jenkinslog.txt, unknownerror.txt

I have tried the setup on the same OpenShift Origin installation with the default fabric8 installation, where Jenkins is running inside OpenShift in the same namespace as the one the "build slaves" are launched in, and there everything works just fine.

Any good suggestions as to what may be wrong here?

Best regards Lars Milland

LarsMilland commented 7 years ago

Hi

I have still not solved the problem reported in this issue, so I hope someone can give some qualified input on which direction to look.

I have tried to narrow down what could be causing the problem, but without much success so far. I have therefore also created a small sample Java program that does the same thing the pipeline code does through its Groovy script when it launches the "buildpod". It is shown below. I manually created a "permanent" jenkins-slave on the OpenShift cluster and executed the Java program there with oc rsh into the OpenShift pod, and strangely enough that code works just fine. When the exact same pod is used to run the Groovy script shown below from the associated Jenkins system, the host lookup fails somewhere below the call to:

io.fabric8.kubernetes.pipeline.KubernetesFacade.getNodeOfPod(KubernetesFacade.java:164)

Groovy script:

    node('manual') {
        kubernetes.pod('buildpod').withImage('fabric8/maven-builder').inside {
            echo "Hello from buildpod"
        }
    }

Java sample program with Kubernetes client that strangely enough works:

    String master = "https://kubernetes.api.mydomain:8443/";

    Config config = new ConfigBuilder().withMasterUrl(master).build();
    try (final KubernetesClient client = new DefaultKubernetesClient(config)) {
        Pod pod = client.pods().withName("jenkins-jnlp-client-manual").get();
        String nodeName = pod.getSpec().getNodeName();
        Node node = client.nodes().withName(nodeName).get();
        System.out.println("Node: " + node);
    } catch (KubernetesClientException e) {
        System.out.println(e.getMessage());
    }

Best regards Lars Milland

nickdgriffin commented 7 years ago

I'm having the exact same problem, although in my case I'm still using gofabric8 to launch a minikube and then I'm accessing it from a remote Jenkins (which happens to be running in a different VirtualBox VM on my machine). I can also provision slaves on Kubernetes.

Even though my Cloud is set up with the correct Kubernetes URL ("Test Connection" tells me everything is fine) and I've replicated the templates over with the KUBERNETES_MASTER value matching the URL, it still appears that a build that uses this library falls back to the default master URL defined here: https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/Config.java#L98

I suspect that in your case, Lars, it's a similar situation: the KubernetesClient that is created for the Cloud stuff gets the correct environment variables set, but when the client is invoked from this library it resorts to the default, which will only work from "within" a fabric8 deployment.

It was pretty tricky to spot that this was the issue, because most of the time I would get a socket timeout and only sometimes would it tell me the domain it was trying to hit. Adding a logger also didn't report anything when running a build, although "Test Connection" under the Cloud definition would produce output.
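To make the suspected fallback concrete, here is a minimal Java sketch of a "configured value, else in-cluster default" lookup. It is a simplified illustration, not the kubernetes-client code itself: the class and method names are hypothetical, and the default URL is assumed from the host name in the reported exception (kubernetes.default.svc).

```java
import java.util.Map;

/** Simplified illustration of a "configured value, else in-cluster default" lookup. */
final class MasterUrlFallback {
    // Assumed default, taken from the host name in the reported UnknownHostException.
    static final String IN_CLUSTER_DEFAULT = "https://kubernetes.default.svc";

    /** Returns KUBERNETES_MASTER from the given environment if set and non-blank, else the default. */
    static String resolveMasterUrl(Map<String, String> env) {
        String configured = env.get("KUBERNETES_MASTER");
        if (configured == null || configured.trim().isEmpty()) {
            // Nothing configured in this process: fall back to the in-cluster name,
            // which only resolves through the cluster's own DNS.
            return IN_CLUSTER_DEFAULT;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        System.out.println("Resolved master URL: " + resolveMasterUrl(System.getenv()));
    }
}
```

If the process that runs the library code does not see KUBERNETES_MASTER, a lookup of this shape ends up at the in-cluster host name, which would be consistent with an UnknownHostException on a Jenkins that runs outside the cluster.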

rawlingsj commented 7 years ago

Sorry I originally missed this issue. @nickdgriffin you are correct that this shared pipeline library assumes that it is running inside the kubernetes / openshift cluster and not on a remote Jenkins. We can add some checks before calling this function and ensure that the library has access to the kubernetes token that is automatically mounted into the Jenkins pod.

I'll add a note to the readme to clear this up.
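As a rough idea of what such a pre-check could look like, here is a minimal Java sketch that tests whether a service account token file is present before any cluster call is made. The class and method names are hypothetical and this is not the check that was added to the library; the path is the conventional location where Kubernetes mounts the token into a pod.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-check: is a mounted service account token available to this process? */
final class InClusterCheck {
    // Conventional mount path of the service account token inside a Kubernetes pod.
    static final Path DEFAULT_TOKEN_PATH =
            Paths.get("/var/run/secrets/kubernetes.io/serviceaccount/token");

    /** Returns true only if the token path is an existing, readable regular file. */
    static boolean hasServiceAccountToken(Path tokenPath) {
        return Files.isRegularFile(tokenPath) && Files.isReadable(tokenPath);
    }

    public static void main(String[] args) {
        if (hasServiceAccountToken(DEFAULT_TOKEN_PATH)) {
            System.out.println("Service account token found; assuming in-cluster execution.");
        } else {
            // Fail early with a readable message instead of a later DNS or socket error.
            System.err.println("No service account token at " + DEFAULT_TOKEN_PATH
                    + "; this process does not appear to run inside the cluster.");
        }
    }
}
```

Run on a remote Jenkins, a check like this would report the missing token up front rather than surfacing later as a socket timeout or an unknown host.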