gocd / kubernetes-elastic-agents

Kubernetes Elastic agent plugin for GoCD
https://www.gocd.org
Apache License 2.0

Unusual error breaking kubernetes connection #149

Closed UlliBe closed 5 years ago

UlliBe commented 5 years ago

New GoCD install with Helm: GoCD 19.9.0, plugin 3.2.0-187 / 3.3.0-191. The following error occurs while trying to generate a status report for the elastic agent:

jvm 1 | 2019-10-14 16:22:37,097 INFO [qtp309906614-36] p.c.g.c.e.k.c.g.c.e.KubernetesPlugin:72 [plugin-cd.go.contrib.elasticagent.kubernetes] - [refresh-pod-state] Pod information successfully synced. All(Running/Pending) pod count is 0.
jvm 1 | 2019-10-14 16:22:37,100 INFO [qtp309906614-36] p.c.g.c.e.k.c.g.c.e.KubernetesPlugin:72 [plugin-cd.go.contrib.elasticagent.kubernetes] - [status-report] Generating status report.
jvm 1 | 2019-10-14 16:22:37,205 ERROR [qtp309906614-36] p.c.g.c.e.k.c.g.c.e.KubernetesPlugin:127 [plugin-cd.go.contrib.elasticagent.kubernetes] - Error while generating status report: For input string: "6694Mi"
jvm 1 | java.lang.NumberFormatException: For input string: "6694Mi"
jvm 1 |     at java.base/java.lang.NumberFormatException.forInputString(Unknown Source)
jvm 1 |     at java.base/java.lang.Long.parseLong(Unknown Source)
jvm 1 |     at java.base/java.lang.Long.valueOf(Unknown Source)
jvm 1 |     at cd.go.contrib.elasticagent.model.KubernetesNode.<init>(KubernetesNode.java:56)
jvm 1 |     at cd.go.contrib.elasticagent.model.KubernetesCluster.lambda$new$0(KubernetesCluster.java:37)
jvm 1 |     at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
jvm 1 |     at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source)
jvm 1 |     at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
jvm 1 |     at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
jvm 1 |     at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown Source)
jvm 1 |     at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)

Thanks for any ideas on how to fix this; it has been driving me up the wall for a few days now :(

adityasood commented 5 years ago

@torgon could you share screenshots of the Cluster Profile and Elastic Profile you have configured?

Any environment details would also be helpful. Is the Helm chart deployed on minikube, EKS, GCP, etc.?

UlliBe commented 5 years ago

Sure.

Elastic agent profile: https://gyazo.com/80ff8cef49f87f82e408c1f0e114b702

Deployed on a DigitalOcean Kubernetes cluster, v1.15.3.

adityasood commented 5 years ago

@torgon I tried helm install of the latest GoCD helm chart on a minikube cluster with

minikube version: v1.4.0
commit: 7969c25a98a018b94ea87d949350f3271e9d64b6

Kubectl version

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T23:49:20Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

I could see the status report

2019-10-15 10:35:38,140 INFO  [qtp1152224728-36] KubernetesPlugin:72 - [refresh-pod-state] Pod information successfully synced. All(Running/Pending) pod count is 0.
2019-10-15 10:35:38,145 INFO  [qtp1152224728-36] KubernetesPlugin:72 - [status-report] Generating status report.
2019-10-15 10:35:38,153 INFO  [qtp1152224728-36] KubernetesPlugin:72 - Running kubernetes nodes 1
2019-10-15 10:35:38,158 INFO  [qtp1152224728-36] KubernetesPlugin:72 - Running pods 0

What happens when you do the following for your DigitalOcean kubernetes cluster?

kubectl get nodes

#pick a node where your GoCD server is deployed and run

kubectl describe nodes <name-of-the-node>

adityasood commented 5 years ago

@torgon also, have you made any changes to the GoCD Helm chart values file? If so, can you let me know what they are? That might help in replicating this.

UlliBe commented 5 years ago

Name:               pool-prod-01-wo4l
Roles:
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=s-4vcpu-8gb
                    beta.kubernetes.io/os=linux
                    doks.digitalocean.com/node-id=66734b22-1e11-46cc-b529-5ddecfe04dc3
                    doks.digitalocean.com/node-pool=pool-prod-01
                    doks.digitalocean.com/node-pool-id=49823cd5-d74e-44e3-9b84-361994428487
                    doks.digitalocean.com/version=1.15.3-do.1
                    failure-domain.beta.kubernetes.io/region=fra1
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=pool-prod-01-wo4l
                    kubernetes.io/os=linux
                    region=fra1
Annotations:        csi.volume.kubernetes.io/nodeid: {"dobs.csi.digitalocean.com":"161214164","io.rancher.longhorn":"pool-prod-01-wo4l"}
                    field.cattle.io/creatorId:
                    io.cilium.network.ipv4-cilium-host: 10.244.2.1
                    io.cilium.network.ipv4-health-ip: 10.244.2.74
                    io.cilium.network.ipv4-pod-cidr: 10.244.2.0/24
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 01 Oct 2019 08:42:59 +0000
Taints:
Unschedulable:      false
Conditions:
  Type                Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  NetworkUnavailable  False   Tue, 01 Oct 2019 08:43:14 +0000  Tue, 01 Oct 2019 08:43:14 +0000  CiliumIsUp                  Cilium is running on this node
  MemoryPressure      False   Tue, 15 Oct 2019 11:09:34 +0000  Tue, 01 Oct 2019 08:42:59 +0000  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure        False   Tue, 15 Oct 2019 11:09:34 +0000  Tue, 01 Oct 2019 08:42:59 +0000  KubeletHasNoDiskPressure    kubelet has no disk pressure
  PIDPressure         False   Tue, 15 Oct 2019 11:09:34 +0000  Tue, 01 Oct 2019 08:42:59 +0000  KubeletHasSufficientPID     kubelet has sufficient PID available
  Ready               True    Tue, 15 Oct 2019 11:09:34 +0000  Tue, 01 Oct 2019 08:43:09 +0000  KubeletReady                kubelet is posting ready status
Addresses:
  Hostname:    pool-prod-01-wo4l
  InternalIP:  10.135.233.58
  ExternalIP:  206.81.21.21
Capacity:
  attachable-volumes-csi-dobs.csi.digitalocean.com:  7
  cpu:                                               4
  ephemeral-storage:                                 165105408Ki
  hugepages-1Gi:                                     0
  hugepages-2Mi:                                     0
  memory:                                            8170040Ki
  pods:                                              110
Allocatable:
  attachable-volumes-csi-dobs.csi.digitalocean.com:  7
  cpu:                                               4
  ephemeral-storage:                                 165105408Ki
  hugepages-1Gi:                                     0
  hugepages-2Mi:                                     0
  memory:                                            6694Mi
  pods:                                              110
System Info:
  Machine ID:                 96dc6ede0ebd41bca73d4fcc38e016e3
  System UUID:                96dc6ede-0ebd-41bc-a73d-4fcc38e016e3
  Boot ID:                    cb256851-77b1-4780-b608-5b2d90200d5c
  Kernel Version:             4.19.0-0.bpo.5-amd64
  OS Image:                   Debian GNU/Linux 9 (stretch)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://18.9.2
  Kubelet Version:            v1.15.3
  Kube-Proxy Version:         v1.15.3
PodCIDR:                      10.244.2.0/24
ProviderID:                   digitalocean://161214164
Non-terminated Pods:          (27 in total)
  Namespace  Name                          CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE

  ... skipped list of running pods

  gocd       gocd-server-5bdd76689f-zwpbt  0 (0%)        0 (0%)      0 (0%)           0 (0%)         18h

  ... skipped list of running pods

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                                          Requests      Limits
  cpu                                               727m (18%)    1402m (35%)
  memory                                            1287Mi (19%)  570Mi (8%)
  ephemeral-storage                                 0 (0%)        0 (0%)
  attachable-volumes-csi-dobs.csi.digitalocean.com  0             0
Events:

Helm set variable: server.service.type=ClusterIP

adityasood commented 5 years ago

@torgon thank you for the information. As I understand it, this line here is responsible for converting the allocatable memory into a Long.

For one of your nodes, the allocatable memory is returned as "6694Mi", i.e. in different units than the plugin expects, so the conversion fails. This looks like a bug; we will take a look at it and publish a fix.
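To illustrate the kind of unit handling involved, here is a minimal sketch of converting a Kubernetes memory quantity string to bytes. It is not the plugin's actual fix: the class name MemoryQuantity and the method toBytes are hypothetical, and only a subset of the quantity suffixes (Ki, Mi, Gi, Ti, k, M, G, T) is covered.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the plugin: converts quantity strings
// such as "6694Mi" or "8170040Ki" into a number of bytes.
public class MemoryQuantity {
    // Suffix -> multiplier. Only a subset of Kubernetes suffixes is handled.
    private static final Map<String, BigDecimal> MULTIPLIERS = new LinkedHashMap<>();

    static {
        // Binary (power-of-two) suffixes.
        MULTIPLIERS.put("Ki", BigDecimal.valueOf(1L << 10));
        MULTIPLIERS.put("Mi", BigDecimal.valueOf(1L << 20));
        MULTIPLIERS.put("Gi", BigDecimal.valueOf(1L << 30));
        MULTIPLIERS.put("Ti", BigDecimal.valueOf(1L << 40));
        // Decimal (power-of-ten) suffixes.
        MULTIPLIERS.put("k", BigDecimal.valueOf(1_000L));
        MULTIPLIERS.put("M", BigDecimal.valueOf(1_000_000L));
        MULTIPLIERS.put("G", BigDecimal.valueOf(1_000_000_000L));
        MULTIPLIERS.put("T", BigDecimal.valueOf(1_000_000_000_000L));
    }

    public static long toBytes(String quantity) {
        String q = quantity.trim();
        for (Map.Entry<String, BigDecimal> entry : MULTIPLIERS.entrySet()) {
            String suffix = entry.getKey();
            if (q.endsWith(suffix)) {
                String number = q.substring(0, q.length() - suffix.length());
                // longValueExact throws ArithmeticException if the result
                // is not a whole number of bytes or does not fit in a long.
                return new BigDecimal(number).multiply(entry.getValue()).longValueExact();
            }
        }
        // No known suffix: treat the value as a plain byte count.
        return new BigDecimal(q).longValueExact();
    }

    public static void main(String[] args) {
        System.out.println(toBytes("6694Mi"));
        System.out.println(toBytes("8170040Ki"));
    }
}
```

By contrast, calling Long.valueOf("6694Mi") directly throws the NumberFormatException seen in the log above, because the string is not a plain integer.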

UlliBe commented 5 years ago

That's great, thank you!

adityasood commented 5 years ago

@torgon the team has made a fix for the issue you were facing. We have an experimental release available here. Can you help us verify that it works in your environment?

To use this experimental build with your Helm chart, you can do the following:

  1. Download the values.yaml file for the GoCD Helm chart. You can then edit the URL here to point to the experimental build and use this values file to install your Helm chart. The command to do so would be

helm install --name gocd-app --namespace gocd stable/gocd -f values.yaml

UlliBe commented 5 years ago

@adityasood that looks a whole lot better; it seems to be fixed. Thank you!

UlliBe commented 5 years ago

Minor follow-up, though: the pod link in this screenshot is not working:

https://gocdserver.../go/admin/status_reports/cd.go.contrib.elasticagent.kubernetes/gocd-agent-cac2d33d-abd0-4114-9609-ea237938d645

The pod is up and running.

vrushaliwaykole commented 5 years ago

@torgon can you please share the go-server and Kubernetes plugin logs for this?