Closed. stv0g closed this issue 1 year ago.
In GitLab by @skolen on Aug 27, 2021, 13:47
Problem solved in 8ac188c9 and finally 8584b4ac. Using the `golang:1.16-buster` image instead of the `golang:1.16` image was the solution. I made a mistake when updating to Go 1.16 and accidentally removed the `buster` tag from the GitLab CI YAML file and the Dockerfile.
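For reference, a minimal sketch of the restored image reference in `.gitlab-ci.yml` (the exact file contents are not reproduced here, so treat the snippet as an illustration):

```yaml
# .gitlab-ci.yml (sketch): use the Debian Buster based Go image again
image: golang:1.16-buster
```

The `FROM` line in the Dockerfile needed the same `-buster` suffix.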
Just for reference: I've seen similar errors previously with PostgreSQL code. The underlying cause was some pretty old CPUs which are still being used in our OpenStack cluster. Apparently the libpq library contains some optimized code/instructions which were incompatible with those older CPUs.
In GitLab by @skolen on Sep 22, 2021, 13:51
mentioned in commit 3a0da86d92f3c5ea47ee0eedb01a4dfdc1f6b34d
In GitLab by @skolen on Sep 23, 2021, 14:03
This issue is back. The CI does not work right now because the postgres service does not start properly and gives the same error as described here.
I have checked our permanent deployment of PostgreSQL for the version of VILLAS web which is running in Kubernetes. The permanent PostgreSQL deployment runs with the following node affinity setting:
```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - kubernetes-worker-7
```
I will check next if we can pin the PostgreSQL instance spawned by our CI in a similar way.
I've checked the documentation on the GitLab Runner Kubernetes executor.
Unfortunately, there seems to be no way to limit the execution of individual services with a node selector; the service containers run in the same pod as the job itself, so a node selector in the runner configuration applies to everything at once. We could only restrict all CI jobs, which would make the whole thing slower since we would have fewer resources to distribute the CI jobs across.
Do we know which Kubernetes nodes are causing the issue? I think we can simply blacklist those and we should be fine.
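One way to do that might be to taint the affected node so that pods without a matching toleration (which should include the runner-created CI pods) are no longer scheduled there. A rough sketch, with a made-up taint key and a placeholder node name; note that such a taint would keep all pods without the toleration off the node, not only CI pods:

```yaml
# Sketch of a taint on the affected node (normally applied with `kubectl taint nodes ...`).
apiVersion: v1
kind: Node
metadata:
  name: kubernetes-worker-X        # placeholder for the problematic node
spec:
  taints:
    - key: ci-postgres-incompatible   # made-up key, for illustration only
      value: "true"
      effect: NoSchedule
```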
In GitLab by @skolen on Sep 27, 2021, 13:39
I know that at least kubernetes-worker-7 causes the issue.
That's strange. Isn't our current permanent Postgres instance always running on kubernetes-worker-7 without issues?
I think it is just the CI service that gets scheduled on a random worker every time it is spawned.
In GitLab by @skolen on Sep 27, 2021, 14:17
In the last week, the problem always occurred with worker 7 (and only worker 7!). My assumption is that the problem is not caused by Postgres alone but by a combination of Postgres and the gitlab-runner environment config / OpenStack.
In GitLab by @skolen on Sep 27, 2021, 14:19
I am not sure whether or not this is relevant, but the problem reappeared last week after we had a problem with our kubernetes master node.
In GitLab by @skolen on Oct 14, 2021, 14:24
Note: (one part of) the problem is definitely our k8s worker node 7. The pipeline is now functional again on a different worker node.
In GitLab by @skolen on Aug 13, 2021, 14:27
Since recently, the postgres service used in the CI of this project has not been starting properly; it produces an error which is pretty much the same as the one described here: https://github.com/docker-library/postgres/issues/451
It produces a Bus error and exits with "child process exited with exit code 135". Consequently, all tests are failing because the DB is not online. This is the complete log output of the service:
We are using the standard Docker Hub postgres image, version 9.6, and a few weeks ago this issue was not present in our k8s GitLab runners. The same problem appears with newer postgres versions; I have already tested this. My assumption is that something changed in the configuration of our k8s which causes the postgres initdb to fail, most likely related to mismatching huge page configurations between k8s and the host VMs.
So far, I could not find a way to start the postgres CI service with `huge_pages=off` configured, to force postgres not to use huge pages at all (not even try to use them). Any ideas are welcome. Our VILLASweb-backend-go pipeline is broken as long as this issue is not solved.
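What I would like is something along these lines (just a sketch; whether the Kubernetes executor actually forwards the service `command` to the postgres container in our setup is an assumption I have not verified):

```yaml
# .gitlab-ci.yml (sketch): try to pass huge_pages=off to the postgres service.
# The official postgres entrypoint forwards extra arguments to the server,
# so this should translate to `postgres -c huge_pages=off`.
services:
  - name: postgres:9.6
    alias: postgres
    command: ["postgres", "-c", "huge_pages=off"]
```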
CC @iripiri @stvogel