Closed: maxmoehl closed this issue 1 year ago
Hello! Can you try getting into the bpm container for haproxy with `bpm shell haproxy` and running `ulimit -n` to see what limits the bpm container has? We were able to do `bpm start haproxy`. Do you have a manifest we could use as an example to reproduce this?
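For reference, a minimal sketch of the suggested check (the job name `haproxy` is taken from the report above; the extra hard-limit check is just for comparison):

```
# open a shell inside the bpm container for the haproxy job, then inspect its limits
bpm shell haproxy
ulimit -n     # soft open-files limit seen inside the container
ulimit -Hn    # hard limit, for comparison
```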
We're seeing what may be a related problem under pxc-release w/ bpm v1.2.1. This largely affects ubuntu-xenial and ubuntu-bionic stemcells, but less so for ubuntu-jammy.
We have CI jobs that validate pxc-release can handle high connection counts to a managed mysql database server. In CI, we started seeing those jobs fail today under bpm/1.2.1 after several hundred concurrent connections. Jammy also seems to impose an artificial limit, but its default is much higher and is not causing failures for jobs using that stemcell.
So, in a fresh deploy on either xenial or bionic with bpm/1.2.1 we observe a very low nofile limit of 4096 for our "proxy" job that runs under bpm:
```
# lsb_release -sc
xenial   # <= identical on bionic
# cat /proc/$(pidof proxy)/limits
...
Max open files            4096                 4096                 files
...
```
Although in our bpm.yml we have:

```yaml
...
limits:
  open_files: 1048576
...
```
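For context, a rough sketch of where that block sits in a bpm.yml; the process name matches our job, but the executable path here is illustrative:

```yaml
processes:
- name: proxy
  executable: /var/vcap/packages/proxy/bin/proxy   # illustrative path, not our actual one
  limits:
    open_files: 1048576
```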
If we `bosh ssh` and manually run `bpm stop && bpm start`, we see the low limit goes away:
```
# lsb_release -sc
xenial   # <= identical on bionic
# bpm stop proxy && bpm start proxy
# cat /proc/$(pidof proxy)/limits
...
Max open files            1048576              1048576              files
...
```
However, if we `monit restart` the process, we observe the limit resets back to 4096:
```
# lsb_release -sc
xenial   # <= identical on bionic
# monit restart proxy
# ...wait a bit...
# cat /proc/$(pidof proxy)/limits
...
Max open files            4096                 4096                 files
...
```
We do see that the bosh monit process has a nofile hard limit of 4096:
```
# lsb_release -sc
xenial   # <= identical on bionic
# cat /proc/$(pidof monit-actual)/limits
...
Max open files            1024                 4096                 files
...
```
Under jammy, this limit is higher, but not as high as we configure in our bpm.yml:
```
# lsb_release -sc
jammy
# cat /proc/$(pidof monit-actual)/limits
...
Max open files            1024                 524288               files
...
```
This limit also applies to our "proxy" job:
```
# lsb_release -sc
jammy
# cat /proc/$(pidof proxy)/limits
...
Max open files            524288               524288               files
...
```
Similarly, if we `bpm stop && bpm start`, we see this limit goes away (i.e. we get the limit specified in our bpm.yml, 1048576), but under `monit restart` the limit seems to be capped by monit's hard limit.
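A quick way to see the two limits side by side (process names are taken from the snippets above; this sketch assumes `pidof` returns a single PID for each name):

```
# compare the open-files limits of monit and the bpm-supervised job
for p in monit-actual proxy; do
  echo "== $p"
  grep "Max open files" /proc/$(pidof "$p")/limits
done
```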
Under any stemcell, if we roll back to bpm/1.2.0 this problem appears to go away entirely:
```
# bpm --version
1.2.0
# cat /proc/$(pidof proxy)/limits
...
Max open files            1048576              1048576              files
...
# monit restart proxy
# cat /proc/$(pidof proxy)/limits
...
Max open files            1048576              1048576              files
...
```
Offhand, it looks like this may be related to https://github.com/golang/go/issues/59064, which was introduced in Go v1.20.4. Experimentally rolling bpm/1.2.1 back to Go v1.20.3 locally also makes this issue disappear.
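If anyone wants to confirm which Go toolchain a given bpm binary was built with, one option is the sketch below; it assumes a Go toolchain is available on the machine and that bpm is installed at the usual BOSH package path, which may differ in your environment:

```
# print the Go version embedded in the bpm executable
go version /var/vcap/packages/bpm/bin/bpm
```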
@lnguyen I think all the information I could have provided is already here. If you still want me to reproduce the issue on our system, please let me know and I will do so.
This should be fixed in bpm/1.2.2 @maxmoehl
We confirmed that the issue is fixed with 1.2.2 in our environments. Thank you for the quick response!
We are using bpm to deploy HAProxy through haproxy-boshrelease.

At first we observed the error:

even though we have set the value appropriately in the bpm config:

After manually trying to start the job for troubleshooting using `bpm start haproxy`, we now see another error:

Any idea what is happening there or how to fix it? The release seemed to only contain a few version bumps.