Closed gaktive closed 3 years ago
Hi @svollath, for `pids_limit` in `/etc/crio/crio.conf.d/00-default.conf`, which value should the docs use as a guideline? Should it be `-1` (from your Confluence notes) or `32768` (from your more recent response in Bugzilla 1179109)? Thanks
Not sure if we should modify `/etc/crio/crio.conf.d/00-default.conf` in CaaSP 4.5 -> I believe we have to adjust `./addons/cri/conf.d/99-custom.conf` for skuba-based deployments, which I believe ends up as `/etc/crio/crio.conf.d/99-custom.conf`.
For CaaSP 4.2 we might have to adjust `crio.conf` directly.
Adjusting `/usr/lib/systemd/system/crio.service` is also not the best way to go - AFAIK custom adjustments have to go into a systemd overlay file under `/etc/systemd/...`.
For now, the conclusion is to document adjustments related to `pids_limit` only.
For CaaSP-4.2.X:

```shell
# sed -i -e 's|pids_limit = 1024|pids_limit = 32768|g' /etc/crio/crio.conf
# sudo bash -c "echo '32768' > /sys/fs/cgroup/pids/kubepods/pids.max"
```

... to be answered by the CaaSP team.

For CaaSP-4.5.X:

```
pids_limit = 32768
```

For all of these settings, we need to be sure that the changes are enabled (e.g. restart crio) and persist across boots and updates (i.e. are not lost or overwritten by skuba). => to be answered by the CaaSP team.
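To sanity-check the `sed` substitution above without touching a real node, it can be rehearsed on a scratch copy first (a minimal illustration; the temp file is a stand-in for `/etc/crio/crio.conf`):

```shell
# Illustration only: rehearse the pids_limit bump on a scratch file,
# then run the same sed against /etc/crio/crio.conf on the node.
tmp=$(mktemp)
printf 'pids_limit = 1024\n' > "$tmp"   # stand-in for the real crio.conf
sed -i -e 's|pids_limit = 1024|pids_limit = 32768|g' "$tmp"
grep pids_limit "$tmp"                  # shows the rewritten line
rm -f "$tmp"
```

On the real node, a `systemctl restart crio` afterwards would be needed before the new limit applies to newly started containers.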
In addition to that, some systemd tunings made CaaSP (CAP/CATs) more performant and stable, though those are still experimental. The common tunings for 4.2.X and 4.5.X would be:

```shell
sudo bash -c "echo 'DefaultTimeoutStartSec=300s' >> /etc/systemd/system.conf"
sudo bash -c "echo 'DefaultTimeoutStopSec=5s' >> /etc/systemd/system.conf"
sudo bash -c "echo 'DefaultStartLimitIntervalSec=1s' >> /etc/systemd/system.conf"
sudo bash -c "echo 'DefaultTasksMax=infinity' >> /etc/systemd/system.conf"
sudo bash -c "echo 'DefaultTasksAccounting=no' >> /etc/systemd/system.conf"
```

I would not suggest documenting those for now.
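If these tunings do get documented later, one caveat: appending to `/etc/systemd/system.conf` repeatedly accumulates duplicate keys. systemd also reads drop-ins from `/etc/systemd/system.conf.d/`, so the same settings could live in a single file instead (a sketch; the filename is my choice, the values are the ones listed above):

```ini
# /etc/systemd/system.conf.d/50-caasp-tuning.conf (hypothetical filename)
[Manager]
DefaultTimeoutStartSec=300s
DefaultTimeoutStopSec=5s
DefaultStartLimitIntervalSec=1s
DefaultTasksMax=infinity
DefaultTasksAccounting=no
```

Either way, manager defaults only take effect after `systemctl daemon-reexec` or a reboot.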
Hey, raising the pids_limit just a bit seems fine, if you can guarantee that the machine does not run out of available process IDs. Generally I can only recommend keeping it as low as possible. We decrease security by raising the PIDs limit, because fork bombs could attack worker nodes. This applies to the Kubernetes pod and node PIDs limits as well as the CRI-O configuration.
We already increased the default `pids_limit` multiple times in CRI-O; having multiple thousands of PIDs inside of a container seems to be an application configuration or delivery issue. Is it possible to configure the number of processes created inside the container so that the workload can scale the Kubernetes way?
/cc @rhafer
When I set `pids_limit` to "max" and run CATs, several containers peak at more than 2500 PIDs - ultimately this is related to the design of our diego-cell pod, which runs containers in containers. The diego-cell is expected to start containers, currently limited (more or less) only by available memory and disk space. As for calculating the "needed" PIDs: that depends on the possible number of apps, the code of the apps themselves, and also on the trade-off between a fixed amount of PIDs (real hard limit) and the sum of PIDs currently in use (limited throughput/timeouts) - maybe it's too hard to calculate all of that for a diego-cell.
I see. I'm wondering if we can set the PIDs limit only for that specific pod, but this does not seem possible with the current set of features. Maybe raising the limit to 3072 would be enough?
Thanks for your inputs @saschagrunert - are you saying it's not currently possible to limit the PIDs for a specific pod? The customer doesn't want to increase this across the board (rightly so, it seems) and if we cannot do it at the pod level, would it be advisable for them to run these pods on dedicated nodes with higher PID limits?
Finally, in terms of how we increase this limit in a boot-persistent way, would the kubelet pid limit setting override the cri-o pid limit settings? I was looking at https://github.com/cri-o/cri-o/issues/1921 and there was a suggestion at the end to configure this at the kubelet level for the node.
> Thanks for your inputs @saschagrunert - are you saying it's not currently possible to limit the PIDs for a specific pod? The customer doesn't want to increase this across the board (rightly so, it seems) and if we cannot do it at the pod level, would it be advisable for them to run these pods on dedicated nodes with higher PID limits?
Yes, there is right now no support to assign a PID limit to a specific workload, only to all of them.
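Since the limit is per runtime rather than per pod, the dedicated-node idea above would have to be expressed at the scheduling level, e.g. by labeling the nodes that carry a raised CRI-O `pids_limit` and pinning the workload there with a `nodeSelector` (a sketch; the label name is hypothetical):

```yaml
# Pod spec fragment. Assumes nodes with a raised pids_limit were labeled via
#   kubectl label node <node-name> pids-tier=high   (hypothetical label)
spec:
  nodeSelector:
    pids-tier: high
```

A taint/toleration pair would additionally keep other workloads off those nodes, if isolation matters.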
> Finally, in terms of how we increase this limit in a boot-persistent way, would the kubelet pid limit setting override the cri-o pid limit settings? I was looking at cri-o/cri-o#1921 and there was a suggestion at the end to configure this at the kubelet level for the node.
Technically the PID limit in CRI-O gets passed down to the OCI runtime (runc), which assigns the limit via the `TasksMax` systemd option, which maps to the `pids.max` cgroup attribute. The kubelet enforces the PID limit directly via the PID cgroup; it does not pass it to the container runtime in any way. I expect that both settings overwrite each other, therefore I recommend keeping them in sync.
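For keeping the two in sync, the kubelet side is the `podPidsLimit` field of the upstream `KubeletConfiguration`; whether and where skuba manages that file on CaaSP nodes is an assumption to verify:

```yaml
# KubeletConfiguration fragment. podPidsLimit is the upstream Kubernetes
# field name; the value mirrors the CRI-O pids_limit discussed above.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
podPidsLimit: 32768
```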
Just for a general understanding - how many PIDs can be used on SLES in general? As we support 110 pods per worker, 32k PIDs would allow running 3,604,480 PIDs within containers, plus the PIDs on the OS. Would that be a problem?
> Just for a general understanding - how many PIDs can be used on SLES in general?

Most Linux systems have the maximum PID set to `32768`.

> As we support 110 pods per worker, 32k PIDs would allow running 3,604,480 PIDs within containers, plus the PIDs on the OS. Would that be a problem?

We have to take other processes into account which are required to run a Kubernetes node, plus leave some room for other applications users might run.
> Just for a general understanding - how many PIDs can be used on SLES in general?

Most Linux systems have the maximum PID set to `32768`. It's possible to increase that limit via the `kernel.pid_max` sysctl knob, though I am not sure what the maximum value we support on SLES is.
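The current ceiling can be read from `/proc`, and a persistent raise would go through a sysctl drop-in (a sketch; the drop-in filename and the value are my choice, and the SLES-supported maximum remains the open question above):

```shell
# Read the kernel's current PID ceiling (commonly 32768, but kernel- and
# distribution-dependent, so no fixed value is assumed here).
cat /proc/sys/kernel/pid_max

# A persistent raise would look like this (illustrative value):
#   echo 'kernel.pid_max = 65536' > /etc/sysctl.d/90-pid-max.conf
#   sysctl --system
```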
At this time, would it be safe to say that setting `pids_limit` and the cgroup value to 3072 is something we can advise CAP customers to do? There would be caveats indicating that, based on the number of apps deployed, that number can go higher, but setting it to the maximum would create a potential vulnerability.
Otherwise, is there a way to set this on CaaSP so that it persists across node or other restarts?
Regarding the CaaSP v4.2 CRI-O configuration level, we will prepare a fix for `crio.conf` which will include a higher `pids_limit`.
@gaktive In my opinion, comment https://github.com/SUSE/doc-cap/issues/1031#issuecomment-734385508 explains this change for CaaSP quite well. We will also prepare a CaaSP update which will persist this change.
Thanks @mjura, especially for the feature request for the persistent change.
@btat we should have enough to write up something here in the docs. Let me know if you need help with wording.
We need to ensure CaaSP 4.x has these settings in place for a more stable CAP:

The setting that affects nested containers in `/etc/crio/crio.conf`:

```
# Maximum number of processes allowed in a container.
pids_limit = 1024
```

After raising that number on a running system, you will see containers with more than 1024 PIDs, which is expected with CAP. For running CATs on a single diego-cell, `pids_limit = 1024` is not enough. Due to the nature of nested containers, this might also be true for production systems running Diego. At least this should be documented for "Running CAP on CaaSP".

Additional limits need to be set for crio in `/usr/lib/systemd/system/crio.service` and raised, too:

- `LimitNOFILE`
- `LimitNPROC`
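As noted earlier in the thread, editing `/usr/lib/systemd/system/crio.service` directly is lost on package updates, so the raised limits would belong in a systemd drop-in instead (a sketch; the filename and the values are illustrative assumptions, not tested recommendations):

```ini
# /etc/systemd/system/crio.service.d/50-limits.conf (hypothetical filename)
[Service]
LimitNOFILE=1048576
LimitNPROC=1048576
```

Followed by `systemctl daemon-reload && systemctl restart crio` to make the override effective.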