jdoss closed this issue 2 years ago.
Have you tried removing the Nomad client and flooding the socket without it?
@jwhonce any thoughts?
Hey @baude! Thanks for taking a look.
> Have you tried removing the Nomad client and flooding the socket without it?
No, I haven't tried that. I am trying to think about how I would get the same conditions without the Nomad client running. The driver uses the socket to stream logs for each container, so there are a lot of things going on that build up to the socket getting overloaded.
Is it possible to exactly reproduce what you are doing? Otherwise, this is a lot to ask.
If it is just the log endpoint, it is tracked here: https://github.com/containers/podman/issues/14879
> Is it possible to exactly reproduce what you are doing? Otherwise, this is a lot to ask.
@baude Not without launching your own Nomad cluster and loading up each client node with 200+ containers. I understand it's a lot to ask, and I am willing to do whatever I can on my end to provide more information.
> If it is just the log endpoint, it is tracked here: #14879
@Luap99 Yeah, the driver does use the log endpoint. Here is where I believe it is doing that:
https://github.com/hashicorp/nomad-driver-podman/blob/main/api/container_logs.go#L16
It looks like I can disable log collection in the Nomad Podman driver.
```hcl
plugin "nomad-driver-podman" {
  config {
    socket_path = "unix://var/run/podman/podman.sock"
    # true turns off per-container log streaming over the socket
    disable_log_collection = true
    volumes {
      enabled      = true
      selinuxlabel = "z"
    }
  }
}
```
I am going to test that out on my client nodes and see if I have better performance when deploying a lot of containers at once.
A friendly reminder that this issue had no activity for 30 days.
Since we have heard nothing back in a month, I am guessing that the issue is resolved. Reopen if I am mistaken.
I am still seeing issues but I haven't been able to dig into it more. I will respond back once I have more info.
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
This is something of a cross-post to see if there is anything that can be done to improve the performance of the Podman socket under high concurrency.
I opened this issue https://github.com/hashicorp/nomad-driver-podman/issues/175 on the Nomad Podman driver project to see if we can track down why Podman on my nomad client nodes becomes overwhelmed and unresponsive under high concurrency. This seems to be a common issue for other users of the Nomad Podman driver.
Is there anything that can be done to help improve the performance of the Podman socket? Are there any tips from the Podman team on how to better debug this issue to get more information?
Steps to reproduce the issue:
1. Launch hundreds of containers per client node with Nomad.
2. Watch the Podman socket become unavailable and my Nomad job allocations start failing.
Additional information you deem important (e.g. issue happens only occasionally):
Podman is being run as root on these client nodes on Fedora CoreOS 36.20220618.3.1 on Google Compute VMs.
Output of `podman version`:

Output of `podman info --debug`:

Package info (e.g. output of `rpm -q podman` or `apt list podman`):