Closed maxamel closed 1 year ago
Example process info:
{'memory_percent': 0.0, 'gids': pgids(real=990, effective=990, saved=990), 'cpu_num': 1, 'pid': 85, 'num_ctx_switches': pctxsw(voluntary=5, involuntary=0), 'cpu_times': pcputimes(user=0.0, system=0.0, children_user=0.0, children_system=0.0, iowait=0.0), 'environ': None, 'exe': '', 'cpu_affinity': [0, 1, 2, 3, 4, 5], 'cwd': None, 'num_fds': None, 'ionice': pionice(ioclass=<IOPriority.IOPRIO_CLASS_NONE: 0>, value=4), 'open_files': None, 'threads': [pthread(id=85, user_time=0.0, system_time=0.0)], 'username': 'sbx_user1051', 'io_counters': None, 'cpu_percent': 0.0, 'terminal': None, 'nice': 0, 'name': 'ssh', 'memory_info': pmem(rss=0, vms=0, shared=0, text=0, lib=0, data=0, dirty=0), 'memory_maps': [], 'connections': None, 'cmdline': [], 'num_threads': 1, 'memory_full_info': pfullmem(rss=0, vms=0, shared=0, text=0, lib=0, data=0, dirty=0, uss=0, pss=0, swap=0), 'status': 'zombie', 'create_time': 1693825223.25, 'uids': puids(real=993, effective=993, saved=993), 'ppid': 1}
Note the parent is init, which means this process is orphaned and the real parent terminated.
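A minimal sketch of how entries like the one above might be filtered to find orphaned zombies. It assumes each process is described by a dict carrying the `status`, `ppid`, and `name` keys seen in the example. The helper name `find_orphaned_zombies` and the second sample entry are hypothetical. The listing looks like output from psutil's `Process.as_dict()`, but that is an assumption about how it was produced.

```python
def find_orphaned_zombies(procs, name="ssh", init_pid=1):
    """Return the process-info dicts that are zombies reparented to init."""
    return [
        p for p in procs
        if p.get("status") == "zombie"      # process has exited but was never reaped
        and p.get("ppid") == init_pid       # original parent is gone, init adopted it
        and p.get("name") == name
    ]


# Trimmed-down sample based on the entry above, plus one hypothetical healthy process.
sample = [
    {"pid": 85, "ppid": 1, "name": "ssh", "status": "zombie"},
    {"pid": 90, "ppid": 42, "name": "python", "status": "running"},
]
print([p["pid"] for p in find_orphaned_zombies(sample)])
```

Counting the result before and after a batch of clones would show whether the zombies accumulate with each failed operation.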
You could try it with a plain git command via gitpython.Git.clone() (actually, I don't know how to instantiate a git command by hand, but I am sure you will find out) to see if that resolves the problem. If so, it's clear that it's something about the way GitPython calls git from Repo.clone_from(), so maybe there is a way to fix it. If the issue persists, it's something in cmd.py, and maybe there are other ways to launch the git command to avoid this issue.
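One way to take GitPython out of the picture entirely is to launch git directly with the standard library. The sketch below is only an illustration under stated assumptions: `build_clone_command` and `run_plain_clone` are hypothetical helper names, the timeout value is arbitrary, and the flags simply mirror the clone command reported in this thread.

```python
import subprocess


def build_clone_command(url, dest, branch, depth=1):
    """Build the argument list for a shallow, single-branch clone."""
    return ["git", "clone", "-v", f"--depth={depth}", "-b", branch, "--", url, dest]


def run_plain_clone(url, dest, branch, depth=1, timeout=120):
    """Run git clone without GitPython and return the CompletedProcess."""
    cmd = build_clone_command(url, dest, branch, depth)
    # subprocess.run waits for git itself, so the git process is always reaped.
    # GitPython is not involved here, so any defunct ssh processes that remain
    # afterwards cannot be attributed to it.
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
```

If defunct ssh processes still appear after calling `run_plain_clone` against an unresolvable host, the cause lies outside GitPython.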
In theory, assuming Python won't interfere with children of its own child processes, this would be a git issue as well, as it's the git program that spawns the ssh sessions.
Hello @Byron, I verified and it is indeed a git issue. I ran this without the Python code and I can still see the defunct ssh processes left on the system.
Thanks. It looks like there is nothing that can be done here, as it's a problem with git itself.
Hello, we have a list of repositories that we iterate over, performing Repo.clone_from on each one. I can see the underlying command created is:
git clone -v --depth=1 -b 3.23.66 -- ssh://*****@*****lab-prod.server.sim.cloud/terraform/modules/aws-eks /tmp/dest
The operations fail with Could not resolve hostname (which is OK), but the problem is that the exit is not clean and leaves ssh processes dangling. From investigating, it looks like they are zombie processes with no cmdline or any other useful info. These processes quickly pile up and cause us to hit system limits. Please assist with understanding the issue. Any advice will be much appreciated.