DIRACGrid / DIRAC

DIRAC Grid
http://diracgrid.org
GNU General Public License v3.0

Something wrong with DIRAC processes #547

Closed: msapunov closed this issue 12 years ago

msapunov commented 12 years ago

Part of the output of the ps command on a machine where I have freshly installed pre10:

S    3631  \_ runsv WorkloadManagement_TaskQueueDirectorFormation
S    3633  |   \_ svlogd .
SNl 29385  |   \_ python /opt/dirac/pro/DIRAC/Core/scripts/dirac-agent.py Workl
ZN  29413  |       \_ [ldconfig] <defunct>
S    3632  \_ runsv WorkloadManagement_TaskQueueDirectorFrancegrilles
S    3634  |   \_ svlogd .
SNl 29392  |   \_ python /opt/dirac/pro/DIRAC/Core/scripts/dirac-agent.py Workl
ZN  29417  |       \_ [ldconfig] <defunct>
S    3636  \_ runsv WorkloadManagement_TaskQueueDirectorCPPM
S    3638  |   \_ svlogd .
SNl 30466  |   \_ python /opt/dirac/pro/DIRAC/Core/scripts/dirac-agent.py Workl
ZN  30486  |       \_ [ldconfig] <defunct>
S   25371  \_ runsv WorkloadManagement_SandboxStore
S   25372  |   \_ svlogd .
Sl  27647  |   \_ python /opt/dirac/pro/DIRAC/Core/scripts/dirac-service.py Wor
Z   28085  |       \_ [ldconfig] <defunct>
S    14188  \_ runsv DataManagement_StorageElement
S    14189      \_ svlogd .
Sl   14190      \_ python /opt/dirac/pro/DIRAC/Core/scripts/dirac-service.py Dat
Z    14209          \_ [ldconfig] <defunct>

Note that each and every DIRAC process has a zombie child. So far I have 34 zombie processes on the machine, and all of them were spawned by DIRAC processes. This is a common pattern: we have the same problem on volhcb17 and on the DIRAC instance in Lyon. Could this problem be triggered by the recent changes in the DISET protocol?
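To check the pattern above (every zombie hanging off a DIRAC agent or service) without reading the ps tree by eye, one can group zombie processes by parent PID. The following is a minimal sketch, not part of DIRAC: the helper names `parse_stat` and `zombies_by_parent` are hypothetical, and it assumes a Linux host exposing the standard `/proc/<pid>/stat` layout (pid, command name in parentheses, state letter, parent PID, ...).

```python
import os
from collections import defaultdict


def parse_stat(line):
    """Parse a /proc/<pid>/stat line into (pid, comm, state, ppid).

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the *last* ')' rather than on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    fields = line[close_paren + 2:].split()
    state, ppid = fields[0], int(fields[1])
    return pid, comm, state, ppid


def zombies_by_parent(proc_root="/proc"):
    """Map each parent PID to a list of (pid, comm) for its zombie children.

    Returns an empty dict if proc_root cannot be listed (e.g. non-Linux host).
    """
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return {}
    result = defaultdict(list)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            # The process may have exited between listdir() and open().
            continue
        if state == "Z":
            result[ppid].append((pid, comm))
    return dict(result)
```

On a host in the state described above, one would expect each agent or service PID (for example 29385) to map to a single `ldconfig` zombie.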

hamar commented 12 years ago

I also have the same problem when I run EELA pilots by hand:

vanessa 25022 23478 0 06:19 pts/7 00:00:01 python ./eela-dirac-pilot.py -O /O=GRID-FR/C=FR/O=CNRS/OU=CPPM/CN=Vanessa Hamar -M 5 -d -S Dirac-Production -C dips://dirac.eela.if.ufrj.br:9135/Configuration/Server -r v6r1p4 -T 360 -g 2011-06-06 -G dirac_user
vanessa 25622 25022 0 06:27 pts/7 00:00:00 [ldconfig]

graciani commented 12 years ago

Why do you say it is a problem?

We have no idea where it comes from, but it is not a problem: when the parent dies, the zombie is re-parented to init, which reaps it, so the process is finished.
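The lifecycle behind this argument can be illustrated with a small experiment: a child that has exited stays in the process table as a zombie (state `Z`) until its parent collects the exit status with `waitpid()`; if the parent exits first, init adopts and reaps it. The sketch below is illustrative only, not DIRAC code; the helper names `proc_state` and `zombie_lifecycle` are hypothetical, it assumes a POSIX system with `os.fork`, and it reads `/proc` only when that is available (Linux).

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state of `pid` from /proc (Linux only), else None."""
    try:
        with open("/proc/%d/stat" % pid) as f:
            data = f.read()
    except OSError:
        return None
    # The state letter follows the ")" that closes the command name.
    return data[data.rindex(")") + 2]


def zombie_lifecycle(exit_code=3, timeout=2.0):
    """Fork a short-lived child and observe it before and after reaping.

    Returns (state_before_reap, exit_status, gone_after_reap).
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit at once, skipping Python's cleanup handlers.
        os._exit(exit_code)

    # Parent: do NOT reap yet. Poll until the exited child shows up as a
    # zombie ("Z"), or until /proc is unavailable or the timeout expires.
    state = proc_state(pid)
    deadline = time.monotonic() + timeout
    while state not in ("Z", None) and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)

    # Reaping: waitpid() collects the exit status and frees the table entry.
    _, status = os.waitpid(pid, 0)

    # A second waitpid() fails because the zombie no longer exists.
    try:
        os.waitpid(pid, 0)
        gone = False
    except ChildProcessError:
        gone = True
    return state, os.WEXITSTATUS(status), gone
```

On Linux the first element is normally `"Z"`, matching the `Z`/`ZN` entries in the ps output above. A single unreaped zombie costs only a process-table slot, which is why it is harmless as long as the count does not keep growing.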

msapunov commented 12 years ago

OK Ricardo, you are right: it might not be a problem in itself, but it might be a sign of one. Anyway, I just think that a zombie process spawned by every DIRAC process needs to be investigated.

graciani commented 12 years ago

It has been investigated, but so far we have not found the cause.

msapunov commented 12 years ago

So, shouldn't this issue be reopened and assigned to someone, say Adria, Andrei or maybe you?

graciani commented 12 years ago

I have found the cause; the issue will be closed when the pull request is integrated.

atsareg commented 12 years ago

Fixed in #730, v6r4p2