DUNE-DAQ / drunc

Dune RUN Control (DRUNC) is the run control for the DUNE experiment

`kill` should also `flush` dead processes #60

Closed plasorak closed 6 months ago

plasorak commented 6 months ago

This should happen in the ProcessManager, whenever the ProcessQuery matches the dead processes.

plasorak commented 6 months ago

Reopening this issue, as `kill` now crashes with:

drunc-unified-shell > kill --session test-session
[13:02:44] INFO     "process_manager": Sending signal 'Signals.SIGINT' to '0ac2408e-e324-446a-9184-2955c00b99f7'                                                                                                    ssh_process_manager.py:355
[13:02:45] INFO     "process_manager": Sending signal 'Signals.SIGKILL' to '0ac2408e-e324-446a-9184-2955c00b99f7'                                                                                                   ssh_process_manager.py:355
           INFO     "ssh-process-manager": Process 'hsi-controller' (session: 'test-session', user: 'plasorak') process exited with exit code -9                                                                    ssh_process_manager.py:136
           INFO     "Broadcast": Process 'hsi-controller' (session: 'test-session', user: 'plasorak') process exited with exit code -9                                                                                  broadcast_sender.py:66
[13:02:45] INFO     "Broadcast": 'SUBPROCESS_STATUS_UPDATE' Process 'hsi-controller' (session: 'test-session', user: 'plasorak') process exited with exit code -9                                         kafka_stdout_broadcast_handler.py:83
[13:02:46] INFO     "process_manager": Sending signal 'Signals.SIGQUIT' to '0ac2408e-e324-446a-9184-2955c00b99f7'                                                                                                   ssh_process_manager.py:355
[13:02:46] ERROR    "process_manager_driver": Command 'kill' failed on 'process_manager' (response flag 'UNHANDLED_EXCEPTION_THROWN')                                                                                       shell_utils.py:164
           ERROR    "process_manager_driver": Stacktrace on remote server!                                                                                                                                                  shell_utils.py:175
                    [Errno 3] No such process

AttributeError: values
           WARNING  "click_shell.core": Traceback (most recent call last):                                                                                                                                                          core.py:50
                      File "/nfs/home/plasorak/NFD24-05-20/swdir/.venv/lib/python3.10/site-packages/click_shell/core.py", line 34, in invoke_
                        command.main(args=shlex.split(arg),
                      File "/nfs/home/plasorak/NFD24-05-20/swdir/.venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
                        rv = self.invoke(ctx)
                      File "/nfs/home/plasorak/NFD24-05-20/swdir/.venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
                        return ctx.invoke(self.callback, **ctx.params)
                      File "/nfs/home/plasorak/NFD24-05-20/swdir/.venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
                        return __callback(*args, **kwargs)
                      File "/nfs/home/plasorak/NFD24-05-20/swdir/.venv/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
                        return f(get_current_context(), *args, **kwargs)
                      File "/nfs/home/plasorak/NFD24-05-20/swdir/drunc/src/drunc/process_manager/utils.py", line 27, in new_func
                        return ctx.invoke(f, query=query,**kwargs)
                      File "/nfs/home/plasorak/NFD24-05-20/swdir/.venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
                        return __callback(*args, **kwargs)
                      File "/nfs/home/plasorak/NFD24-05-20/swdir/.venv/lib/python3.10/site-packages/click/decorators.py", line 38, in new_func
                        return f(get_current_context().obj, *args, **kwargs)
                      File "/nfs/home/plasorak/NFD24-05-20/swdir/drunc/src/drunc/utils/utils.py", line 116, in wrapper
                        ret = loop.run_until_complete(main_task)
                      File
                    "/cvmfs/dunedaq.opensciencegrid.org/spack/externals/ext-v2.1/spack-0.20.0/opt/spack/linux-almalinux9-x86_64/gcc-12.1.0/python-3.10.4-avttyjcqg3mct6r252fx5mxmff5ddf6z/lib/python3.10/asyncio/base_events.py",
                    line 646, in run_until_complete
                        return future.result()
                      File "/nfs/home/plasorak/NFD24-05-20/swdir/drunc/src/drunc/process_manager/interface/commands.py", line 62, in kill
                        obj.print(tabulate_process_instance_list(result.data, 'Killed process', False))
                      File "/nfs/home/plasorak/NFD24-05-20/swdir/drunc/src/drunc/process_manager/utils.py", line 46, in tabulate_process_instance_list
                        for result in pil.values:
                    AttributeError: values
TiagoTAlves commented 6 months ago

Any idea why this is happening?
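
One possible reading of the log above: the process already exits with code -9 after `SIGKILL`, so the later `SIGQUIT` is sent to a PID that no longer exists, which would explain `[Errno 3] No such process`. The `AttributeError: values` then looks like a follow-on from the shell trying to tabulate the failed response. The snippet below is only a minimal sketch of tolerating an already-dead process during signal escalation. It uses the standard library rather than drunc's own code, and `send_signal_if_alive` and `kill_with_escalation` are hypothetical names, not functions from `ssh_process_manager.py`. It assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys


def send_signal_if_alive(pid: int, sig: signal.Signals) -> bool:
    """Send `sig` to `pid`; return False instead of raising if the process is gone."""
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # This is the OSError with errno ESRCH: "[Errno 3] No such process".
        return False
    return True


def kill_with_escalation(
    proc: subprocess.Popen,
    sequence=(signal.SIGINT, signal.SIGKILL, signal.SIGQUIT),
    grace: float = 1.0,
) -> list:
    """Send each signal in turn, stopping as soon as the process has exited.

    Returns the list of signals that were actually delivered.
    """
    sent = []
    for sig in sequence:
        if proc.poll() is not None:
            # Already exited and reaped: nothing left to signal.
            break
        if not send_signal_if_alive(proc.pid, sig):
            break
        sent.append(sig)
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            # Still running after the grace period: escalate to the next signal.
            pass
    return sent


if __name__ == "__main__":
    # Hypothetical stand-in for a managed process such as 'hsi-controller'.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    delivered = kill_with_escalation(child)
    print("signals delivered:", [s.name for s in delivered])
    print("exit code:", child.returncode)
```

With a guard like this, signalling a process that has already died becomes a no-op rather than an unhandled exception, which would also be a natural place to flush the dead process from the manager's bookkeeping.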