Closed Kuba314 closed 9 months ago
@olichtne The cleanup()
method was on JobContext
, which is only used in RemoteMethods
and cannot actually be called via RPC, or have I misunderstood?
Btw, https://github.com/LNST-project/lnst/blob/master/lnst/Agent/Agent.py#L1059 get_cmd
and del_cmd
don't actually exist anymore, so the whole "exception"
handler can be removed? Which is weird because we are probably sometimes sending "exception"
messages at:
@olichtne The cleanup() method was on JobContext, which is only used in RemoteMethods and cannot actually be called via RPC, or have I misunderstood?
ah, you're right, i misread that...
Btw, https://github.com/LNST-project/lnst/blob/master/lnst/Agent/Agent.py#L1059 get_cmd and del_cmd don't actually exist anymore, so the whole "exception" handler can be removed? Which is weird because we are probably sometimes sending "exception" messages at:
so there are 2 exception type messages:
send_data_to_ctl
method, this sends a message from an Agent process directly to the Controller processSo I think this should be kept as this has a different, still existing use case.
However, I'm not sure why the get_cmd
and del_cmd
method calls are still there and why we haven't seen any problems with them recently.
This should be testable by runing a test module linke this:
class MyTestModule(BaseTestModule):
def run():
sleep(2) # simulate runtime
raise Exception("exception from Job process")
@olichtne I'm seeing the following in host1's DEBUG output:
Traceback (most recent call last):
File "/lnst/lnst/Agent/Job.py", line 263, in run
self._result["passed"] = self._what["module"].run()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/lnst-tHnIm-F_-py3.12/bin/cache/1e90595123b49a02d0c371edd1d0e9c9d778da4db94c6657ddc1c4484b1e255e", line 12, in run
raise Exception("exception from Job process")
Exception: exception from Job process
This is where the test module gets run: https://github.com/LNST-project/lnst/blob/master/lnst/Agent/Job.py#L262. Any exception gets caught right here and its traceback is printed to agent's debug. There's however another try-catch which I don't think will ever trigger at https://github.com/LNST-project/lnst/blob/master/lnst/Agent/Job.py#L110.
got it. makes sense... i'd still keep these except branch blocks and the exception message handling in the Agent for now. It may look useless right now but i'm not 100% sure and i think it at least doesn't have to be part of this PR.
@Kuba314 anything else that you want to do for this PR since you're keeping it in Draft status? or can we proceed with merging this in?
Oops, no, sorry. It was just a proposed fix at the time of creation but now it's tested and approved by you so I'm marking it ready.
Description
See commit message bodies for more info.
Tests
Normal SimpleNetworkRecipe execution passes, interrupting with ^C during iperf test also finishes well and doesn't crash agents. Not sure if this is enough testing though...
Closes: #354