Closed ricab closed 2 months ago
OK I am declaring this ready for review, but I would like to hear opinions on something.
As I have it, an SSHSession
locks when creating an SSHProcess
and passes the lock to the process. The process only releases the lock when someone asks for the exit_code()
or when it is destroyed (or if a friend [SftpServer
] asks to release the channel).
If someone asks for the exit code again, and if we got it the first time, the known code is returned. However, if we timeout obtaining the exit code the first time, the lock is released and the session can be used by other threads, even though the exit code can be requested again. At that point, the session is no longer thread safe.
I could instead hold the lock until the exit code is successfully obtained and provide a knob for client code to release the lock manually on timeout. But if the client failed to do this, the program would deadlock when trying to destroy the session. I could also invalidate the ssh process and refuse to look for the exit code again, but then we couldn't recover it. What do you think is the best approach for this?
Hey @ricab!
Looks like you're going to need a private side PR: https://github.com/canonical/multipass-private/actions/runs/8543123264/job/23406233450#step:22:627
Ah shoot... OK, I'll try to do that soon.
Hey @ricab!
I've given your question some thought here and IMO, we shouldn't really allow calling exit_status()
again after a timeout. A timeout to me means that the SSHProcess
is no longer valid and the whole execution should be retried in a new object. And really, I think once exit_status()
is called, that should be it regardless if it returned a value or timed out.
I think if the SSHProcess
object is still around after calling exit_status()
and tries to call exit_status()
again, exit_status()
should check if it no longer owns the lock and if not, should throw an exception.
We can talk more about about this if this isn't really making sense or if you disagree :slightly_smiling_face:
Hey @townsend2010, thanks for the feedback. OK, I think that equates to what I had in mind with my last option ([...] invalidate the ssh process and refuse to look for the exit code again [...]). The principle makes sense to me, but I would still like to discuss the details and I need to see how to fold that with the SFTPServer
(I think it asks for the exit_code
twice today).
Hey @townsend2010, I did something along the lines we discussed, although instead of "invalidating" subsequent exit_code
calls I just return the result from the first call, be that an actual exit code or one of two new Exitless exceptions. This is more in line with the principle of not discarding results of computations. It avoids yet another exception while preventing any attempts to actually fetch the exit code again. The session is unlocked by then and SSH isn't used again. The new "peeking" method also saves the exit code if it gets one (but not exceptions).
There is one commit that I am quite unhappy about: https://github.com/canonical/multipass/pull/3457/commits/becb2c58a635dcc6ed95140497c6f1b0dea6db8c. Doing better would require rewriting many sftp/sshfs tests. That would deviate too much from what we're after here. Unless you find an easier way to adapt? I found that ExitStateMock
does not have "enough resolution", and mocking MP_UTILS.run_in_ssh_session
on top of the exinting premocks doesn't work without reworking what is there, let alone mocking SSHProcess
/SSHSession
directly.
Let me know what you think.
Thanks for the review @townsend2010!
I'd like to test this with the latest main and built as a snap, so could you please rebase this on main and then I will test? Thanks!
I am trying, but having trouble with Flutter at the moment (@andrei-toterman build instructions are sorely missed :wink: )
In the meantime, doesn't the snap build on a main merge?
In the meantime, doesn't the snap build on a main merge?
Yeah, it is, so it's probably ok.
Alright, clean rebase @townsend2010 :point_up_2:
Ok, cool, thanks @ricab!
You're welcome!
Right, need to merge manually...
Codecov Report
Attention: Patch coverage is
93.33333%
with5 lines
in your changes are missing coverage. Please review.Additional details and impacted files
```diff @@ Coverage Diff @@ ## main #3457 +/- ## ========================================== + Coverage 88.83% 88.85% +0.02% ========================================== Files 253 253 Lines 14049 14098 +49 ========================================== + Hits 12480 12527 +47 - Misses 1569 1571 +2 ```:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.