Closed Apteryks closed 2 years ago
This is what we see when stracing Guile on the client side (guile-ssh):
[pid 4311] poll([{fd=19, events=POLLIN}], 1, 15000) = 0 (Timeout)
[pid 4311] write(1, "$8 = #<eof>\n", 12) = 12
The timeout is set to 15 s; it polls for that long then returns EOF.
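To illustrate the distinction at the file-descriptor level, here is a minimal sketch in plain POSIX C, not taken from Guile, guile-ssh or libssh. It assumes an ordinary descriptor such as a pipe or socket; the names `read_status` and `read_with_timeout` are hypothetical. A `poll` that returns 0 means the timeout elapsed, while a `read` that returns 0 after `poll` reported readiness means the peer really closed the stream.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type: these names do not come from libssh or guile-ssh. */
enum read_status { READ_DATA, READ_TIMEOUT, READ_EOF, READ_ERROR };

/* Wait up to timeout_ms for fd to become readable, then read once.
   On READ_DATA, *nread holds the number of bytes stored in buf. */
enum read_status
read_with_timeout(int fd, char *buf, size_t len, int timeout_ms, size_t *nread)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;
    ssize_t n;

    assert(buf != NULL && nread != NULL && len > 0);
    *nread = 0;

    /* Retry if a signal interrupts the wait. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return READ_ERROR;
    if (ready == 0)
        return READ_TIMEOUT;    /* nothing arrived in time: not end of file */

    n = read(fd, buf, len);
    if (n < 0)
        return READ_ERROR;
    if (n == 0)
        return READ_EOF;        /* peer closed its end: a true end of file */

    *nread = (size_t)n;
    return READ_DATA;
}
```

A caller that keeps the four outcomes separate can retry or report a timeout instead of handing EOF to the layer above.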
Dunno if it's a bug or by design, but IIUC, libssh ssh_channel_poll will return 0 (the length of the stdbuffer) when a timeout elapses during polling (and not SSH_AGAIN, as one might expect).
This condition doesn't seem to be expected in guile-ssh read_from_channel_port (it expects either SSH_ERROR, SSH_EOF or a positive value).
So what happens on a timeout is that ssh_channel_read returns 0 (the same as when it encounters EOF). So probably guile-ssh just treats it as an EOF, since it doesn't have more information to work with.
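One possible way to recover the missing information is sketched below. This is not guile-ssh's actual code. It assumes the caller has the return value of a libssh channel read or poll call plus the result of `ssh_channel_is_eof` for the same channel. The `RC_*` constants are local stand-ins assumed to mirror libssh's `SSH_ERROR`, `SSH_AGAIN` and `SSH_EOF`; `channel_event` and `classify_channel_read` are hypothetical names.

```c
/* Local stand-ins, assumed to mirror SSH_ERROR, SSH_AGAIN and SSH_EOF
   from libssh.h; defined here so the sketch builds without libssh. */
#define RC_ERROR (-1)
#define RC_AGAIN (-2)
#define RC_EOF   (-127)

/* Hypothetical outcome type, not part of libssh or guile-ssh. */
enum channel_event { CH_DATA, CH_TIMEOUT, CH_EOF, CH_ERROR };

/* rc: value returned by the channel read/poll call.
   remote_sent_eof: nonzero if the remote side has sent EOF
   (for example, the result of ssh_channel_is_eof). */
enum channel_event
classify_channel_read(int rc, int remote_sent_eof)
{
    if (rc > 0)
        return CH_DATA;
    if (rc == RC_EOF)
        return CH_EOF;
    if (rc == RC_AGAIN)
        return CH_TIMEOUT;
    if (rc < 0)
        return CH_ERROR;

    /* rc == 0 is ambiguous on its own: only the channel's EOF flag
       tells a timeout apart from a real end of file. */
    return remote_sent_eof ? CH_EOF : CH_TIMEOUT;
}
```

With a classification like this, a port implementation could report EOF only for `CH_EOF` and keep waiting, or raise a timeout error, for `CH_TIMEOUT`.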
Yep, seems to be Guile's peek_byte_or_eof that chooses to return EOF when nothing was available/read.
Hello,
what version of GNU Guile do you use?
Hello! It should be Guile 3.0.7, I think (the one used by Guix).
Thanks!
Hello again,
could you please check this branch https://github.com/artyom-poptsov/guile-ssh/tree/wip-fix-nonblocking-eof and see what happens?
Besides, you can build Guile-SSH for debugging as follows:
CFLAGS=-DDEBUG make -e -j4
I added extra debug traces in the channels code.
I've updated the guile-ssh package locally to build from commit 2e25d852104f375936e81d9d7163892c6e828e68 and ran:
$ ./pre-inst-env guix offload test /etc/guix/machines.scm tm
guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
Backtrace:
In ice-9/boot-9.scm:
1752:10 11 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In unknown file:
10 (apply-smob/0 #<thunk 7f1b1ad29f60>)
In ice-9/boot-9.scm:
724:2 9 (call-with-prompt _ _ #<procedure default-prompt-handler (k proc)>)
In ice-9/eval.scm:
619:8 8 (_ #(#(#<directory (guile-user) 7f1b1ad23c80>)))
In guix/ui.scm:
2205:7 7 (run-guix . _)
2168:10 6 (run-guix-command _ . _)
In ice-9/boot-9.scm:
1752:10 5 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In guix/scripts/offload.scm:
704:21 4 (check-machine-availability _ _)
In srfi/srfi-1.scm:
586:17 3 (map1 (#<session root@10.42.0.243:22 (connected) 7f1b162acfc0>))
In guix/inferior.scm:
259:2 2 (port->inferior _ _)
241:2 1 (read-repl-response _ _)
In ice-9/boot-9.scm:
1685:16 0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `match-error' with args `("match" "no matching pattern" #<eof>)'.
So immediately I don't see a change; but I also don't see debug traces so I'm not sure I'm testing it correctly (I've specified CFLAGS=-DDEBUG as a configure flag).
@Apteryks Could you share the Guile code you used for the test and your package specification? I can't reproduce it locally with guile-ssh 0.13.1
$ guix describe
Generation 173 Nov 12 2021 21:01:27 (current)
guix da73727
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: da73727f1a1c49bd0b834d2d4da48d578062b0ae
I cannot reproduce this anymore with the same setup but with a newer Guix that uses the recently released guile-ssh 0.15.1: even though the low spec server is busy doing something, the client (guile-ssh 0.15.1) waits for it without reporting EOF, it seems.
I guess it can be closed :-).
Thank you!
Hello,
I've been trying to understand a problem in Guix where reading from an SSH channel returns EOF. My debugging led me to find that it occurs when there's nothing to read on the channel past the channel's specified timeout value.
Normally the underlying libssh ssh_channel_read returns 0 when there's nothing to read or an error if there was an error, not EOF, IIUC. Is this expected behavior? If so, how can someone discriminate a timeout from a true EOF?
Thank you,
Maxim