Closed danielmatz closed 2 years ago
That error means that the Julia executable binary has not been found on the remote host. To use the remote REPL, you need to have Julia installed remotely, and whatever shell you run when you ssh to the remote host has to be able to find `julia-snail-executable`. The default should work if `julia` is on the remote host’s login shell’s `PATH`. So when you say that `which julia` works, do you mean it works on the local or the remote host?
Note that you can set `julia-snail-executable` per-project using `.dir-locals.el`, so it’s not necessarily a global setting.
Setting `julia-snail-executable` to the full path to my remote `julia` worked! I'm not sure why it wasn't finding it on the `PATH`. When I was talking about `which julia`, that was indeed on the remote host.
Maybe it's because of the distinction between interactive, non-interactive, login, and non-login shells? See this discussion: https://unix.stackexchange.com/questions/38175/difference-between-login-shell-and-non-login-shell. I was being sloppy upthread when I said it has to be on the login shell's path.
Snail just opens an ssh tunnel to the remote host. I'm not 100% sure what kind of shell that spins up (login? interactive?), but it probably does not run the same startup file that your normal interactive login shell does. As a quick test, you can try doing something like `ssh myhost 'echo $PATH'` or `ssh myhost 'which julia'` and see if that gives a reasonable output.
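To see what kind of shell a command string actually lands in, a small probe can report the login and interactive flags. This is only a sketch: it assumes the remote account's shell is bash (`shopt` is a bash builtin), `myhost` is the placeholder host name used above, and the local `bash -c` calls merely stand in for the remote invocation.

```shell
# Probe text: runs inside whichever bash executes it and reports that shell's type.
probe='shopt -q login_shell && l=login || l=non-login
case $- in *i*) i=interactive ;; *) i=non-interactive ;; esac
echo "$l $i"'

# Local stand-in: a plain `bash -c` is neither a login nor an interactive shell.
plain=$(bash -c "$probe" </dev/null)
echo "bash -c    -> $plain"

# Forcing a login shell flips the first flag. Startup files may print their own
# lines, so only the last line is kept.
login=$(bash -l -c "$probe" </dev/null 2>/dev/null | tail -n 1)
echo "bash -l -c -> $login"

# Against a real host (assumes bash is the remote shell; not run here):
#   ssh myhost "$probe"
#   ssh myhost 'echo $PATH; command -v julia'
```

If the remote probe reports a non-login, non-interactive shell, any `PATH` setup that lives only in login startup files will not have run.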
Yes, I think you are right. Doing `ssh myhost 'which julia'` can't find `julia`.
Should Snail honor `tramp-remote-path`?
I looked at the manual entry for `tramp-remote-path` (https://www.gnu.org/software/emacs/manual/html_node/tramp/Remote-programs.html) and it gave me the impression that it's just there to help Tramp itself find some basic utilities it needs to operate (like `ls`). Since Snail works at a higher level, it doesn't seem to me like the right thing. But I'm willing to change my mind. Do you use `tramp-remote-path` for something like this?
Well, I'll be... I guess I've always misunderstood that variable. I was thinking it was the reason that my `shell-command` example where I ran `which julia` on the remote host worked. But I think you were on the right track to begin with. TRAMP is starting up a shell (but with a different combination of interactive and login), and the shell's own configuration is what is allowing it to find the `julia` executable... Sorry for pointing you in the wrong direction.
I was just trying to think of a way to allow `julia-snail-executable` to be set to `"julia"` and have it honor the remote host's `PATH`.
You've been very patient and kind. I hope you don't mind one last attempt on my part.
Right now, you use the same `ssh` call to establish the tunnel and to launch Julia. Could you separate those? If you first establish the tunnel, and then use `start-file-process` (or similar) to launch Julia, then you'll essentially be deferring to how TRAMP handles remote processes. So, if someone has already configured their remote system and TRAMP to be able to find `julia`, it should "just work."
Setting Snail aside for a moment, let me clarify something: you configured Tramp to launch a Julia REPL? If so, could you help me understand your workflow (which I assume from your previous comments you used before Snail added remote REPL support)?
How is `PATH` set on your remote system, and which shell do you use? Is it possible that the directory containing your Julia binary is added to `PATH` in a file that doesn't get loaded by a non-interactive shell (which seems to be what `ssh host 'which julia'` runs)? I just did a bit of research and testing on this, and the tldr is:
- `ssh -t` which supposedly allocates a tty and makes it a login shell. 🤷🏻‍♂️
- `.zshenv` is always executed; `.zshrc` is only executed by interactive shells.
- `.bashrc` is read by interactive non-login shells (and by bash when sshd starts it to run a command); `.bash_profile` is read by login shells. To run your setup regardless of shell type, you're supposed to put everything in `.bashrc` and source `.bashrc` from `.bash_profile`.
- There is more complexity if you have global configurations in `/etc`, which I can imagine being relevant if you run Julia on a cluster which does things like put binaries in places like `/opt/julia/1/6/2/bin` and configures `PATH` in `/etc/profile`.
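The bash rules above can be checked locally with a throwaway home directory whose startup files only record that they were read. This is a sketch under a few assumptions: bash is installed, `env -u` is available, and the temporary file names are made up for the experiment. It does not reproduce the sshd case, which depends on how bash detects that it was started over a network connection.

```shell
# Throwaway HOME so the experiment never touches real dotfiles.
tmp=$(mktemp -d)
log="$tmp/startup.log"

# Each startup file just records that it was read.
echo "echo bashrc >> '$log'"       > "$tmp/.bashrc"
echo "echo bash_profile >> '$log'" > "$tmp/.bash_profile"

# Run bash with the given flags under the throwaway HOME and report which
# files it read. SSH_CLIENT/SSH2_CLIENT are cleared because some bash builds
# read ~/.bashrc when they believe sshd started them.
files_read() {
  : > "$log"
  env -u SSH_CLIENT -u SSH2_CLIENT HOME="$tmp" \
    bash "$@" -c true </dev/null >/dev/null 2>&1
  cat "$log"
}

plain=$(files_read)           # non-interactive, non-login
login=$(files_read -l)        # login
interactive=$(files_read -i)  # interactive, non-login

echo "bash -c    read: ${plain:-nothing}"
echo "bash -l -c read: ${login:-nothing}"
echo "bash -i -c read: ${interactive:-nothing}"

rm -rf "$tmp"
```

Sourcing `.bashrc` from `.bash_profile`, as suggested above, is what makes the login case pick up both files.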
I'm stressing this because it seems to me that you expect both interactive and non-interactive shells to have `julia` on the `PATH`, but have not actually set up your remote environment to do so.
PS: Once we get to the bottom of this, I will update the documentation to clarify all this complexity.
My old workflow for remote machines was to use `M-x compile` to run `julia` in more of a scripting mode. TRAMP seems to use `/bin/sh` by default, so I have my remote `.profile` configured to set up my `PATH`, and things just work. If I needed a full REPL on the remote machine, I would just open a shell with `vterm`. And in that case I'd get a Bash shell, which I also configured to set up `PATH` properly.
So, Snail's `ssh` command should be getting a Bash shell when it connects. My `.bash_profile` has the `PATH` configured. Based on your comments, I moved that `PATH` manipulation into my `.bashrc` and made sure my `.bash_profile` was sourcing my `.bashrc`. I still got the same error.
Our cluster does indeed manipulate `PATH` in `/etc`. We have an environment module system. I think you are right that that is the root problem here. That is, `ssh myhost "which julia"` doesn't work. I have to wrap the command in another invocation to `bash` and force it to load the settings, something like `ssh myhost 'bash -l -c "which julia"'`. I'm definitely not asking Snail to do something like that.
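For ad-hoc checks, the wrapping described above can be factored into a small helper that runs any command inside a login shell. This is only a sketch: `run_in_login_shell` is a hypothetical name, `myhost` is the placeholder host from this thread, and the helper assumes bash is installed wherever it runs. Nothing here is part of Snail.

```shell
# Run a command with its arguments inside a bash login shell, so that
# /etc/profile and the login startup files are read before it starts.
# The arguments are passed through as positional parameters, so they do
# not need to be re-quoted into a single string.
run_in_login_shell() {
  bash -l -c '"$@"' run_in_login_shell "$@"
}

# Local demonstration that arguments containing spaces survive intact.
# Startup files may print their own lines, so only the last line is kept.
out=$(run_in_login_shell printf '%s|' one 'two words' 2>/dev/null | tail -n 1)
echo "$out"

# The ad-hoc remote form from this thread (not run here):
#   ssh myhost 'bash -l -c "command -v julia"'
```

The remote form is handy for discovering the full path of the binary, which can then be used as the value of `julia-snail-executable`.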
I'm sorry that this devolved into you debugging my shell configuration... In the end, I think my best option is where this all started, with me setting `julia-snail-executable`. I'll probably use the new connection-local variable feature.
This is pretty interesting. I expect setups like yours to come up somewhat frequently, and want to add guidance to the documentation.
I just triple-checked, and bash definitely reads `.bashrc`, and the Snail ssh connection works in my test environment when there's something odd about the location of the Julia binary but where a special `PATH` entry exists in `.bashrc`.
I did find a bug which occurs if you have a different default username configured for your remote host in `.ssh/config` from the one used in the Tramp connection string (e.g., if `.ssh/config` says `myhost` should use default username `myname1` but your Tramp connection string was `/ssh:myname2@myhost:`). If that's the case in your setup, then it would explain why the `.bashrc` change didn't work for you. Fixed in 5b9d95f. It's not in MELPA yet (CI will grab it in the next couple of hours), but since it looks like you use straight.el, you can pull the change right away.
Assuming that doesn't fix your problem, the next thing you should check is that `ssh myhost 'echo $SHELL'` is actually bash. That you have to dance with `'bash -l -c "which julia"'` suggests that something unusual is going on (maybe your cluster has another shell as your login default shell which then does `exec bash` at the end of its own configuration file). I would also look at `ssh myhost 'echo $PATH'` for clues about what your cluster is doing when you log in.
Sorry, I don't think I explained myself clearly. I think the core issue is that when you run a command using `ssh`, the shell it starts up is not interactive and is not a login shell, and so `/etc/profile` is not sourced. That means the environment module system that we use in our lab never gets set up, so my `PATH` doesn't get configured properly by my `.bashrc`, which is indeed being run. In fact, it prints out an error, because the environment module commands fail.
The point of wrapping the command in `bash -l` was to force bash to start up in such a way that it sources `/etc/profile`. See this excerpt from the `bash` man page:
> When bash is invoked as an interactive login shell, or as a non-interactive shell with the --login option, it first reads and executes commands from the file /etc/profile, if that file exists.
Well, as long as setting `julia-snail-executable` in a `.dir-locals.el` or `.dir-locals-2.el` works for you, then I'm happy. But I had another idea, which, if it works, is worth documenting as a workaround.
What if, in your `.bashrc` file, you put something like this:
if ! shopt -q login_shell && [[ $- != *i* && -f /etc/profile ]]; then
    . /etc/profile
fi
This sources `/etc/profile` if you're in a non-interactive non-login shell and if `/etc/profile` exists. Or, as a one-liner:
! shopt -q login_shell && [[ $- != *i* && -f /etc/profile ]] && . /etc/profile
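A guard like this can be exercised without touching the real `/etc/profile`. The sketch below uses a stand-in profile file and a stand-in rc file in a temporary directory (both hypothetical, as is the `FAKE_PROFILE` variable). It tests the login flag through the exit status of `shopt -q`, because `-q` prints nothing for a command substitution to capture.

```shell
tmp=$(mktemp -d)

# Stand-in for /etc/profile: just sets a marker variable.
echo 'PROFILE_MARKER=loaded' > "$tmp/profile"

# Stand-in for ~/.bashrc containing the guard; FAKE_PROFILE replaces /etc/profile.
cat > "$tmp/rc" <<'EOF'
if ! shopt -q login_shell && [[ $- != *i* && -f $FAKE_PROFILE ]]; then
    . "$FAKE_PROFILE"
fi
EOF

# Source the rc file in a bash started with the given flags and report the marker.
# Real startup files may print their own lines, so only the last line is kept.
marker_after_rc() {
  FAKE_PROFILE="$tmp/profile" bash "$@" \
    -c '. "$1"; echo "${PROFILE_MARKER:-not loaded}"' guard-test "$tmp/rc" \
    </dev/null 2>/dev/null | tail -n 1
}

plain=$(marker_after_rc)     # non-login, non-interactive: the guard fires
login=$(marker_after_rc -l)  # login shell: the guard stays quiet

echo "bash -c    -> $plain"
echo "bash -l -c -> $login"

rm -rf "$tmp"
```

The login case matters because a login shell has already read `/etc/profile` on its own, so sourcing it again from `.bashrc` would run the cluster setup twice.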
I couldn't quite test it, because on my test environment `/etc/profile` gets sourced by non-login non-interactive shells (if I run `ssh bashuser@myhost 'echo $__ETC_PROFILE_SOURCED'` it prints `1`, which is set in `/etc/profile`). No clue what's going on, since this behavior contradicts the bash man page; maybe a version difference.
PS: I can only shake my head at POSIX shells in general and bash in particular. The check for an interactive shell is especially a thing of beauty.
Wow! That snippet is wild, but it works! Snail can now find my remote Julia installation.
Unfortunately, I now encounter a new error:
julia> JuliaSnail.start(10011); # please wait, time-to-first-plot...
ERROR: IOError: listen: address already in use (EADDRINUSE)
I've tried changing `julia-snail-port` to several different values, and I always get the same error. I can use `lsof -i :10011` to see the SSH process is indeed listening to that port.
That error means port 10011 is in use on the remote host. Kill Snail and all tramp sessions. ssh into the remote host, and run `ps auwwwx | grep -i julia` and see if there's a stray Julia process hanging out?
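Besides `lsof` and `ps`, a quick way to tell whether anything is already listening on a port is to try connecting to it. This sketch relies on bash's `/dev/tcp` redirection (a bash feature, assumed to be enabled in the build at hand); `port_in_use` is a hypothetical helper name, and 10011 is the port from the error above.

```shell
# Succeeds if something accepts TCP connections on 127.0.0.1:<port>.
# The connection is opened in a child bash and closed when that bash exits.
port_in_use() {
  bash -c 'exec 3<>"/dev/tcp/127.0.0.1/$1"' port_in_use "$1" 2>/dev/null
}

if port_in_use 10011; then
  echo "port 10011: something is listening"
else
  echo "port 10011: free"
fi
```

Run on the remote host, a hit means the port is taken there. Run on the local machine while an `ssh -L` forward is active, a hit may simply be the local end of the tunnel.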
When you say that you tried changing `julia-snail-port` to different values but get the same error, does that mean the port in the `JuliaSnail.start` call is always 10011, or does it change to match `julia-snail-port`?
Hmm, I've actually been assuming that your cluster runs a recent Linux and OpenSSH combination. Can you please confirm that? `uname -a` and `ssh -V` are a good start. If Linux, what distribution?
> That error means port 10011 is in use on the remote host. Kill Snail and all tramp sessions. ssh into the remote host, and run `ps auwwwx | grep -i julia` and see if there's a stray Julia process hanging out?
I restarted Emacs entirely, checked for stray Julia processes, and still got the same error.
> When you say that you tried changing `julia-snail-port` to different values but get the same error, does that mean the port in the `JuliaSnail.start` call is always 10011, or does it change to match `julia-snail-port`?
Yes, sorry, the `JuliaSnail.start` command does always reflect the port I set. I also tried playing around with `julia-snail-remote-port`, which gets me past that Julia error, but then I get an Emacs message that it failed to connect to the Snail server.
> Hmm, I've actually been assuming that your cluster runs a recent Linux and OpenSSH combination. Can you please confirm that? `uname -a` and `ssh -V` are a good start. If Linux, what distribution?
I believe we use CentOS.
`uname -a`:
Linux myhost 3.10.0-1160.25.1.el7.x86_64 #1 SMP Wed Apr 28 21:49:45 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
`ssh -V`:
OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017
Did a remote REPL ever work for you? It sounded from https://github.com/gcv/julia-snail/issues/54#issuecomment-894874309 like it did, but now it’s sounding like it never works.
What if you set both `julia-snail-port` and `julia-snail-remote-port` to the same value, but not 10011 (like 21138)?
> Did a remote REPL ever work for you? It sounded from #54 (comment) like it did, but now it’s sounding like it never works.
No, it never did work for me. I was just doing `M-x compile` and running Julia like a scripting language on remote hosts before.
> What if you set both `julia-snail-port` and `julia-snail-remote-port` to the same value, but not 10011 (like 21138)?
I tried that, but no luck.
While digging for reasons this problem occurs, I found another potential bug which affects Snail working with some (newer?) versions of emacs-libvterm and which may cause `ssh` invocations to misfire. It's now fixed in the latest MELPA build, so please update the `julia-snail` package, restart Emacs, and see if you still have the problem.
If that doesn't work, let's get to some real debugging. We need to get Emacs out of the picture and understand what happens with your ssh tunnel. You will use netcat (`nc`) to send commands from the local machine (client) to the Snail server on the cluster. (You may have to install netcat from your package manager.)
1. Kill all `julia` instances on the remote host.
2. Close all `ssh` connections from your local machine to the remote host.
3. Copy `JuliaSnail.jl`, `Project.toml`, and `Manifest.toml` from the Snail package (or directly from GitHub) to some directory on your remote host.
4. Run `ssh -t -L 10069:localhost:10099 remotehost /path/to/julia/binary -L /path/to/JuliaSnail.jl/on/remote/host`
5. Run `JuliaSnail.start(10099)` and wait for the prompt.
6. Run `echo '(reqid = "abcd1234", ns = [:Main], code = "println(\"hello world\")")' | nc localhost 10069`
This should print `hello world` to your Julia REPL and error out with a message like `IOError: stream is closed or unusable`.
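The request sent in the last step is just a line of text, so it can be assembled by a small helper that takes care of the quoting. This is a sketch: `build_request` is a hypothetical name, the field layout is copied from the example above rather than from any protocol documentation, and 10069 is the local tunnel port used in these steps.

```shell
# Build one request line in the shape shown above.
# $1 = request id, $2 = Julia code. Backslashes and double quotes in the code
# are escaped so they survive inside the double-quoted code field.
build_request() {
  escaped=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '(reqid = "%s", ns = [:Main], code = "%s")\n' "$1" "$escaped"
}

req=$(build_request abcd1234 'println("hello world")')
echo "$req"

# With the tunnel from the steps above in place (not run here):
#   build_request abcd1234 'println("hello world")' | nc localhost 10069
```

Keeping the escaping in one place makes it easier to try other snippets of Julia code without fighting nested shell quotes.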
If the Snail server fails to start on the remote host with the `EADDRINUSE` error, then replace 10099 with a different port number in both the ssh tunnel call and the `JuliaSnail.start` call. If that still does not work, something prevents you from opening up server sockets on the remote host, and you should ask your system administration staff for an explanation.
If the Snail server starts but there is no output in the Julia REPL from the netcat call, your tunnel is not being set up correctly. Maybe it's your local machine, maybe it's a firewall, maybe it's configuration, and maybe it's something else. Since I have no way to reproduce this situation, I cannot help any further.
If everything works without Emacs, but does not work inside Emacs, then maybe your Snail installation is broken. Blow it away completely (delete from disk), reinstall from MELPA or GitHub, and restart Emacs.
Woohoo! I pulled your latest bug fix with straight and it works! Thank you again for your help tracking down my issues. And thank you again for Snail!
This issue has returned for me. I get the following output when I try to start snail remotely:
Starting Julia process and loading Snail...
if: The vterm buffer is inactive; double-check julia-snail-executable path
Creating a `.dir-locals.el` file to set `julia-snail-executable` doesn't help.
I tried to follow your debug steps above, and I can get through to step 6. When I run the `echo` command, I get this back locally:
(julia-snail--response-success "abcd1234" nil)
But the remote process prints the following out in an infinite loop:
JuliaSnail: something broke: type Nothing has no field redid
Are there any changes to the debug steps I can try? Thanks!
That looks like a broken Snail installation on the remote host. You're absolutely certain the error says no field `redid`? Not `reqid`?
You are right; it says `reqid`. I think autocorrect got me... sorry about that.
I reproduced the problem. 👀
Something changed in the network IO code of Julia 1.8. While I figure out WTF broke between Julia versions and adapt Snail to deal with it, could you please try 1.7.x and see if that works for you?
I was able to test with 1.7.2 on my remote system, and it does indeed work.
Opening a separate ticket to track the new problem: https://github.com/gcv/julia-snail/issues/120
I was excited to hear about the new remote REPL capability. I was just giving it a try, but it fails to start the REPL. Here's what I see in my `*Messages*` buffer:
My `julia-snail-executable` variable is still set to the default of `julia`. If I launch `vterm` manually, I can do `which julia` successfully. I can also use `shell-command` to run `which julia`, and that works.
I saw you had a `julia-snail-debug` variable, but when I set it to `t`, I didn't get any additional output.
Any thoughts on what is going on?
Thanks in advance for the help! And thanks for Snail!