Closed clement-chaneching closed 5 months ago
Hi, @clement-chaneching. Sorry that you've run into a problem with this!
Coiled generates (and uses) a unique SSH keypair for each cluster, so you shouldn't need to worry about your own SSH keys. We do sometimes have some issues with mutagen, though, and I'd like to know if you're running into a problem there or a different problem. Would you mind trying a coiled run command, like coiled run echo hello, to see if that works? It also uses SSH, so that should help narrow down the problem.
Hello,
Thanks for the quick reply @ntabris !
coiled run does work:
A few more things to try:
coiled run echo hello --keepalive 10m
and then
coiled cluster ssh
(which is a wrapper around OpenSSH).
If that works, then the issue is something more specific to mutagen.
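The narrowing-down order above can be sketched as a small script. This is only an illustration, not part of Coiled: the helper name first_failing_step and the step labels are made up here, and the commands are the ones quoted above. Note that coiled cluster ssh opens an interactive session, so you would have to exit it before the script returns.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

# Commands quoted from this thread, in the suggested order.
# The labels are illustrative descriptions, not Coiled terminology.
STEPS: List[Tuple[str, List[str]]] = [
    ("coiled run (uses SSH)", ["coiled", "run", "echo", "hello", "--keepalive", "10m"]),
    ("coiled cluster ssh (OpenSSH wrapper)", ["coiled", "cluster", "ssh"]),
]


def run_command(argv: Sequence[str]) -> int:
    """Run one command attached to the current terminal and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def first_failing_step(
    steps: Sequence[Tuple[str, Sequence[str]]],
    run: Callable[[Sequence[str]], int] = run_command,
) -> Optional[str]:
    """Return the label of the first step that exits non-zero, or None if all pass."""
    for label, argv in steps:
        if run(argv) != 0:
            return label
    return None
```

If first_failing_step(STEPS) returns None, both SSH-based steps succeeded, which would point at something mutagen-specific rather than SSH itself.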
Since you're on Windows, these issues about the default shell might be relevant:
My apologies, I don't have a definite solution, but those are things to try. Let me know what you see (or if you have any questions).
Hello,
It worked, so I guess I'll have to check how mutagen works with WSL.
Thanks anyway, I'll let you know if I find something!
So I'm trying to run a notebook with --sync, but I'm getting
Error attempting to connect sync...
Error: unable to connect to beta: unable to connect to endpoint: unable to dial
agent endpoint: unable to handshake with agent process: unable to receive server
magic number: EOF (error output: Warning: Permanently added the ED25519 host key
for IP address '35.244.113.225' to the list of known hosts.\r
ubuntu@cluster-qhngw.dask.host: Permission denied (publickey).)
So I have set up a notebook without sync using coiled notebook start.
When I try to run mutagen or ssh on ubuntu@cluster-bweds.dask.host, I'm getting "permission denied public key".
But I can ssh into this machine using coiled cluster ssh.
coiled cluster ssh
===Starting SSH session to scheduler at cluster-bweds.dask.host===
And if I add my public key in the authorized keys, then I can run
ssh ubuntu@cluster-bweds.dask.host
and mutagen sync create and it works fine.
Logged in : ubuntu@coiled-dask-clement6e-347796-scheduler-5231266:~$
Do you have any idea why I would be able to run coiled cluster ssh but not ssh into the notebook cluster or run --sync?
Hm. So it works if you manually put your own key on the VM and then manually run mutagen sync create.
I'm curious whether it works if you run coiled cluster ssh --add-key (which adds the key we made for this cluster to your ssh agent) and then manually make the mutagen sync.
I'm also curious whether you have many identities already loaded in your ssh agent (i.e., how many things does ssh-add -l show?).
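One reason the number of loaded identities can matter: an SSH client may offer each key held by the agent in turn, and OpenSSH servers cap authentication attempts per connection (MaxAuthTries, 6 by default), so a crowded agent can be rejected before the right key is offered. Whether that is what is happening here is not established. A minimal sketch for counting identities, where the function names are made up for illustration and the listing is assumed to contain one line per key:

```python
import subprocess

# Message ssh-add prints when the agent is running but holds no keys.
NO_IDENTITIES = "The agent has no identities."


def count_agent_identities(listing: str) -> int:
    """Count keys in the text printed by `ssh-add -l` (one key per line)."""
    lines = [line.strip() for line in listing.splitlines() if line.strip()]
    if lines == [NO_IDENTITIES]:
        return 0
    return len(lines)


def list_agent_identities() -> str:
    """Run `ssh-add -l` and return its stdout (requires a reachable ssh-agent)."""
    result = subprocess.run(
        ["ssh-add", "-l"], capture_output=True, text=True, check=False
    )
    return result.stdout
```

Calling count_agent_identities(list_agent_identities()) gives the number to report back. If it is large, removing unused keys from the agent is one thing to try.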
Yep, I just created a new notebook. I cannot ssh at first, but after I run coiled cluster ssh --add-key I can SSH.
But I still cannot use mutagen unless I add my public key to ~/.ssh/authorized_keys on the cluster. Then I can use mutagen on the VM, but because notebooks are running on docker /tmp, I still cannot sync between local and notebooks.
Now I just have the 2 clusters that worked in my ssh agent:
Thanks a lot for your help and for following up! Let me know if you have any solution.
If you'd like to try some more troubleshooting, there's a new version of the coiled package that includes some extra debug options.
You'd pip install coiled==1.3.2 to get this version.
You'd then start a notebook by running coiled notebook start --name test-name --no-block. This would give us a notebook that doesn't yet have sync running.
(test-name can be anything, but the other commands will reference the cluster by name, so it's easier if we specify a name and can use that, rather than letting start pick a random name like it does by default.)
Here's the new command that will attempt to start sync on the notebook:
coiled notebook start-sync test-name --debug
If this works, great! But it probably won't.
It will print out the commands it's running, though, so you could
- check that known_hosts has the new fingerprint like it's supposed to
- try running coiled cluster ssh --add-key and then running the mutagen command manually
That should help narrow down where the problem is.
Oh, and when you're done, it's coiled notebook stop test-name to stop the notebook. (Normally this happens when you control-c the widget, but --no-block disabled that.)
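Put together, the debug sequence described above is three commands that share one notebook name. A minimal sketch that only builds and formats them (the helper names are made up for illustration; the commands and flags are the ones quoted in this comment, and nothing here is executed):

```python
import shlex
from typing import List


def debug_sync_commands(name: str = "test-name") -> List[List[str]]:
    """Return the start, start-sync, and stop commands for one notebook name."""
    return [
        ["coiled", "notebook", "start", "--name", name, "--no-block"],
        ["coiled", "notebook", "start-sync", name, "--debug"],
        ["coiled", "notebook", "stop", name],
    ]


def as_shell_lines(commands: List[List[str]]) -> List[str]:
    """Format each argument list as a shell-safe command line for copy-pasting."""
    return [shlex.join(argv) for argv in commands]
```

Printing as_shell_lines(debug_sync_commands()) line by line gives the commands in the order to run them. Using one name throughout avoids having to look up a randomly generated cluster name.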
Hello!
Thanks for all the help.
So I did follow your instructions and it worked exactly as you expected.
I ran coiled notebook start-sync test-name --debug and it didn't work.
The new fingerprint has been successfully added to known_hosts and I can ssh into the VM, but I cannot use mutagen.
It does work when I manually add my id_rsa.pub to the authorized_keys of the Coiled VM.
So I guess I have something to fix in the mutagen config?
But even if I fix that, I don't understand how it is supposed to sync between JupyterLab and my local folder, since it will sync to remote:/scratch/synced, but the notebooks I create from the UI are in a /docker/tmp folder.
Or is there something I am missing?
Thanks for all your patience trying to figure this out together!
At the moment I'm puzzled why your key would work but our key (which apparently does work for coiled cluster ssh) would not work for mutagen. This isn't something I've seen before.
If you manually add your id_rsa.pub to the Coiled VM, does coiled notebook start-sync test-name --debug then work?
> But even if I fix that, I don't understand how it is supposed to sync between JupyterLab and my local folder, since it will sync to remote:/scratch/synced, but the notebooks I create from the UI are in a /docker/tmp folder.
Yeah, there's some other stuff we do (mount inside docker + symlink) to make this all work.
Yes, it does work when I add my id_rsa.pub to the Coiled VM and run mutagen.
> Yeah, there's some other stuff we do (mount inside docker + symlink) to make this all work.
Oh OK, so is it documented anywhere? I don't understand why we would have a notebook --sync flag if it doesn't sync the notebooks. Or how can we have a persistent volume for the notebooks?
@clement-chaneching and I did some troubleshooting on a call.
He was able to repeatedly get things working with
coiled notebook start --name test-name --no-block
coiled notebook start-sync test-name --debug
but got a mutagen SSH error when running coiled notebook start --sync.
This is very puzzling since start-sync literally runs the same code that's run when you use --sync.
Clement said that he's satisfied using the two separate commands as a workaround.
Hello and happy new year 2024,
I just started testing Coiled and I have some issues when trying to run the file sync with
coiled notebook start --sync
I am using WSL with mutagen and openssh installed:
OpenSSH_8.4p1 Debian-5+deb11u3, OpenSSL 1.1.1w 11 Sep 2023
Mutagen version 0.17.4
I have created keys using ssh-keygen and used ssh-add, and I can see the cluster key in my known_hosts.
Is there something I am missing? Do I need to add my public key somewhere in Coiled? I can successfully use SSH to connect to other machines or GitHub.
Thanks for your help!