Open hausdorff opened 8 years ago
Hello, thanks for your interest in this project. I have not touched it in a while, so it may have suffered from bitrot. IIRC you need to modify jepsen's source to set authentication right. In the README there are a number of troubleshooting hints about authentication issues. Have you tried them?
Yep, I sure have. I didn't see anything about modifying source, but I did a bunch of other stuff, like clearing out the known hosts file and re-populating it with the correct, un-hashed keys.
The second of the errors above is happening, btw, because the IP address is not in the known hosts file. But the fact that I am having trouble even resolving the name n1 is hard to debug, because I just don't know anything about networking. If you could just point me in the right general direction, I could do the rest of the work myself.
Hmm, I think this SO question could be a good start: http://stackoverflow.com/questions/28621167/unable-to-run-jepsen-test-for-either-elasticsearch-or-rabbitmq
I managed to get it working by tweaking the jepsen/control.clj file directly, but this is definitely not a smooth process. You can authorize all host keys this way:
for i in 1 2 3 4 5; do ssh-keyscan -t rsa n${i}; done >> ~/.ssh/known_hosts
You can have a look at each host's /var/log/auth.log to check what's failing in authentication; there is a debug flag you can tweak on each sshd.
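The host-key step above can be extended to cover IP addresses as well as names, since the reject HostKey error reported in this issue comes from the IP address missing from known_hosts. What follows is a minimal sketch, not the project's own script: the helper names node_names and scan_node_keys are hypothetical, the node names n1 to n5 come from the loop above, and each IP is looked up with getent rather than assumed.

```shell
#!/bin/sh
# Hypothetical helpers; the names n1..n5 are taken from the loop in this thread.

# Print the node names, one per line.
node_names() {
  for i in 1 2 3 4 5; do
    echo "n${i}"
  done
}

# Print host-key lines for each node, by name and, when the name resolves, by IP.
# Nothing is written to disk here; the caller decides where the output goes.
scan_node_keys() {
  for host in $(node_names); do
    ssh-keyscan -t rsa "$host" 2>/dev/null
    # getent prints "ADDRESS NAME..." when the name resolves.
    ip=$(getent hosts "$host" | awk '{ print $1; exit }')
    if [ -n "$ip" ]; then
      ssh-keyscan -t rsa "$ip" 2>/dev/null
    fi
  done
}

# Usage (inside the Vagrant box, not run automatically):
#   scan_node_keys >> ~/.ssh/known_hosts
```

For the sshd debug flag, OpenSSH's sshd_config has a LogLevel directive (for example LogLevel DEBUG); sshd in the container needs a restart after it is changed.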
I will have a look this evening (CET).
Yep, that is a good SO answer, and I did all those things.
Instead of trying to change the code itself, though, I just cracked open the lein repl and tried to directly call the clj-ssh code that control.clj uses to open the SSH connection. For a variety of values, this doesn't work, which seems to suggest that the problem is actually the way the networking is configured in the Vagrant container -- which is why I reported the bug to you and not Kyle. :)
Perhaps the DNS/DHCP is configured incorrectly? Other thoughts?
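One way to test the DNS/DHCP suspicion is to check what each node name resolves to from inside the Vagrant box. This is a minimal sketch under assumptions: the helper names first_ip and check_node are hypothetical, and 192.168.122.11 is only known to be n1's address because it is quoted in this issue.

```shell
#!/bin/sh
# Hypothetical helpers for checking name resolution inside the Vagrant box.

# Read "ADDRESS NAME..." lines (the format getent hosts prints) and
# print the first address only.
first_ip() {
  awk '{ print $1; exit }'
}

# Report whether a node name resolves, and to which address.
check_node() {
  addr=$(getent hosts "$1" | first_ip)
  if [ -n "$addr" ]; then
    echo "$1 resolves to $addr"
  else
    echo "$1 does not resolve"
  fi
}

# Usage (not run automatically):
#   for n in n1 n2 n3 n4 n5; do check_node "$n"; done
```

If n1 resolves to 192.168.122.11, name resolution is probably fine, and the difference between the two errors is more likely about which entries exist in known_hosts.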
Might be, but apparently you can log in when authorizing host keys from lein repl, right?
Arnaud Bailly
twitter: abailly skype: arnaud-bailly linkedin: http://fr.linkedin.com/in/arnaudbailly/
On Sun, Feb 28, 2016 at 8:05 PM, Alex Clemmer notifications@github.com wrote:
— Reply to this email directly or view it on GitHub https://github.com/abailly/jepsen-vagrant/issues/6#issuecomment-189923546 .
I have actually never successfully logged in from lein repl. Worse, the auth.log is not reporting SSH errors when I try, which would suggest that the problem is that the values in the known_hosts file are wrong. But I did complete the ssh-keyscan steps above, so I'm not sure how that could be true.
It embarrasses me to ask (since it indicates a pretty thorough lack of networking knowledge :) ) but perhaps I have to restart some daemon after I delete the entries from the known_hosts file and replace them with the un-hashed ones?
Can you log in from console?
Yep, I can. ssh root@n1 authenticates with the key rather than the password though. Not sure if that matters.
Does lein test work for you, btw?
Could matter, yes. IIRC the Clojure code does not understand key-based login, so it might be the case that the password is incorrect.
Can you try logging in with the password?
Ah. I thought that root@n1's password should be root.
Based on the following, I don't think it is:
vagrant@jepsen:~$ ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no root@n1
root@n1's password:
Permission denied, please try again.
root@n1's password:
I thought that the password should be root.
When I chroot /var/lib/lxc/n1/rootfs and attempt to change the password to root, the above still doesn't work. Hmmmmmm.
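Changing the password alone may not be enough if the container's sshd refuses password logins for root. That is an assumption, not something confirmed in this thread. The sketch below reads an sshd_config-style file and prints the first value set for a keyword, which is the value sshd keeps. The function name sshd_option is hypothetical, and the path in the usage comment is the container rootfs mentioned above plus the usual etc/ssh/sshd_config location.

```shell
#!/bin/sh
# Hypothetical helper: print the first value set for a keyword in an
# sshd_config-style file. sshd keeps the first value it reads for a keyword,
# and keyword names are case-insensitive.
# Match blocks are not handled, so treat this as a rough check only.
sshd_option() {
  awk -v key="$2" '
    /^[ \t]*#/ { next }                          # skip comment lines
    tolower($1) == tolower(key) { print $2; exit }
  ' "$1"
}

# Usage (assumed path, not run automatically):
#   cfg=/var/lib/lxc/n1/rootfs/etc/ssh/sshd_config
#   sshd_option "$cfg" PermitRootLogin
#   sshd_option "$cfg" PasswordAuthentication
```

If the first call prints without-password or prohibit-password, or the second prints no, sshd refuses password logins for root whatever the password is. Running sshd -T as root inside the container prints the effective configuration and is the more reliable check.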
Yes, I managed to get lein test to pass :-)
I am sorry but I cannot debug right now. There is a Docker Jepsen floating around, maybe you would have better luck with it?
Ok, I'll report back if I can get this to work. Thanks for your time!
Did not do much! JSch, which Jepsen uses for SSH, is aging...
I don't know Clojure, and I've never used Vagrant before, so apologies if there is something simple I'm missing.
All of the tests seem to fail out of the box. When you run them you end up with something approximating the following:
Interestingly, I am able to ssh root@n1 (for example), so it seems like perhaps some ssh daemon somewhere is not talking to our process correctly.
When I crack open lein repl and do something like
(doto (ssh/session (ssh/ssh-agent {}) "n1" {:username "root" :password "root" :port 22 :strict-host-key-checking :yes }) (ssh/connect))
I (perhaps obviously) get the same error:
JSchException Auth fail com.jcraft.jsch.Session.connect (Session.java:512)
But, when I replace that name with the IP of the underlying container, I get a different error:
(doto (ssh/session (ssh/ssh-agent {}) "192.168.122.11" {:username "root" :password "root" :port 22 :strict-host-key-checking :yes }) (ssh/connect))
results in
JSchException reject HostKey: 192.168.122.11 com.jcraft.jsch.Session.checkHost (Session.java:771)
I had a look in /var/log/auth.log but these incidents don't seem to be logged.
Do you have any ideas? I am unfortunately a complete networking noob so I'm not sure where else to look.
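The two exceptions above point at different stages of the SSH handshake, which narrows down where to look. The sketch below encodes that reading. The function name explain_jsch_error is hypothetical, and the explanations are an interpretation of the messages quoted in this issue, not output produced by Jepsen or JSch.

```shell
#!/bin/sh
# Hypothetical helper: map a JSch error message to the stage that failed.
explain_jsch_error() {
  case "$1" in
    *"reject HostKey"*)
      # Raised during host-key checking, before any credentials are tried.
      echo "host key: no matching known_hosts entry for this name or IP"
      ;;
    *"Auth fail"*)
      # The host key was accepted, but every authentication method was rejected.
      echo "authentication: the server rejected the offered credentials"
      ;;
    *)
      echo "unknown"
      ;;
  esac
}

explain_jsch_error "JSchException reject HostKey: 192.168.122.11"
explain_jsch_error "JSchException Auth fail"
```

Read this way, connecting by name gets past the host-key check and fails on credentials, while connecting by IP stops earlier because known_hosts has no entry for 192.168.122.11.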