dmacvicar / terraform-provider-libvirt

Terraform provider to provision infrastructure with Linux's KVM using libvirt
Apache License 2.0

Strange behaviour: failed to dial libvirt #1004

Closed: itwars closed this issue 1 year ago

itwars commented 1 year ago

On the host itself, both commands work fine (the host's IP is 192.168.10.201):

virsh -c qemu:///system list --all
virsh -c qemu+ssh://user@192.168.10.201/system list --all

Running Terraform with this provider block also works:

provider "libvirt" {
  alias = "node1"
  uri   = "qemu:///system"
}

But running Terraform with this provider block fails:

provider "libvirt" {
  alias = "node1"
  uri   = "qemu+ssh://user@192.168.10.201/system"
}
module.test-nouvelle-alpine.data.template_file.network_config: Reading...
module.test-nouvelle-alpine.data.template_file.user_data[0]: Reading...
module.test-nouvelle-alpine.data.template_file.network_config: Read complete after 0s [id=142f2ebd49dc333da2f0bf71a7bf27e6acf558cd743fd118b3695d41e5368ec8]
module.test-nouvelle-alpine.data.template_file.user_data[0]: Read complete after 0s [id=5b862b06752472a08f3daeb0d8f77bf3bd2755186a477dd2d5bef90a0d414b69]
╷
│ Error: failed to dial libvirt: failed to connect to libvirt on the remote host: ssh: rejected: connect failed (open failed)
│
│   with provider["registry.terraform.io/dmacvicar/libvirt"].node1,
│   on providers.tf line 1, in provider "libvirt":
│    1: provider "libvirt" {
│
itwars commented 1 year ago

Digging a little deeper, I found this in my log file:

Feb 23 19:15:14 node1 auth.info sshd[4401]: Accepted publickey for user from 192.168.10.201 port 41598 ssh2: RSA SHA256:0f______________WdI
Feb 23 19:15:14 node1 auth.info sshd[4403]: Received request to connect to path /var/run/libvirt/libvirt-sock, but the request was denied.

The permissions on the socket file seem to be OK:

srwxrwxrwx 1 root root 0 Feb 23 18:29 /var/run/libvirt/libvirt-sock
alexandre-janniaux commented 1 year ago

What is your sshd configuration?

itwars commented 1 year ago

My sshd_config is quite basic:

PermitRootLogin yes
AuthorizedKeysFile      .ssh/authorized_keys
AllowTcpForwarding no
GatewayPorts no
X11Forwarding no
Subsystem       sftp    internal-sftp

There is no additional configuration in the .ssh directory of user 'user'.

LavBU commented 1 year ago

Hi @itwars

You need to add the user to the libvirt group.

For example, in main.tf:

uri = "qemu+ssh://<user>@<my host>/system?keyfile=/root/.ssh/<ssh private key>&sshauth=privkey&no_verify=1"
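Connection URIs like this follow libvirt's remote URI format, driver[+transport]://[user@][host][:port]/path[?extraparameters]. As a minimal sketch, the hypothetical helper below splits such a URI into its parts so you can see which user, host and query parameters (here the keyfile, sshauth and no_verify parameters from the example above) a given string actually carries. It only parses the string; it does not check which parameters the provider accepts, and the key file path in the usage is a placeholder.

```python
from urllib.parse import parse_qsl, urlsplit


def describe_libvirt_uri(uri: str) -> dict:
    """Split a libvirt connection URI into driver, transport, user, host, path and params.

    Hypothetical helper: parsing only, no validation of the parameters.
    """
    parts = urlsplit(uri)
    driver, _, transport = parts.scheme.partition("+")
    return {
        "driver": driver,                        # e.g. "qemu"
        "transport": transport or None,          # "ssh"; None for a local qemu:///system
        "user": parts.username,                  # user to log in as on the remote host
        "host": parts.hostname,
        "path": parts.path,                      # "/system" or "/session"
        "params": dict(parse_qsl(parts.query)),  # extra parameters after "?"
    }


if __name__ == "__main__":
    # The key file name is a placeholder, not taken from the thread.
    uri = ("qemu+ssh://user@192.168.10.201/system"
           "?keyfile=/root/.ssh/id_rsa&sshauth=privkey&no_verify=1")
    for key, value in describe_libvirt_uri(uri).items():
        print(f"{key}: {value}")
```

Comparing the output for qemu:///system and for the qemu+ssh URI makes it clear that only the second one goes through SSH at all.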

Then, on the physical host where you want to deploy your VM, add the user to the libvirt group:

# usermod -a -G libvirt <user>

You can verify this by checking /etc/group:

# grep libvirt /etc/group
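The grep above prints the raw line. A small sketch (the helper name and the sample data are hypothetical) that parses /etc/group-style text into the member list instead. Two caveats: /etc/group only lists supplementary members, so a user whose primary group is libvirt will not appear there, and a group added with usermod only takes effect for new login sessions.

```python
from __future__ import annotations


def group_members(group_text: str, group: str) -> list[str]:
    """Return the supplementary members of `group` from /etc/group-style text.

    Each line has the form name:password:GID:member1,member2,...
    """
    for line in group_text.splitlines():
        fields = line.split(":")
        if len(fields) == 4 and fields[0] == group:
            # An empty member field yields [""], so filter it out.
            return [member for member in fields[3].split(",") if member]
    return []


if __name__ == "__main__":
    # Hypothetical sample; on a real host pass open("/etc/group").read() instead.
    sample = "wheel:x:10:root\nlibvirt:x:102:user,alice\nkvm:x:34:\n"
    print(group_members(sample, "libvirt"))  # ['user', 'alice']
```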

I hope this fixes your issue.

Lavi

itwars commented 1 year ago

Hi @LavBU

'user' is already a member of the libvirt and qemu groups, and is also a sudoer.
LavBU commented 1 year ago

Hi @itwars

If you are able to run this from the machine where Terraform runs, against the remote host:

virsh -c qemu+ssh://user@192.168.10.201/system list --all

then you are using an SSH key to access the host.

Therefore, that SSH key should be in the authorized_keys file of your user on that host. For example:

# cat ~<user>/.ssh/authorized_keys

should show that SSH key.
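Eyeballing long base64 strings is error-prone, so here is a minimal sketch (hypothetical helper names, fake sample keys) that checks whether a public key line appears in authorized_keys text by comparing the key blob, ignoring comments and leading options such as from="...". It is simplified: quoted options that contain spaces are not handled.

```python
from __future__ import annotations


def key_blob(pubkey_line: str) -> str | None:
    """Return the base64 blob that follows the key type on an OpenSSH key line."""
    fields = pubkey_line.split()
    for i, field in enumerate(fields[:-1]):
        # Key types such as ssh-ed25519, ssh-rsa, ecdsa-sha2-nistp256, sk-ssh-ed25519@openssh.com
        if field.startswith(("ssh-", "ecdsa-", "sk-")):
            return fields[i + 1]
    return None


def is_authorized(pubkey_line: str, authorized_keys_text: str) -> bool:
    """True if the key in `pubkey_line` is present in the authorized_keys text."""
    blob = key_blob(pubkey_line)
    if blob is None:
        return False
    return any(
        key_blob(line) == blob
        for line in authorized_keys_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


if __name__ == "__main__":
    # Fake keys for illustration only.
    pub = "ssh-ed25519 AAAAC3Example me@laptop"
    authorized = '# keys\nfrom="192.168.10.0/24" ssh-ed25519 AAAAC3Example terraform\n'
    print(is_authorized(pub, authorized))  # True
```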

Lavi

itwars commented 1 year ago

Hi @LavBU

virsh

On the localhost (192.168.10.201), both work:

virsh -c qemu:///system list --all
virsh -c qemu+ssh://user@192.168.10.201/system list --all

From a remote host (192.168.10.202 to 192.168.10.201), it works too:

virsh -c qemu+ssh://user@192.168.10.201/system list --all

terraform apply

On the localhost

Running Terraform from 192.168.10.201 against 192.168.10.201 fails with:

module.test-nouvelle-alpine.data.template_file.network_config: Reading...
module.test-nouvelle-alpine.data.template_file.user_data[0]: Reading...
module.test-nouvelle-alpine.data.template_file.network_config: Read complete after 0s [id=142f2ebd49dc333da2f0bf71a7bf27e6acf558cd743fd118b3695d41e5368ec8]
module.test-nouvelle-alpine.data.template_file.user_data[0]: Read complete after 0s [id=5b862b06752472a08f3daeb0d8f77bf3bd2755186a477dd2d5bef90a0d414b69]
╷
│ Error: failed to dial libvirt: failed to connect to libvirt on the remote host: ssh: rejected: connect failed (open failed)
│
│   with provider["registry.terraform.io/dmacvicar/libvirt"].node1,
│   on providers.tf line 1, in provider "libvirt":
│    1: provider "libvirt" {
│
Terraform v1.3.4
on linux_amd64
+ provider registry.terraform.io/dmacvicar/libvirt v0.7.1
+ provider registry.terraform.io/hashicorp/template v2.2.0

Your version of Terraform is out of date! The latest version
is 1.3.9. You can update by downloading from https://www.terraform.io/downloads.html

From the remote host

From 192.168.10.202 to 192.168.10.201 it works: Terraform deploys my VM!

I know, I know, it's an old version:

Terraform v1.1.2
on linux_amd64
+ provider registry.terraform.io/dmacvicar/libvirt v0.6.3
+ provider registry.terraform.io/hashicorp/template v2.2.0

Your version of Terraform is out of date! The latest version
is 1.3.9. You can update by downloading from https://www.terraform.io/downloads.html

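So, as a summary:

virsh: works from the localhost and from the remote host.

terraform: fails from the localhost (provider v0.7.1), works from the remote host (provider v0.6.3).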

LavBU commented 1 year ago

Hi @itwars

Try checking the system log when it fails:

# journalctl -f

Lavi

itwars commented 1 year ago

@LavBU

During the 'apply' I got this error:

Mar  8 15:34:24 nodeX1 auth.info sshd[5305]: Accepted publickey for user from 192.168.10.201 port 34474 ssh2: RSA SHA256:xxxxxxxxxxxxx
Mar  8 15:34:24 nodeX1 auth.info sshd[5307]: Received request to connect to path /var/run/libvirt/libvirt-sock, but the request was denied.

I checked my socket; the permissions look right, don't they?

srwxrwxrwx 1 root root 0 Mar  5 12:33 /var/run/libvirt/libvirt-sock
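With a socket in mode srwxrwxrwx, any local process may connect to it, so the interesting part is the sshd message: its wording suggests that sshd itself refused to open a channel to the socket path, before the socket's permissions came into play. One plausible explanation for why virsh was unaffected, not confirmed in this thread, is that virsh's ssh transport runs a helper (such as nc) on the remote host that opens the socket locally, while a client that asks sshd to connect to the socket path depends on sshd allowing forwarding. A minimal sketch (hypothetical helper name) that picks those refusals out of syslog-style lines, using the lines posted above as sample input:

```python
import re

# Matches the sshd message seen in the logs above and captures PID and socket path.
DENIED = re.compile(
    r"sshd\[(?P<pid>\d+)\]: Received request to connect to path "
    r"(?P<path>\S+), but the request was denied\."
)


def denied_socket_requests(log_lines):
    """Yield (pid, path) for every sshd line reporting a refused socket connection."""
    for line in log_lines:
        match = DENIED.search(line)
        if match:
            yield int(match.group("pid")), match.group("path")


if __name__ == "__main__":
    sample = [
        "Mar  8 15:34:24 nodeX1 auth.info sshd[5305]: Accepted publickey for user "
        "from 192.168.10.201 port 34474 ssh2: RSA SHA256:xxxxxxxxxxxxx",
        "Mar  8 15:34:24 nodeX1 auth.info sshd[5307]: Received request to connect "
        "to path /var/run/libvirt/libvirt-sock, but the request was denied.",
    ]
    for pid, path in denied_socket_requests(sample):
        print(f"sshd[{pid}] refused a connection to {path}")
```

Note that the same log shows "Accepted publickey" just before the refusal: authentication succeeded, and only the connection to the socket was denied.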
LavBU commented 1 year ago

See this for some ideas:

https://askubuntu.com/questions/345218/virt-manager-cant-connect-to-libvirt


itwars commented 1 year ago

Hello @LavBU, I've tried every answer without luck... Still stuck!

MattSnow-amd commented 1 year ago

I recently ran into something similar. I needed an SSH key pair without a passphrase. The public key needed to be in the remote user's ~/.ssh/authorized_keys file, and the permissions on .ssh and the files it contains also need to be correct.

provider "libvirt" {
  uri = "qemu+ssh://root@${var.hypervisor_host}/system?keyfile=/home/me/.ssh/id_ed25519-nopw"
}
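The permissions point can be checked mechanically. A minimal sketch, assuming the usual OpenSSH conventions: with StrictModes (on by default) sshd ignores authorized_keys when ~/.ssh or the file is writable by group or others, and the ssh client refuses private keys accessible by group or others. It does not re-implement every sshd check (home directory and file ownership are not examined), and treating files named id_* without a .pub suffix as private keys is an assumption of this sketch.

```python
from __future__ import annotations

import stat
from pathlib import Path


def permission_problems(ssh_dir: Path) -> list[str]:
    """List common permission problems in an OpenSSH ~/.ssh directory (simplified)."""
    problems = []
    dir_mode = stat.S_IMODE(ssh_dir.stat().st_mode)
    if dir_mode & 0o022:  # group- or world-writable directory
        problems.append(f"{ssh_dir} is writable by group/others ({dir_mode:o})")
    for entry in sorted(ssh_dir.iterdir()):
        if not entry.is_file():  # skips subdirectories and broken symlinks
            continue
        mode = stat.S_IMODE(entry.stat().st_mode)
        if entry.name == "authorized_keys" and mode & 0o022:
            problems.append(f"{entry} is writable by group/others ({mode:o})")
        elif entry.name.startswith("id_") and entry.suffix != ".pub" and mode & 0o077:
            # Assumed to be a private key, e.g. id_ed25519-nopw
            problems.append(f"{entry} is accessible by group/others ({mode:o})")
    return problems


if __name__ == "__main__":
    home_ssh = Path.home() / ".ssh"
    if home_ssh.is_dir():
        for problem in permission_problems(home_ssh):
            print(problem)
```

Run it as the user that runs Terraform, on the machine where Terraform runs, and as the remote user on the hypervisor.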
Magnitus- commented 1 year ago

I can independently confirm that this feature works with this provider. We recently upgraded some of our orchestration that operates libvirt on remote machines to version 0.7.1 of the provider without any issues (with Terraform 1.2.9).

As MattSnow-amd mentioned, your issue is most likely an SSH setup problem for the user running Terraform.

You need to ensure that whichever user runs Terraform has proper passwordless SSH access to the libvirt user on the remote machine, and you need to set this up specifically for the environment in which Terraform runs (e.g. if you run Terraform from a container, the container may not have the right SSH keys set up).
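A quick way to test "passwordless SSH from the environment that runs Terraform" is to run ssh with BatchMode=yes, which per ssh_config(5) disables user interaction such as password prompts and host key confirmation requests, so ssh fails instead of waiting for input. A minimal sketch with hypothetical helper names, to be run as the same user and in the same environment (container, CI runner) as Terraform. Keep in mind that in this thread the sshd log already showed "Accepted publickey", so a successful login alone does not prove that sshd will let the connection through to the libvirt socket.

```python
from __future__ import annotations

import subprocess


def ssh_batch_command(user: str, host: str, keyfile: str | None = None) -> list[str]:
    """Build an ssh command that runs `true` remotely and never prompts."""
    cmd = ["ssh", "-o", "BatchMode=yes"]
    if keyfile:
        cmd += ["-i", keyfile]  # same key file as in the provider URI's keyfile=
    cmd += [f"{user}@{host}", "true"]
    return cmd


def can_login_non_interactively(user: str, host: str, keyfile: str | None = None,
                                timeout: float = 15) -> bool:
    """Return True if an ssh login succeeds without any human interaction."""
    try:
        result = subprocess.run(
            ssh_batch_command(user, host, keyfile),
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh binary missing, or the connection hung
    return result.returncode == 0
```

For example, can_login_non_interactively("root", "192.168.10.201", "/home/me/.ssh/id_ed25519-nopw") mirrors the provider block shown earlier.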

itwars commented 1 year ago

Hello, same issue even with:

provider "libvirt" {
  uri = "qemu+ssh://root@${var.hypervisor_host}/system?keyfile=/home/me/.ssh/id_ed25519-nopw"
}

I've also rolled out every SSH key across my whole cluster. It works on my 2 Ubuntu hosts and fails on my 4 Alpine hosts.

@Magnitus-: just to recap, the issue doesn't occur from a remote host; it only happens from the local server!

itwars commented 1 year ago

Hello, it seems I'm not the only one facing this issue: #939? I'm closing this issue and moving to #939.

Magnitus- commented 1 year ago

@itwars Yeah, if you are confident about your SSH setup, it might be an obscure Alpine incompatibility. My understanding is that Alpine uses many different, lighter dependencies to keep everything small, which I know from my superficial use of it in Docker containers can cause compatibility issues.

Unless specific constraints force my hand, I'm happy to stick with Ubuntu/Debian, as it makes my life a lot simpler operationally (there are so many things to work on and so little time). So I won't be of much help here, but it seems they are well on their way to troubleshooting this in the thread you linked.

Best of luck.

itwars commented 1 year ago

Hooray! After digging deeper, I compared /etc/ssh/sshd_config line by line between Ubuntu and Alpine and finally found the "guilty" configuration line. The fix was changing:

AllowTcpForwarding no

to

AllowTcpForwarding yes
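The line-by-line comparison can be narrowed to settings that actually differ in effect. A minimal sketch with hypothetical helper names and simplifying assumptions: keywords are compared case-insensitively, the first obtained value wins (sshd_config(5)'s rule for most keywords), parsing stops at the first Match block, Include directives are not followed, and repeated keywords such as several Subsystem lines are collapsed to the first one. A keyword missing on one side shows up as None, meaning sshd's built-in default applies there (for AllowTcpForwarding the documented default is yes).

```python
from __future__ import annotations

import re

# Keyword, then whitespace or an optional "=", then the arguments.
_LINE = re.compile(r"(?P<key>[^\s=]+)(?:\s*=\s*|\s+)(?P<value>.*)")


def global_settings(config_text: str) -> dict[str, str]:
    """First value of each keyword in the global (pre-Match) part of an sshd_config."""
    settings: dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        m = _LINE.match(line)
        key = (m.group("key") if m else line).lower()  # keywords are case-insensitive
        if key == "match":
            break  # the rest of the file is conditional Match blocks
        value = " ".join(m.group("value").split()) if m else ""
        settings.setdefault(key, value)  # first obtained value wins
    return settings


def diff_settings(a_text: str, b_text: str) -> dict[str, tuple[str | None, str | None]]:
    """Keywords whose effective global values differ between two configs."""
    a, b = global_settings(a_text), global_settings(b_text)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        with open(sys.argv[1]) as first, open(sys.argv[2]) as second:
            for key, (left, right) in diff_settings(first.read(), second.read()).items():
                print(f"{key}: {left!r} vs {right!r}")
```

Run as, for example, python3 sshd_diff.py alpine_sshd_config ubuntu_sshd_config (file names hypothetical). After editing sshd_config, sshd -t checks the file for errors, and sshd must be restarted or reloaded for the change to apply (on Alpine's OpenRC, for example, rc-service sshd restart).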