apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0
2.05k stars 1.1k forks source link

4.12 add host fail ,using tls connection #3509

Closed digitalgust closed 5 years ago

digitalgust commented 5 years ago
ISSUE TYPE
COMPONENT NAME
INSTALL
CLOUDSTACK VERSION
4.12
CONFIGURATION

advanced networking

OS / ENVIRONMENT

CentOS7.6

SUMMARY

I am an experience user of cloudstack , we built enterprise private cloud in cloudstack two years . now we test the new 4.12 version for update . we install management ok. install agent in hypervisor ok. but we encounter an issue when add first zone. one of step is add host, 4.12 using tls comucation with agent and manager, but libvirtd connect error , tls can't work ,the host note as "alert" in MGRUI.

we reinstall cloudstack-agent and libvirtd , same error we configure libvirtd with tcp ,not tls , can add but show unscure ,then click "prevission host key" , restart agent , error show again we checked the /var/log/message for libvirtd log , no especial info there. some times , keep "alert" state , about one or two hours later , the first host added ,and show in "UP" ,why? but it turn back "alert" state when restart management. we checked all tls certificate exists.

the key log when add a host, if no error log then host state turn "UP" :

2019-07-23 01:11:24,309 ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion

full log when agent first add to zone:

2019-07-23 01:07:30,306 INFO  [utils.nio.NioClient] (main:null) (logid:) Connecting to localhost:8250
2019-07-23 01:07:30,306 WARN  [utils.nio.NioConnection] (main:null) (logid:) Unable to connect to remote: is there a server running on port 8250
2019-07-23 01:07:35,307 INFO  [cloud.agent.Agent] (main:null) (logid:) Connecting to host:localhost
2019-07-23 01:07:35,308 INFO  [utils.nio.NioClient] (main:null) (logid:) Connecting to localhost:8250
2019-07-23 01:07:35,313 WARN  [utils.nio.NioConnection] (main:null) (logid:) Unable to connect to remote: is there a server running on port 8250
2019-07-23 01:07:38,229 INFO  [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Stopping the agent: Reason = sig.kill
2019-07-23 01:08:50,219 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Agent started
2019-07-23 01:08:50,221 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Implementation Version is 4.12.0.0
2019-07-23 01:08:50,222 INFO  [cloud.agent.AgentShell] (main:null) (logid:) agent.properties found at /etc/cloudstack/agent/agent.properties
2019-07-23 01:08:50,235 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to using properties file for storage
2019-07-23 01:08:50,236 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to the constant time backoff algorithm
2019-07-23 01:08:50,246 INFO  [cloud.utils.LogUtils] (main:null) (logid:) log4j configuration found at /etc/cloudstack/agent/log4j-cloud.xml
2019-07-23 01:08:50,254 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Using default Java settings for IPv6 preference for agent connection
2019-07-23 01:08:50,355 INFO  [cloud.agent.Agent] (main:null) (logid:) id is 
2019-07-23 01:08:50,357 WARN  [cloud.resource.ServerResourceBase] (main:null) (logid:) Nics are not specified in properties file/db, will try to autodiscover
2019-07-23 01:08:50,361 INFO  [cloud.resource.ServerResourceBase] (main:null) (logid:) Designating private to be nic cloudbr0
2019-07-23 01:08:50,374 INFO  [kvm.resource.LibvirtConnection] (main:null) (logid:) No existing libvirtd connection found. Opening a new one
2019-07-23 01:08:50,529 INFO  [org.reflections.Reflections] (main:null) (logid:) Reflections took 26 ms to scan 1 urls, producing 7 keys and 10 values 
2019-07-23 01:08:50,541 INFO  [kvm.resource.LibvirtComputingResource] (main:null) (logid:) No libvirt.vif.driver specified. Defaults to BridgeVifDriver.
2019-07-23 01:08:50,646 INFO  [cloud.agent.Agent] (main:null) (logid:) Agent [id = new : type = LibvirtComputingResource : zone = default : pod = default : workers = 5 : host = localhost : port = 8250
2019-07-23 01:08:50,650 INFO  [utils.nio.NioClient] (main:null) (logid:) Connecting to localhost:8250
2019-07-23 01:08:50,651 WARN  [utils.nio.NioConnection] (main:null) (logid:) Unable to connect to remote: is there a server running on port 8250
2019-07-23 01:08:55,652 INFO  [cloud.agent.Agent] (main:null) (logid:) Connecting to host:localhost
2019-07-23 01:08:55,655 INFO  [utils.nio.NioClient] (main:null) (logid:) Connecting to localhost:8250
2019-07-23 01:08:55,656 WARN  [utils.nio.NioConnection] (main:null) (logid:) Unable to connect to remote: is there a server running on port 8250
2019-07-23 01:09:00,539 INFO  [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Stopping the agent: Reason = sig.kill
2019-07-23 01:09:00,657 INFO  [cloud.agent.Agent] (main:null) (logid:) Connecting to host:localhost
2019-07-23 01:09:00,659 INFO  [utils.nio.NioClient] (main:null) (logid:) Connecting to localhost:8250
2019-07-23 01:09:00,660 WARN  [utils.nio.NioConnection] (main:null) (logid:) Unable to connect to remote: is there a server running on port 8250
2019-07-23 01:09:07,634 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Agent started
2019-07-23 01:09:07,637 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Implementation Version is 4.12.0.0
2019-07-23 01:09:07,638 INFO  [cloud.agent.AgentShell] (main:null) (logid:) agent.properties found at /etc/cloudstack/agent/agent.properties
2019-07-23 01:09:07,649 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to using properties file for storage
2019-07-23 01:09:07,651 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to the constant time backoff algorithm
2019-07-23 01:09:07,661 INFO  [cloud.utils.LogUtils] (main:null) (logid:) log4j configuration found at /etc/cloudstack/agent/log4j-cloud.xml
2019-07-23 01:09:07,672 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Using default Java settings for IPv6 preference for agent connection
2019-07-23 01:09:07,773 INFO  [cloud.agent.Agent] (main:null) (logid:) id is 
2019-07-23 01:09:07,774 WARN  [cloud.resource.ServerResourceBase] (main:null) (logid:) Nics are not specified in properties file/db, will try to autodiscover
2019-07-23 01:09:07,778 INFO  [cloud.resource.ServerResourceBase] (main:null) (logid:) Designating private to be nic cloudbr0
2019-07-23 01:09:07,795 INFO  [kvm.resource.LibvirtConnection] (main:null) (logid:) No existing libvirtd connection found. Opening a new one
2019-07-23 01:11:23,438 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Agent started
2019-07-23 01:11:23,440 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Implementation Version is 4.12.0.0
2019-07-23 01:11:23,440 INFO  [cloud.agent.AgentShell] (main:null) (logid:) agent.properties found at /etc/cloudstack/agent/agent.properties
2019-07-23 01:11:23,450 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to using properties file for storage
2019-07-23 01:11:23,451 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to the constant time backoff algorithm
2019-07-23 01:11:23,460 INFO  [cloud.utils.LogUtils] (main:null) (logid:) log4j configuration found at /etc/cloudstack/agent/log4j-cloud.xml
2019-07-23 01:11:23,470 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Using default Java settings for IPv6 preference for agent connection
2019-07-23 01:11:23,562 INFO  [cloud.agent.Agent] (main:null) (logid:) id is 
2019-07-23 01:11:23,579 INFO  [kvm.resource.LibvirtConnection] (main:null) (logid:) No existing libvirtd connection found. Opening a new one
2019-07-23 01:11:23,721 INFO  [org.reflections.Reflections] (main:null) (logid:) Reflections took 26 ms to scan 1 urls, producing 7 keys and 10 values 
2019-07-23 01:11:23,732 INFO  [kvm.resource.LibvirtComputingResource] (main:null) (logid:) No libvirt.vif.driver specified. Defaults to BridgeVifDriver.
2019-07-23 01:11:23,833 INFO  [cloud.agent.Agent] (main:null) (logid:) Agent [id = new : type = LibvirtComputingResource : zone = 1 : pod = 1 : workers = 5 : host = 192.168.100.1 : port = 8250
2019-07-23 01:11:23,836 INFO  [utils.nio.NioClient] (main:null) (logid:) Connecting to 192.168.100.1:8250
2019-07-23 01:11:23,840 INFO  [utils.nio.Link] (main:null) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
2019-07-23 01:11:24,242 INFO  [utils.nio.NioClient] (main:null) (logid:) SSL: Handshake done
2019-07-23 01:11:24,242 INFO  [utils.nio.NioClient] (main:null) (logid:) Connected to 192.168.100.1:8250
2019-07-23 01:11:24,305 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Attempting to create storage pool 36380b99-39b5-48e3-9202-6d69a3efabe1 (Filesystem) in libvirt
2019-07-23 01:11:24,309 ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion
2019-07-23 01:11:24,310 WARN  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Storage pool 36380b99-39b5-48e3-9202-6d69a3efabe1 was not found running in libvirt. Need to create it.
2019-07-23 01:11:24,311 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Didn't find an existing storage pool 36380b99-39b5-48e3-9202-6d69a3efabe1 by UUID, checking for pools with duplicate paths
2019-07-23 01:11:24,314 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Trying to fetch storage pool 36380b99-39b5-48e3-9202-6d69a3efabe1 from libvirt
2019-07-23 01:11:35,629 INFO  [cloud.serializer.GsonHelper] (Agent-Handler-1:null) (logid:) Default Builder inited.
STEPS TO REPRODUCE
add host ->input ip -> host add
EXPECTED RESULTS
add host 
ACTUAL RESULTS
host added , but show alert, can't work,  the problem at libvirtd tls connection
rohityadavcloud commented 5 years ago

Check firewall rules on the mgmt server and ensure agent can connect to mgmt server on port 8250. Next, ensure while adding kvm host the user/password can ssh from mgmt server to kvm host and can execute sudo. On kvm host while adding the host by default it will configure libvirtd to be tls enabled. This error seems to be libvirtd related: Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion. Can you try to restart libvirtd and agent?

digitalgust commented 5 years ago

all hosts firewalld was disabled. agent can connect to mgmt 8250 . ssh from mgmt to agent ok. restarted and reinstall multiple times.

2019-07-23 07:47:56,608 INFO  [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Stopping the agent: Reason = sig.kill
2019-07-23 07:48:03,718 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Agent started
2019-07-23 07:48:03,720 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Implementation Version is 4.12.0.0
2019-07-23 07:48:03,721 INFO  [cloud.agent.AgentShell] (main:null) (logid:) agent.properties found at /etc/cloudstack/agent/agent.properties
2019-07-23 07:48:03,730 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to using properties file for storage
2019-07-23 07:48:03,731 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to the constant time backoff algorithm
2019-07-23 07:48:03,739 INFO  [cloud.utils.LogUtils] (main:null) (logid:) log4j configuration found at /etc/cloudstack/agent/log4j-cloud.xml
2019-07-23 07:48:03,749 INFO  [cloud.agent.AgentShell] (main:null) (logid:) Using default Java settings for IPv6 preference for agent connection
2019-07-23 07:48:03,841 INFO  [cloud.agent.Agent] (main:null) (logid:) id is 2
2019-07-23 07:48:03,859 INFO  [kvm.resource.LibvirtConnection] (main:null) (logid:) No existing libvirtd connection found. Opening a new one
2019-07-23 07:48:04,018 INFO  [org.reflections.Reflections] (main:null) (logid:) Reflections took 29 ms to scan 1 urls, producing 7 keys and 10 values 
2019-07-23 07:48:04,031 INFO  [kvm.resource.LibvirtComputingResource] (main:null) (logid:) No libvirt.vif.driver specified. Defaults to BridgeVifDriver.
2019-07-23 07:48:04,139 INFO  [cloud.agent.Agent] (main:null) (logid:) Agent [id = 2 : type = LibvirtComputingResource : zone = 1 : pod = 1 : workers = 5 : host = 192.168.100.
2019-07-23 07:48:04,142 INFO  [utils.nio.NioClient] (main:null) (logid:) Connecting to 192.168.100.1:8250
2019-07-23 07:48:04,145 INFO  [utils.nio.Link] (main:null) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
2019-07-23 07:48:04,398 INFO  [utils.nio.NioClient] (main:null) (logid:) SSL: Handshake done
2019-07-23 07:48:04,398 INFO  [utils.nio.NioClient] (main:null) (logid:) Connected to 192.168.100.1:8250
2019-07-23 07:48:04,462 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Attempting to create storage pool eb3a089f-eaec-45ce-bb97-a66d8d9df793 (Files
2019-07-23 07:48:04,466 ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetV
2019-07-23 07:48:04,469 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Found existing defined storage pool eb3a089f-eaec-45ce-bb97-a66d8d9df793, usi
2019-07-23 07:48:04,469 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Trying to fetch storage pool eb3a089f-eaec-45ce-bb97-a66d8d9df793 from libvir
2019-07-23 07:48:09,938 INFO  [cloud.serializer.GsonHelper] (Agent-Handler-1:null) (logid:) Default Builder inited.
rohityadavcloud commented 5 years ago

@digitalgust looks like an issue or configuration issue with libvirtd. Can you try to get the latest or try qemu-ev on centos7? Please check that libvirtd is running, you can try something like virsh list etc.

digitalgust commented 5 years ago

libvirtd is running.

[root@host02 ~]# virsh list
 Id   Name     State
------------------------
 1    v-5-VM   running

this vm is running on the host , i can startup a vm when host02 state is "UP", but if restart agent or manager,the host02 state turn "alert" .

rohityadavcloud commented 5 years ago

Could be related to this https://github.com/apache/cloudstack/pull/2904 @digitalgust does agent go into Up state when you restart the kvm agent? (systemctl restart cloudstack-agent ?) Can you test 4.11.3.0?

digitalgust commented 5 years ago

now , my host02 state is "alert" log :

2019-07-23 08:13:50,629 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) Reconnecting to host:192.168.100.1
2019-07-23 08:13:50,631 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) Connecting to 192.168.100.1:8250
2019-07-23 08:13:50,633 INFO  [utils.nio.Link] (Agent Timer:null) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
2019-07-23 08:13:50,811 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) SSL: Handshake done
2019-07-23 08:13:50,812 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) Connected to 192.168.100.1:8250
2019-07-23 08:13:50,859 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Attempting to create storage pool eb3a089f-eaec-45ce-bb97-a66d8d9df793 (Filesystem) in libvirt
2019-07-23 08:13:50,860 ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion
2019-07-23 08:13:50,861 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Found existing defined storage pool eb3a089f-eaec-45ce-bb97-a66d8d9df793, using it.
2019-07-23 08:13:50,862 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Trying to fetch storage pool eb3a089f-eaec-45ce-bb97-a66d8d9df793 from libvirt
2019-07-23 08:13:55,812 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) Connected to the host: 192.168.100.1
2019-07-23 08:19:56,739 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) The startup command is now cancelled
2019-07-23 08:19:56,741 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) Lost connection to host: 192.168.100.1. Attempting reconnection while we still have 0 commands in progress.
2019-07-23 08:19:56,743 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) NioClient connection closed
2019-07-23 08:19:56,744 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) Reconnecting to host:192.168.100.1
2019-07-23 08:19:56,745 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) Connecting to 192.168.100.1:8250
2019-07-23 08:19:56,747 INFO  [utils.nio.Link] (Agent Timer:null) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
2019-07-23 08:19:56,923 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) SSL: Handshake done
2019-07-23 08:19:56,924 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) Connected to 192.168.100.1:8250
2019-07-23 08:19:56,971 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Attempting to create storage pool eb3a089f-eaec-45ce-bb97-a66d8d9df793 (Filesystem) in libvirt
2019-07-23 08:19:56,972 ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion
2019-07-23 08:19:56,973 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Found existing defined storage pool eb3a089f-eaec-45ce-bb97-a66d8d9df793, using it.
2019-07-23 08:19:56,973 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Trying to fetch storage pool eb3a089f-eaec-45ce-bb97-a66d8d9df793 from libvirt
2019-07-23 08:20:01,924 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) Connected to the host: 192.168.100.1
2019-07-23 08:25:57,607 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) The startup command is now cancelled
2019-07-23 08:25:57,608 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) Lost connection to host: 192.168.100.1. Attempting reconnection while we still have 0 commands in progress.
2019-07-23 08:25:57,611 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) NioClient connection closed
digitalgust commented 5 years ago

let me try 4.11.3.0

digitalgust commented 5 years ago

test 4.11.3.0, the problem same as 4.12 hypervisor add unreliable , i have 2 hosts for hypervisor in my test environment. add fisrst host , it always shown "alert“, then remove the first host, mgmt note success message, then add the second host, second show "alert" ,!!! the first host turn "UP", but the first host is success deleted a momont ago. or keep it for serial hours in "alert" state, the host turn "UP". when host in "alert" state, agent cant't connect to mgmt, but in the host ping mgmt normal, and mgmt port 8250 is working, the same time , there is another host in "UP" state . i can abstractly confirm the host's network is connectable . Puzzling .

this is agent log ,the host in "alert" state

2019-07-25 16:03:40,152 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) Reconnecting to host:192.168.100.1
2019-07-25 16:03:40,153 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) Connecting to 192.168.100.1:8250
2019-07-25 16:03:40,154 INFO  [utils.nio.Link] (Agent Timer:null) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
2019-07-25 16:03:40,289 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) SSL: Handshake done
2019-07-25 16:03:40,290 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) Connected to 192.168.100.1:8250
2019-07-25 16:03:40,290 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: qemu:///system
2019-07-25 16:03:40,296 DEBUG [kvm.resource.LibvirtCapXMLParser] (Agent-Handler-1:null) (logid:) Found /usr/libexec/qemu-kvm as a suiteable emulator
2019-07-25 16:03:40,296 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing: /bin/bash -c qemu-img --help|grep convert 
2019-07-25 16:03:40,297 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing while with timeout : 1800000
2019-07-25 16:03:40,314 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Execution is successful.
2019-07-25 16:03:40,315 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:)   convert [--object objectdef] [--image-opts] [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] [-m num_coroutines] [-W] filename [filename2 [...]] output_filename
    options are: 'none', 'writeback' (default, except for convert), 'writethrough',
    'directsync' and 'unsafe' (default for convert)
Parameters to convert subcommand:
  '-m' specifies how many coroutines work in parallel during the convert

2019-07-25 16:03:40,315 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) cpus=8, speed=4500, ram=7378771968, _dom0MinMem=1073741824, _dom0OvercommitMem=0, cpu sockets=1
2019-07-25 16:03:40,315 DEBUG [cloud.resource.ServerResourceBase] (Agent-Handler-1:null) (logid:) Parameters for private nic: 192.168.100.50 - 1c:69:7a:00:e2:76-255.255.255.0
2019-07-25 16:03:40,315 DEBUG [cloud.resource.ServerResourceBase] (Agent-Handler-1:null) (logid:) Parameters for storage nic: 192.168.100.50 - 1c:69:7a:00:e2:76-255.255.255.0
2019-07-25 16:03:40,315 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/versions.sh 
2019-07-25 16:03:40,316 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing while with timeout : 1800000
2019-07-25 16:03:40,324 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Execution is successful.
2019-07-25 16:03:40,330 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing: sudo grep InitiatorName= /etc/iscsi/initiatorname.iscsi 
2019-07-25 16:03:40,331 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Executing while with timeout : 3600000
2019-07-25 16:03:40,346 DEBUG [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Execution is successful.
2019-07-25 16:03:40,346 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Attempting to create storage pool 43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4 (Filesystem) in libvirt
2019-07-25 16:03:40,346 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: qemu:///system
2019-07-25 16:03:40,347 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Found existing defined storage pool 43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4, using it.
2019-07-25 16:03:40,347 INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Trying to fetch storage pool 43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4 from libvirt
2019-07-25 16:03:40,347 DEBUG [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Looking for libvirtd connection at: qemu:///system
2019-07-25 16:03:40,350 DEBUG [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Succesfully refreshed pool 43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4 Capacity: 53660876800 Used: 1794367488 Available: 51866509312
2019-07-25 16:03:40,351 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing: hostname 
2019-07-25 16:03:40,352 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing while with timeout : 500
2019-07-25 16:03:40,352 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Execution is successful.
2019-07-25 16:03:40,352 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing: hostname 
2019-07-25 16:03:40,353 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Executing while with timeout : 500
2019-07-25 16:03:40,353 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Execution is successful.
2019-07-25 16:03:40,354 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Sending Startup: Seq 4-15:  { Cmd , MgmtId: -1, via: 4, Ver: v1, Flags: 1, [{"com.cloud.agent.api.StartupRoutingCommand":{"cpuSockets":1,"cpus":8,"speed":4500,"memory":7378771968,"dom0MinMemory":1073741824,"poolSync":false,"supportsClonedVolumes":false,"caps":"hvm,snapshot","pool":"/root","hypervisorType":"KVM","hostDetails":{"Host.OS.Kernel.Version":"3.10.0-957.el7.x86_64","com.cloud.network.Networks.RouterPrivateIpStrategy":"HostLocal","Host.OS.Version":"7.6.1810","secured":"true","Host.OS":"CentOS"},"hostTags":[],"groupDetails":{},"type":"Routing","dataCenter":"1","pod":"1","cluster":"1","guid":"3960b86e-3c62-3d60-a52f-dc64a9e09619-LibvirtComputingResource","name":"host01","id":4,"version":"4.11.3.0","iqn":"iqn.1994-05.com.redhat:3ae550194c3","privateIpAddress":"192.168.100.50","privateMacAddress":"1c:69:7a:00:e2:76","privateNetmask":"255.255.255.0","storageIpAddress":"192.168.100.50","storageNetmask":"255.255.255.0","storageMacAddress":"1c:69:7a:00:e2:76","resourceName":"LibvirtComputingResource","gatewayIpAddress":"192.168.100.254","msHostList":"192.168.100.1","wait":0}},{"com.cloud.agent.api.StartupStorageCommand":{"totalSize":0,"poolInfo":{"uuid":"43a0fdb9-9faf-406b-ba0b-388aa1bbc7d4","host":"192.168.100.50","localPath":"/var/lib/libvirt/images","hostPath":"/var/lib/libvirt/images","poolType":"Filesystem","capacityBytes":53660876800,"availableBytes":51866509312},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"1","pod":"1","guid":"3960b86e-3c62-3d60-a52f-dc64a9e09619-LibvirtComputingResource","name":"host01","id":4,"version":"4.11.3.0","resourceName":"LibvirtComputingResource","msHostList":"192.168.100.1","wait":0}}] }
2019-07-25 16:03:40,354 DEBUG [cloud.agent.Agent] (Agent-Handler-1:null) (logid:) Startup task created
==============here can't connect to mgmt=======
2019-07-25 16:03:45,291 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) Connected to the host: 192.168.100.1

2019-07-25 16:09:40,354 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) The startup command is now cancelled
================
2019-07-25 16:09:40,355 DEBUG [cloud.agent.Agent] (Agent Timer:null) (logid:) Clearing watch list: 0
2019-07-25 16:09:40,355 INFO  [cloud.agent.Agent] (Agent Timer:null) (logid:) Lost connection to host: 192.168.100.1. Attempting reconnection while we still have 0 commands in progress.
2019-07-25 16:09:40,355 DEBUG [utils.nio.NioConnection] (Agent-NioConnectionHandler-1:null) (logid:) Closing socket Socket[addr=/192.168.100.1,port=8250,localport=54460]
2019-07-25 16:09:40,357 INFO  [utils.nio.NioClient] (Agent Timer:null) (logid:) NioClient connection closed
rohityadavcloud commented 5 years ago

Are you running multiple management server(s) are starting them concurrently @digitalgust ? Can you also share details about your env, like number of secondary and primary storages, networking etc?

I'll test against 4.11.2.0 and 4.11.3.0.

digitalgust commented 5 years ago

1 mgmt : centos 7.6 /intel7500U/2G 192.168.100.1 /2nic 1 secondary : centos 7.6 / intel7500U/2G 192.168.100.20 /1nic 2 Hypervisor : centos 7.6/ intel8559U8259U/8G 192.168.100.50,192.168.100.51 /libvirtd 4.5/qemu2.9 2 ceph: centos 7.6 /intel8259U/8G 192.168.100.10,192.168.100.11 ceph 14

cloudstack 4.12 / 4.11.3.0 advance network KVM guest CIDR 10.1.1.0/16 manage_network & storage_network on cloudbr0 connect to hardware switch 0 public_newwork & guest_network on cloudbr1 connect to hardware switch 1 public traffic: gateway 10.0.99.254, 255.255.0.0 start ip:10.0.99.100 end ip :10.0.99.199 pod: gateway 192.168.100.254, mask 255.255.255.0 reserve ip : 192.168.100.100-192.168.100.199 vlan : 1000-1009 storage traffic: gateway192.168.100.254, start ip: 192.168.100.10- 192.168.100.19 host : 192.168.100.50 primary storage : nfs or ceph both work 4.12, ceph dont work 4.11.3 secondary storage: nfs work

rohityadavcloud commented 5 years ago

Thanks @digitalgust I'm unable to reproduce. The issue seems to be with libvirt or qemu, kindly check libvirt/qemu logs.

digitalgust commented 5 years ago

I reinstall cloudstack ,it works with these changes : exchange cloudbr0 /cloudbr1 bridge to physic nic, maybe unrelated that . uninstall qemu2.9 and reinstall centos7 qemu (1.53) , maybe the problem cause.

after reinstall , agent connect to mgmt quickly.

gngusen commented 5 years ago

Its great that I found experts in cloudstack. May you lease help me I am unable to add the host to the management server The management server and the host are separate machines in the same network with a private Ip address. The management server also serves as the storage, is installed correctly and can be obtained in the UI but unable to add host Agent and libvirtd are running properly Management and mariadb are also running actively But the storage and host can not be added to the management server?

Help please?

andrijapanicsb commented 4 years ago

This happens with qemu-ev from the SIG repo. That is the reason.

Needs to be fixed.

wqmeng commented 4 years ago

Hello, I have reinstall many time with latest cloudstack 4.13.0, and continue get this error when add the Host. ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion .

And the libvirtd works fine.

[root@localhost ~]# /bin/systemctl status libvirtd.service
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-11-27 05:18:10 PST; 18min ago
     Docs: man:libvirtd(8)
           https://libvirt.org
 Main PID: 26297 (libvirtd)
    Tasks: 17 (limit: 32768)
   CGroup: /system.slice/libvirtd.service
           └─26297 /usr/sbin/libvirtd -l

Nov 27 05:18:09 2278.xxxx.com systemd[1]: Starting Virtualization daemon...
Nov 27 05:18:10 2278.xxxx.com systemd[1]: Started Virtualization daemon.
Nov 27 05:18:10 2278.xxxx.com libvirtd[26297]: 2019-11-27 13:18:10.978+0000: 26313: info : libvirt version: 4.5.0, package: 23.el7_7.1 (CentOS BuildSystem <http://bugs.centos.org>, 2019-09-13-18:...ys.centos.org)
Nov 27 05:18:10 2278.xxxx.com libvirtd[26297]: 2019-11-27 13:18:10.978+0000: 26313: info : hostname: 2278.xxxx.com
Nov 27 05:18:10 2278.xxxx.com libvirtd[26297]: 2019-11-27 13:18:10.978+0000: 26313: warning : virSecurityManagerNew:189 : Configured security driver "none" disables default policy to create confined guests
[root@localhost ~]# /bin/systemctl status cloudstack-agent.service
● cloudstack-agent.service - CloudStack Agent
   Loaded: loaded (/usr/lib/systemd/system/cloudstack-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-11-27 05:27:33 PST; 17min ago
     Docs: http://www.cloudstack.org/
 Main PID: 27937 (java)
    Tasks: 177
   CGroup: /system.slice/cloudstack-agent.service
           └─27937 /usr/bin/java -Djava.io.tmpdir=/usr/share/cloudstack-agent/tmp -Xms256m -Xmx2048m -cp /usr/share/cloudstack-agent/lib/*:/usr/share/cloudstack-agent/plugins/*:/etc/cloudstack/agent:/usr/share/cl...

Nov 27 05:44:22 2278.xxxx.com java[27937]: INFO  [utils.nio.NioClient] (Agent-Handler-5:) (logid:) SSL: Handshake done
Nov 27 05:44:22 2278.xxxx.com java[27937]: INFO  [utils.nio.NioClient] (Agent-Handler-5:) (logid:) Connected to 111.111.111.111:8250
Nov 27 05:44:22 2278.xxxx.com sudo[26170]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/grep InitiatorName= /etc/iscsi/initiatorname.iscsi
Nov 27 05:44:22 2278.xxxx.com java[27937]: INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Attempting to create storage pool c82dfc97-c60c-4681-92ff-f29a5e9c96c9 (Filesystem) in libvirt
Nov 27 05:44:22 2278.xxxx.com java[27937]: libvirt: Domain Config error : invalid connection pointer in virConnectGetVersion
Nov 27 05:44:22 2278.xxxx.com java[27937]: ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion
Nov 27 05:44:22 2278.xxxx.com java[27937]: INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Found existing defined storage pool c82dfc97-c60c-4681-92ff-f29a5e9c96c9, using it.
Nov 27 05:44:22 2278.xxxx.com java[27937]: INFO  [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Trying to fetch storage pool c82dfc97-c60c-4681-92ff-f29a5e9c96c9 from libvirt
Nov 27 05:44:23 2278.xxxx.com java[27937]: INFO  [cloud.agent.Agent] (Agent-Handler-5:) (logid:) Connected to the host: 111.111.111.111
Nov 27 05:44:27 2278.xxxx.com java[27937]: INFO  [cloud.agent.Agent] (Agent-Handler-5:) (logid:) Connected to the host: 111.111.111.111
[root@localhost ~]# lsmod | grep kvm
kvm_intel             188644  0
kvm                   621480  1 kvm_intel
irqbypass              13503  1 kvm
andrijapanicsb commented 4 years ago

But it still works, Agent I mean.

Can you try with ACS 4.11.2 and see if that error is also present (if you have time)?

weizhouapache commented 4 years ago

@wqmeng do you use qemu-kvm or qemu-kvm-ev ?

andrijapanicsb commented 4 years ago

qemu-kvm-ev 2.12.x (latest) is not working with 4.13 and master - SSVMs can not start - but user VMs can start fine. I also couldn't make it work when downgraded to qemu-kvm-ev 2.10.x @weizhouapache

wqmeng commented 4 years ago

@andrijapanicsb It should be qemu-kvm,

[root@localhost ~]# qemu-img --version qemu-img version 1.5.3, Copyright (c) 2004-2008 Fabrice Bellard usage: qemu-img command [command options]

I install the latest CentOS 7.7, and before install cloudstack 4.13.0, I yum -y upgrade and upgrade all the system. And then install the cloudstack agent from yum after add the repo of http://download.cloudstack.org/centos/$releasever/4.13/

andrijapanicsb commented 4 years ago

Actually... I had that identical error with qemu-kvm-ev - I think I created an issue yesterday here...

wqmeng commented 4 years ago

I have also tried the cloudstack 4.14.0 in the github, and still get this error.

Nov 28 06:56:42 localhost.localdomain java[26143]: libvirt: Domain Config error : invalid connection pointer in virConnectGetVersion
Nov 28 06:56:42 localhost.localdomain java[26143]: ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion

Should this been configure by some port?Or set TLS or instead use TCP?

Do I miss something?

wqmeng commented 4 years ago

And in the server side, shows

Nov 28 23:01:49 localhost.localdomain java[17220]: WARN  [c.c.a.AlertManagerImpl] (AgentConnectTaskPool-157:ctx-f42983f6) (logid:43503289) AlertType:: 7 | dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: Incorrect Network setup on agent, Reinitialize agent after network names are setup, details : Can not find network: breno1-4000
Nov 28 23:01:49 localhost.localdomain java[17220]: INFO  [c.c.u.e.CSExceptionErrorCode] (AgentConnectTaskPool-157:ctx-f42983f6) (logid:43503289) Could not find exception: com.cloud.exception.ConnectionException in error code list for exceptions
Nov 28 23:01:49 localhost.localdomain java[17220]: WARN  [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-157:ctx-f42983f6) (logid:43503289) Monitor NetworkOrchestrator says there is an error in the connect process for 1 due to Incorrect Network setup on agent, Reinitialize agent after network names are setup, details : Can not find network: breno1-4000...

But I setup the NIC manually in the agent side.

[root@localhost ~]# ifconfig
breno1-4000: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 52:79:29:5f:17:cd  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

cloud0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 169.254.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
        inet6 fe80::8c9d:39ff:fe0d:2cc9  prefixlen 64  scopeid 0x20<link>
        ether 8e:9d:39:0d:2c:c9  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8  bytes 656 (656.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

cloudbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 198.1.1.202  netmask 255.255.255.248  broadcast 198.1.1.207
        inet6 fe80::225:90ff:fe4b:f86a  prefixlen 64  scopeid 0x20<link>
        ether 00:25:90:4b:f8:6a  txqueuelen 1000  (Ethernet)
        RX packets 121119  bytes 6353671 (6.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8679  bytes 1980658 (1.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 00:25:90:4b:f8:6a  txqueuelen 1000  (Ethernet)
        RX packets 413554  bytes 362915220 (346.1 MiB)
        RX errors 0  dropped 36  overruns 0  frame 0
        TX packets 74606  bytes 13284314 (12.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xdf920000-df93ffff

eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 00:25:90:4b:f8:6b  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xdf900000-df91ffff

eno1.4000: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::225:90ff:fe4b:f86a  prefixlen 64  scopeid 0x20<link>
        ether 00:25:90:4b:f8:6a  txqueuelen 1000  (Ethernet)
        RX packets 1256  bytes 57784 (56.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 1112 (1.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 6646  bytes 1042584 (1018.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6646  bytes 1042584 (1018.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@localhost ~]# ls /etc/sysconfig/network-scripts/ifcfg-* -l
-rw-r--r--. 1 root root 488 Nov 28 06:45 /etc/sysconfig/network-scripts/ifcfg-breno1-4000
-rw-r--r--. 1 root root 462 Nov 28 06:45 /etc/sysconfig/network-scripts/ifcfg-cloudbr0
-rw-r--r--. 1 root root 408 Nov 28 06:45 /etc/sysconfig/network-scripts/ifcfg-eno1
-rw-r--r--. 1 root root 482 Nov 28 06:45 /etc/sysconfig/network-scripts/ifcfg-eno1.4000
-rw-r--r--. 1 root root 277 Nov 28 04:53 /etc/sysconfig/network-scripts/ifcfg-eno2
-rw-r--r--. 1 root root 254 Mar 29  2019 /etc/sysconfig/network-scripts/ifcfg-lo
wqmeng commented 4 years ago

Hi,

I think my problem is because by the management traffic is tagged with VLAN_ID, while currently cloudstack do not support management traffic with VLAN_ID, I see that in the Physical Network configuration, select management, the Broadcast Domain Type can only be Native, it may not change to VLAN.

Maybe we can get the management traffic work with a VLAN tagged traffic?

When I change the management traffic go to untagged from my switch, everything get back to work.

I will try to test more.

andrijapanicsb commented 4 years ago

That might be the problem, yes.

You can always tag the traffic via the bridge/physical interface itself, and then simply tell ACS to use untagged traffic, while specifying the correct "KVM Traffic Label" which should be the name of your bridge (i.e. cloudbr0 or whatever is the existing name of the bridge) - Explanation: when ACS is using "unttagged" traffic, it will simply stick any VNET interface to whatever bridge you are using/having existing. (FYI if you use i.e. tagged traffic, then ACS would USUALLY extract the child interface of the bridge that is defined via "KVM traffic label" and then create an VLAN interface on top of this child inteface, then create additional bridge and join this vlan interface and any VNET interfaces to this bridge)

Your fix - what you want to do is the following:

On the lower level, you will have your VLAN interface (i.e. eth0.100) joined to the bridge (i.e. cloudbr0) Thus, based on the ACS behaviour to use untagged traffic, ACS willl just "stick" the VNET to the bridge (cloudbr0), while the traffic towards your physical network will be tagged by your existing eth0.100 interface.

That's the "proper" way to do it, and achieve VLAN tagging (if required by your underlying switch setup/infra)

shoaibjdev commented 4 years ago

It seems CloudStack agent module is still broken, have tried 4.11.x.x, 4.12.x.x and 4.13.x.x It fails during addition of Host from the UI.

OS: RHEL 7.7 (Maipo) OS Repo: rhel-7-server-rpms/7Server/x86_64 qemu-kvm: 1.5.3 libvirt: 4.5.0

Tried multiple times with clean Red Hat latest OS installation, subscribed and updated repos still fails at addition of Agent Host with errors below:

Management Server: 2019-12-27 11:49:07,512 WARN [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-2:ctx-266bf136) (logid:284f2ed7) Monitor NetworkOrchestrator says there is an error in the connect process for 1 due to Unable to get an answer to the CheckNetworkCommand from agent: 1

Agent: Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion

Also noted, there is no issue with KVM/QeMU or libvirt as plain virtual machines just works fine, without any errors in libvirtd/kvm logs. After CloudStack Management server tries to update the configuration for Agent host, it messes up something on the libvirt/qemu/kvm configs which am still not able to figure out.

Reproduce:

  1. Install RHEL 7.x from latest ISO
  2. yum update -y
  3. Setup bridge network & interface (as per Doc http://docs.cloudstack.apache.org/en/latest/quickinstallationguide/qig.html)
  4. Setup Primary + Secondary NFS storage on CloudStack management server host
  5. Add CloudStack Repo and install CloudStack management server packages
  6. [KVM/Agent Host] yum install qemu-kvm qemu-img libvirt libvirt-python libvirt-client virt-install bridge-utils -y
  7. Add CloudStack Repo and install CloudStack Agent (Hypervisor) Host
  8. From UI Basic Setup, add the KVM Host and it will fail with above error after trying for few minutes

Can someone from CloudStack community kindly reproduce this and assist to get the root cause?

andrijapanicsb commented 4 years ago

@shoaibjdev you need to post logs, but there is nothing broken per my experience in 4.11+ people are running huge installations. Perhaps it's some environmental issue - that message you shared "invalid connection pointer in virConnectGetVersion" I've seen as well, but my agents were working fine (can't advise why it's there) Please post agent logs during the time of the setup (on pastebin.com or some external service), so we can see. And these kind of discussions are more suitable for the mailing list, I guess.

shoaibjdev commented 4 years ago

@andrijapanicsb Please find the logs below https://we.tl/t-wjmJjUQSQT

NOTE: I was not able to upload on pastebin or similar services due to size limitation. So i used wetransfer which retains the file upto 1 week

andrijapanicsb commented 4 years ago

@shoaibjdev you have added host with IP 192.168.0.20, but once your host was connecting to the mgmt server, it was connecting from 192.168.0.25 IP address ???

2019-12-30 14:28:47,100 DEBUG [o.a.c.c.p.RootCACustomTrustManager] (pool-7-thread-1:null) (logid:) Client/agent connection from ip=192.168.0.25 has been validated and trusted. ... Alternative Names:[[7, 192.168.0.20], [7, fe80:0:0:0:20c:29ff:fec9:aa8a], [7, fe80:0:0:0:20c:29ff:fec9:aa94], [7, 192.168.0.25], [7, 192.168.122.1], [2, 192.168.0.20]] ...

You have network misconfiguration in your environment - if adding a host with ip x.y.z.20 make sure that this host is connecting from that IP to the mgmt server - mgmt server has no idea "who" the host with IP x.y.z.25 is...

andrijapanicsb commented 4 years ago

(and you've tested using ACS 4.13, not 4.12) - topic closed.

shoaibjdev commented 4 years ago

@andrijapanicsb Thank you for the advise, in fact it was environmental issue with the switch, and have been corrected. We managed to setup ACS 4.12 and it works great. This error should be ignored "invalid connection pointer in virConnectGetVersion", doesn't impact ACS setup.

scrollsyou commented 12 months ago

I also encountered this problem, but after checking qemu, I found that the qemu-guest-agent component was missing. Just yum install qemu-guest-agent.