apache / cloudstack

Apache CloudStack is an open source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

Cannot Add Host, wrong manager IP passed #3909

Closed rvalle closed 4 years ago

rvalle commented 4 years ago

My management server is connected to several networks. When adding a host, I can see in the logs that the wrong IP address is being used in the setup agent command:

SSH command: cloudstack-setup-agent  -m XXXXXXX

How does the management server determine its own IP?

I have also configured the global management network CIDR with management.network.cidr, yet the IP being passed to the host is the wrong one.

DaanHoogland commented 4 years ago

@rvalle,

Is the address the same one that you configured in the global setting 'host'?

And does it resolve to the same address on the hypervisor and the management server? Are any other options added to the command?

rvalle commented 4 years ago

Yes! I can see the setting now. thanks!

rvalle commented 4 years ago

Hi @DaanHoogland

After properly configuring host and management.network.cidr, I cannot get the CloudStack management server to start.

I am getting the following exceptions when I restart after the changes:

2020-03-09 10:50:03,280 INFO  [c.c.c.ClusterManagerImpl] (main:null) (logid:) Start configuring cluster manager : ClusterManagerImpl
2020-03-09 10:50:03,280 INFO  [c.c.c.ClusterManagerImpl] (main:null) (logid:) Cluster node IP : 10.71.0.254
2020-03-09 10:50:03,297 INFO  [c.c.c.ClusterManagerImpl] (main:null) (logid:) Trying to connect to 10.71.0.254
2020-03-09 10:52:10,555 ERROR [c.c.c.ClusterManagerImpl] (main:null) (logid:) Unable to ping management server at 10.71.0.254:9090 due to ConnectException
java.net.ConnectException: Connection timed out
        at sun.nio.ch.Net.connect0(Native Method)
        at sun.nio.ch.Net.connect(Net.java:454)
        at sun.nio.ch.Net.connect(Net.java:446)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:645)
        at com.cloud.cluster.ClusterManagerImpl.pingManagementNode(ClusterManagerImpl.java:1140)
        at com.cloud.cluster.ClusterManagerImpl.pingManagementNode(ClusterManagerImpl.java:1109)
        at com.cloud.cluster.ClusterManagerImpl.checkConflicts(ClusterManagerImpl.java:1187)
....
        at org.apache.cloudstack.ServerDaemon.start(ServerDaemon.java:186)
        at org.apache.cloudstack.ServerDaemon.main(ServerDaemon.java:103)
2020-03-09 10:52:10,565 INFO  [c.c.c.ClusterManagerImpl] (main:null) (logid:) Detected that another management node with the same IP 10.71.0.254 is considered as running in DB, however it is not pingable, we will continue cluster initialization with this management server node
2020-03-09 10:52:10,565 INFO  [c.c.c.ClusterManagerImpl] (main:null) (logid:) Cluster manager is configured.
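Judging by the stack trace (SocketChannelImpl.connect called from pingManagementNode), the "ping" in this log is a TCP connection attempt to port 9090, not an ICMP ping. A minimal Python sketch for reproducing that check by hand is below; the helper name can_connect and the 5-second default timeout are illustrative assumptions, not part of CloudStack.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for manual troubleshooting; it only approximates
    the reachability check suggested by the stack trace above.
    """
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (including timeouts and refused connections) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call with the address and port from the log:
#   can_connect("10.71.0.254", 9090)
```

Running it on the management server itself shows whether the cluster service port is reachable on the address that the server has selected as its own.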

There seems to be some kind of confusion. I think the manager is trying to connect to itself before it has finished the startup process.

But then it mentions "another manager".

Any idea what could be going wrong?

Perhaps I should specify the actual manager IP during the install process.

DaanHoogland commented 4 years ago

Please check this:

2020-03-09 10:52:10,565 INFO [c.c.c.ClusterManagerImpl] (main:null) (logid:) Detected that another management node with the same IP 10.71.0.254 is considered as running in DB, however it is not pingable, we will continue cluster initialization with this management server node

It might be a remnant from a prior run, or the old server might actually still be running.

rvalle commented 4 years ago

@DaanHoogland Yes, I saw it. I actually think that the admin web UI showed two entries for the management server even before I changed the IP.

Note that I am writing an Ansible playbook to install a CloudStack cluster, so I re-create the whole thing again and again from scratch.

I don't know what decides how many management servers there are, or which one is "me", in the setup process, but it seems to get confused by my network setup, as I have several network adapters.

I am assuming that the mshost table holds the management servers, and I can see only one entry there:

mysql> select id,msid,service_ip,service_port,state from mshost;
+----+-----------------+-------------+--------------+-------+
| id | msid            | service_ip  | service_port | state |
+----+-----------------+-------------+--------------+-------+
|  1 | 209984346422944 | 10.71.0.254 |         9090 | Up    |
+----+-----------------+-------------+--------------+-------+
1 row in set (0.00 sec)

For some reason the management server thinks that this entry is not "me".
Perhaps after modifying the host IP in the global config, the management server does not shut down properly when the service is restarted. The state should definitely not be Up.

Also, is 9090 the right port? I access the management server on the default 8080 port.

Another question is whether it is possible to launch the setup process in a way that the right IP is chosen for the management server, but I cannot see how that IP is selected.

After reading the installation guide I would have thought that this:

[root@manager ~]# ping $(hostname --fqdn)
PING manager.mgmt_net (10.71.0.254) 56(84) bytes of data.
64 bytes from manager.mgmt_net (10.71.0.254): icmp_seq=1 ttl=64 time=0.046 ms
64 bytes from manager.mgmt_net (10.71.0.254): icmp_seq=2 ttl=64 time=0.112 ms

would be enough for the setup to select the right IP for the manager, but perhaps it is not.

Any ideas?
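The hostname check above can be extended to confirm that the name resolves only to addresses inside the configured management network. The following is a small Python sketch under stated assumptions: the function name resolves_into_cidr is hypothetical, and the CIDR 10.71.0.0/24 in the example call is a guess for illustration, since the thread never states the actual value of management.network.cidr.

```python
import ipaddress
import socket


def resolves_into_cidr(hostname: str, cidr: str) -> bool:
    """Return True if every IPv4 address that hostname resolves to lies in cidr.

    Hypothetical helper; raises socket.gaierror if the name does not resolve.
    """
    network = ipaddress.ip_network(cidr, strict=False)
    # getaddrinfo returns tuples whose last element is (address, port).
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(
        ipaddress.ip_address(addr) in network for addr in addresses
    )


# Example call (the CIDR is an assumed value, not taken from the thread):
#   resolves_into_cidr(socket.getfqdn(), "10.71.0.0/24")
```

Running the same check on the hypervisor host would also answer the earlier question of whether the name resolves to the same address on both sides.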

DaanHoogland commented 4 years ago

9090 sounds right; 8080 is the web interface, not the service. There are several ports in use and I always (try to) forget which is for what. Can you try to update the state to Down?
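A sketch of the suggested state change, written in Python against an in-memory SQLite table that stands in for the real MySQL mshost table. Everything here is illustrative: the helper names make_demo_db and mark_down are hypothetical, the stand-in table has only the five columns shown in the query output above, and a MySQL client library would typically use %s placeholders instead of ?.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for mshost with the row from the thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mshost (id INTEGER PRIMARY KEY, msid INTEGER, "
        "service_ip TEXT, service_port INTEGER, state TEXT)"
    )
    conn.execute(
        "INSERT INTO mshost VALUES (1, 209984346422944, '10.71.0.254', 9090, 'Up')"
    )
    conn.commit()
    return conn


def mark_down(conn: sqlite3.Connection, host_id: int) -> int:
    """Set state to 'Down' for one mshost row and return the affected row count."""
    cur = conn.execute("UPDATE mshost SET state = 'Down' WHERE id = ?", (host_id,))
    conn.commit()
    return cur.rowcount


conn = make_demo_db()
mark_down(conn, 1)
# Re-run the query from the thread to inspect the result.
print(conn.execute("SELECT id, service_ip, service_port, state FROM mshost").fetchall())
```

Against the real database, the equivalent statement would be an UPDATE on mshost filtered by id; taking a backup before editing the table by hand is a sensible precaution.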

rvalle commented 4 years ago

I am testing a bit more. I am seeing a lot of instability, and I am not sure why yet.

rvalle commented 4 years ago

I believe issue #3954 is getting in the way of my testing. Normally I reboot and reapply the roles to create the cluster again (idempotency test) before concluding the setup of the cluster. I am going to disable that step to check this properly.

rvalle commented 4 years ago

@DaanHoogland yes, confirmed. It was #3954 that broke my manager before I attempted to change the host and management.network.cidr global values.

I re-tested without performing any reboot as part of the cluster installation process and changed these parameters with no problem. The management server starts with the new parameters, there is only one... all seems OK to me.

sirdaddits commented 2 years ago

"@DaanHoogland yes, confirmed. It was #3954 that got my manager broken before attempting to change the host and management.network.cidr global values."

Hello. How can I change the management IP? I changed it in db.properties, but the SSVM was created with the old IP.