apache / cloudstack

Apache CloudStack is an open-source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

SSVM/CPVM not starting due to failed SSL Handshake #9959

Closed devops-42 closed 4 days ago

devops-42 commented 5 days ago
ISSUE TYPE
COMPONENT NAME
API / Backend
CLOUDSTACK VERSION
4.19.1.x
CONFIGURATION

The issue occurred with both basic and advanced networking. The configuration is shown below.

OS / ENVIRONMENT

Used setup: Management + KVM host:

Management:

KVM host:

SUMMARY

When setting up a zone (basic or advanced), the KVM host joins the cluster, but the SSVM and the CPVM remain stuck in "Starting". The SSVM log file shows an SSL error:

2024-11-20 23:54:22,618 WARN  [cloud.agent.Agent] (main:null) NIO Connection Exception  com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: **.**.**.** port: 8250

The same log indicates that the cloud agent on the SSVM was not able to load the keystore:

2024-11-20 23:54:21,927 WARN  [utils.nio.Link] (main:null) Failed to load keystore, using trust all manager

After some experimenting, I found that the cloud agent expects a keystore named cloud.jks in the /usr/local/cloud/systemvm/conf directory, which is populated from the /etc/cloudstack directory. Unfortunately, /etc/cloudstack is empty on the VM.
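
A minimal sketch of how the keystore check described above could be repeated on a system VM. The function name `keystore_status` is hypothetical and not part of CloudStack; only the file name cloud.jks and the two directories come from the report.

```shell
# Report whether a non-empty cloud.jks exists in the given directory.
# keystore_status is a hypothetical helper, not a CloudStack tool.
keystore_status() {
    dir="$1"
    if [ -s "$dir/cloud.jks" ]; then
        echo "present: $dir/cloud.jks"
    else
        echo "missing: $dir/cloud.jks"
    fi
}
```

Running `keystore_status /usr/local/cloud/systemvm/conf` and `keystore_status /etc/cloudstack` on the SSVM would show which of the two locations lacks the keystore.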

I already tried to work around this by setting the global configuration parameter ca.plugin.root.auth.strictness to false. That did not fix the problem for me and produced unexpected results.

STEPS TO REPRODUCE

Setup management server:

apt-get install -y \
  apt-transport-https \
  bridge-utils \
  ca-certificates \
  curl \
  chrony \
  gnupg \
  lsb-release \
  mysql-server \
  net-tools \
  nfs-kernel-server \
  quota \
  software-properties-common \
  unattended-upgrades

cat <<'EOF' > /etc/mysql/mysql.conf.d/cloudstack.cnf
[mysqld]
server-id=1
innodb_rollback_on_timeout=1
innodb_lock_wait_timeout=600
max_connections=350
log-bin=mysql-bin
binlog-format = 'ROW'
EOF
systemctl restart mysql

wget -O - https://download.cloudstack.org/release.asc | tee /etc/apt/trusted.gpg.d/cloudstack.asc
echo "deb https://download.cloudstack.org/ubuntu noble 4.19" | tee /etc/apt/sources.list.d/cloudstack.list
apt-get update
apt-get install -y cloudstack-management

mkdir -p /export/primary /export/secondary
echo "/export  *(rw,async,no_root_squash,no_subtree_check,insecure)" >> /etc/exports
exportfs -a
sed -i -e 's/^RPCMOUNTDOPTS="--manage-gids"$/RPCMOUNTDOPTS="-p 892 --manage-gids"/g' /etc/default/nfs-kernel-server
sed -i -e 's/^STATDOPTS=$/STATDOPTS="--port 662 --outgoing-port 2020"/g' /etc/default/nfs-common
echo "NEED_STATD=yes" >> /etc/default/nfs-common
sed -i -e 's/^RPCRQUOTADOPTS=$/RPCRQUOTADOPTS="-p 875"/g' /etc/default/quota
service nfs-kernel-server restart

cloudstack-setup-databases ***:***@localhost --deploy-as=root -i 127.0.0.1
cloudstack-setup-management

Setup KVM host:

apt-get install -y \
  apt-transport-https \
  bridge-utils \
  ca-certificates \
  curl \
  chrony \
  gnupg \
  lsb-release \
  net-tools \
  quota \
  software-properties-common \
  unattended-upgrades

cat <<'EOM' > /etc/netplan/01-netcfg.yaml
network:
  version: 2
  ethernets:
    eth0: {}
  bridges:
    cloudbr0:
      addresses:
        - **.**.**.**/**
      nameservers:
        addresses:
          - **.**.**.**
      routes:
        - to: default
          via: **.**.**.**
          metric: 100
      interfaces: [eth0]
EOM
chmod 600 /etc/netplan/01-netcfg.yaml
mv /etc/netplan/50-cloud-init.yaml /etc/netplan/50-cloud-init.yaml.dist
netplan generate && netplan apply

wget -O - https://download.cloudstack.org/release.asc | tee /etc/apt/trusted.gpg.d/cloudstack.asc
echo "deb https://download.cloudstack.org/ubuntu noble 4.19" | tee /etc/apt/sources.list.d/cloudstack.list
apt-get update
apt-get install -y qemu-kvm cloudstack-agent

sed -i -e 's/\#vnc_listen.*$/vnc_listen = "0.0.0.0"/g' /etc/libvirt/qemu.conf
systemctl mask libvirtd.socket libvirtd-ro.socket libvirtd-admin.socket libvirtd-tls.socket libvirtd-tcp.socket
systemctl restart libvirtd

mv /etc/libvirt/libvirtd.conf /etc/libvirt/libvirtd.conf.dist
cat <<'EOM' > /etc/libvirt/libvirtd.conf
listen_tls=0
listen_tcp=0
tcp_port = "16509"
mdns_adv = 0
auth_tcp = "none"
EOM

systemctl restart libvirtd

modprobe br_netfilter
echo 'net.bridge.bridge-nf-call-arptables = 0' >> /etc/sysctl.conf
echo 'net.bridge.bridge-nf-call-iptables = 0' >> /etc/sysctl.conf
echo 'net.bridge.bridge-nf-call-ip6tables = 0' >> /etc/sysctl.conf
sysctl -p

EXPECTED RESULTS

The SSVM and CPVM start up and the cloud agent is running. Compute instances can be created using a virtual router (isolated guest network).

ACTUAL RESULTS

Here is the (hopefully) relevant log snippet.

2024-11-20 23:54:21,734 INFO  [cloud.agent.Agent] (main:null) Agent [id = new : type = PremiumSecondaryStorageResource : zone = 1 : pod = 1 : workers = 5 : host = **.**.**.** : port = 8250
2024-11-20 23:54:21,809 INFO  [utils.nio.NioClient] (main:null) Connecting to **.**.**.**:8250
2024-11-20 23:54:21,828 INFO  [utils.nio.Link] (main:null) Conf file found: /usr/local/cloud/systemvm/conf/agent.properties
2024-11-20 23:54:21,927 WARN  [utils.nio.Link] (main:null) Failed to load keystore, using trust all manager
2024-11-20 23:54:22,597 ERROR [utils.nio.Link] (main:null) SSL error caught during unwrap data: Received fatal alert: bad_certificate, for local address=/**.**.**.**:43322, remote address=/**.**.**.**:8250. The client may have invalid ca-certificates.
2024-11-20 23:54:22,602 ERROR [utils.nio.NioClient] (main:null) SSL Handshake failed while connecting to host: **.**.**.** port: 8250
2024-11-20 23:54:22,604 ERROR [utils.nio.NioConnection] (main:null) Unable to initialize the threads.
java.io.IOException: SSL Handshake failed while connecting to host: **.**.**.** port: 8250
        at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
        at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
        at com.cloud.agent.Agent.start(Agent.java:286)
        at com.cloud.agent.AgentShell.launchNewAgent(AgentShell.java:454)
        at com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:431)
        at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:415)
        at com.cloud.agent.AgentShell.start(AgentShell.java:511)
        at com.cloud.agent.AgentShell.main(AgentShell.java:541)
2024-11-20 23:54:22,618 WARN  [cloud.agent.Agent] (main:null) NIO Connection Exception  com.cloud.utils.exception.NioConnectionException: SSL Handshake failed while connecting to host: **.**.**.** port: 8250
2024-11-20 23:54:22,618 INFO  [cloud.agent.Agent] (main:null) Attempted to connect to the server, but received an unexpected exception, trying again...
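
For longer logs, the keystore and handshake messages quoted above can be counted with a small filter. This is only a sketch: `count_tls_problems` is a hypothetical helper name, and the three patterns are taken from the log lines in this report rather than from any complete list of CloudStack error messages.

```shell
# Count log lines read from stdin that mention the keystore or TLS handshake
# problems seen in this report. count_tls_problems is a hypothetical helper.
count_tls_problems() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when there are no matches.
    grep -c -E 'Failed to load keystore|SSL Handshake failed|bad_certificate' || true
}
```

It reads from standard input, for example `count_tls_problems < agent.log`, where agent.log stands in for wherever the agent log was saved.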

Thanks for looking at it✌️

boring-cyborg[bot] commented 5 days ago

Thanks for opening your first issue here! Be sure to follow the issue template!

devops-42 commented 4 days ago

Update: It turned out that the ping utility was missing on the host machine. After installation, everything works as expected. Sorry for the noise 😇
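
Since the root cause was a missing utility, a pre-flight check along these lines could catch it before the zone is set up. This is a sketch: `missing_commands` is a hypothetical helper, and which commands to check beyond ping is an assumption left to the reader.

```shell
# Print each named command that cannot be found on PATH, one per line.
# missing_commands is a hypothetical helper, not part of CloudStack.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

On the KVM host, `missing_commands ping` prints `ping` if the utility is absent and nothing otherwise. On Ubuntu, ping is provided by the iputils-ping package; the update above does not say which package was installed, so that part is an assumption.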