apache / cloudstack

Apache CloudStack is an open-source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

DNS service not working for System VMs on a fresh install of CloudStack #7473

Status: Open. bradsmin opened this issue 1 year ago.

bradsmin commented 1 year ago
ISSUE TYPE
COMPONENT NAME
UI, System VMs
CLOUDSTACK VERSION
CloudStack 4.18
CONFIGURATION

Zone with default Advanced networking, using VLAN as the isolation method. The zone has only one physical network, "Physical Network 1", with all traffic types (Guest, Management, Public, Storage) passing through it. The KVM management server and the KVM host are the same machine; this is a testing environment. Open vSwitch with DPDK is enabled, and the bridge interface was created using Open vSwitch commands. Primary and secondary storage use NFS mount points.

OS / ENVIRONMENT

Ubuntu 22.04.2 LTS Codename: jammy

SUMMARY

The System VMs' Agent State is not Up.

STEPS TO REPRODUCE
1. Install a fresh Ubuntu 22.04 LTS server.
2. Enable Open vSwitch and DPDK on the network interface and create a bridge interface.
3. Install CloudStack on the server, and also use this server as the KVM host.
4. Configure CloudStack with a zone using default Advanced networking.
5. The zone has only one physical network, "Physical Network 1", with all traffic types (Guest, Management, Public, Storage) passing through it.
EXPECTED RESULTS
The System VMs' Agent State is OK.
ACTUAL RESULTS
After installing CloudStack and completing the basic zone configuration, we went to the System VMs section and noticed that the State is "Running" but the Agent State is not "OK".

We accessed one of the system VMs over SSH with a command like ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@link-local (where link-local is the system VM's link-local IP address).

We then ran /usr/local/cloud/systemvm/ssvm-check.sh.

The script reported: ERROR: DNS not resolving cloudstack.apache.org

We confirmed that the Google DNS server is listed in /etc/resolv.conf.

We checked the status of the cloud service and noticed that the system VM cannot communicate with port 8250 on the management server IP.

Further troubleshooting showed that, from inside the system VM, we can only ping the management server IP; connections to any of its listening ports, such as the SSH port or the management port 8250, fail.
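The manual check above (ping succeeds, TCP ports do not) can be repeated from inside the system VM with a small probe that also distinguishes "refused" from "timed out". This is a hypothetical helper, not part of CloudStack or ssvm-check.sh, and it assumes Python 3 is available where it runs; `MGMTIP` below stands in for the real management server address.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the outcome.

    'open'    -> the three-way handshake completed
    'refused' -> the peer answered with RST (reachable, but nothing listening)
    'timeout' -> neither a SYN-ACK nor an RST came back in time
    'error'   -> any other socket error (name resolution, no route, ...)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

For example, `probe_tcp("MGMTIP", 8250)`. Under the symptoms described here one would expect "timeout" rather than "refused": an RST would show that replies make it back to the VM, whereas a timeout fits a path where the management server's answers are lost.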

While capturing traffic on the bridge interface with ovs-tcpdump, we noticed that incoming traffic from the system VMs reaches the bridge interface, but no response goes back to the system VMs. Below is a sample of the traffic captured at the bridge interface on CloudStack port 8250. We believe we set up Open vSwitch with DPDK correctly and are not sure what we are missing on our side.

12:08:52.312394 IP SVMIP.33940 > MGMTIP.8250: Flags [S], seq 3467199111, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
12:08:53.314315 IP SVMIP.33940 > MGMTIP.8250: Flags [S], seq 3467199111, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
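The capture shows the same SYN (identical sequence number) sent twice about one second apart, i.e. a TCP retransmission: the client never received a SYN-ACK. A rough sketch for spotting this pattern in longer captures follows; the function name is hypothetical, and the regex is an assumption that only covers IPv4 header lines in tcpdump's default output format, not a complete tcpdump parser.

```python
import re
from collections import Counter

# Matches tcpdump header lines such as:
# 12:08:52.312394 IP SVMIP.33940 > MGMTIP.8250: Flags [S], seq 3467199111, ...
HEADER_RE = re.compile(
    r"^\S+ IP (?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+?)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]+)\]"
)


def unanswered_syns(lines):
    """Return {(src, sport, dst, dport): syn_count} for flows whose SYNs never got a SYN-ACK."""
    syns = Counter()
    answered = set()
    for line in lines:
        m = HEADER_RE.match(line)
        if m is None:
            continue  # skip payload dumps and other non-header lines
        flow = (m["src"], m["sport"], m["dst"], m["dport"])
        if m["flags"] == "S":
            syns[flow] += 1
        elif m["flags"] == "S.":
            # tcpdump prints SYN-ACK as [S.]; record it under the original (reverse) direction.
            answered.add((m["dst"], m["dport"], m["src"], m["sport"]))
    return {flow: n for flow, n in syns.items() if flow not in answered}
```

Fed the two header lines above, it reports the SVMIP.33940 to MGMTIP.8250 flow with two unanswered SYNs; a single `[S.]` line in the reverse direction would clear it.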

We ran the command "ovs-ofctl dump-flows cloudbr0" and got the result below. The rule looks fine:

cookie=0x0, duration=148279.131s, table=0, n_packets=7453119, n_bytes=2575374706, priority=0 actions=NORMAL
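If that line is the whole flow table, Open vSwitch forwards packets on cloudbr0 with the `NORMAL` action, i.e. as an ordinary MAC-learning switch, so no OpenFlow rule is filtering the traffic. As a minimal sketch (hypothetical helpers that only handle the simple `key=value, ... actions=...` layout shown here), the dump can be checked programmatically:

```python
def parse_flow(line: str) -> dict:
    """Split one `ovs-ofctl dump-flows` entry into its fields plus the action list."""
    body, _, actions = line.strip().partition(" actions=")
    fields = {}
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Bare match keywords such as 'ip' or 'tcp' carry no '=value'.
        fields[key] = value if sep else True
    fields["actions"] = actions
    return fields


def is_plain_l2_switch(lines) -> bool:
    """True if the dump holds exactly one flow and its only action is NORMAL."""
    flows = [parse_flow(line) for line in lines if "actions=" in line]
    return len(flows) == 1 and flows[0]["actions"] == "NORMAL"
```

Usage: `is_plain_l2_switch(dump_output.splitlines())`, where `dump_output` is the captured command output; header lines without `actions=` are skipped.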

Not sure what we are missing.
boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template!

weizhouapache commented 1 year ago

@bradsmin From what you describe, this does not look like a DNS issue but a firewall issue.

Further troubleshooting shows from inside system vm, only able to ping management server IP but no connection with any active ports of management server like ssh port , management ports like 8250 etc

Is the system VM able to ping Google DNS (8.8.8.8)?

Can you share the agent.properties from the KVM host, the XML dump of the system VM, and the output of some OVS commands?

bradsmin commented 1 year ago

Yes, I can ping 8.8.8.8 from inside the system VM, but I cannot ping google.com:

ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=60 time=1.033 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=60 time=0.966 ms

ping google.com
ping: unknown host
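Since ICMP to 8.8.8.8 works but name resolution does not, a useful next step is to send a DNS query straight to the resolver over UDP port 53, bypassing resolv.conf. The sketch below builds a bare RFC 1035 A-record query by hand; the function names are hypothetical and it assumes Python 3 is available inside the system VM.

```python
import random
import socket
import struct


def build_dns_query(name: str, qid: int) -> bytes:
    """Build a minimal RFC 1035 query for the A record of `name`, with recursion desired."""
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def parse_dns_header(reply: bytes) -> tuple:
    """Return (id, rcode, answer_count) from the 12-byte header of a DNS reply."""
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return qid, flags & 0x000F, ancount


def query_udp(server: str, name: str, timeout: float = 3.0):
    """Send one A query to server:53 over UDP; return (rcode, answers), or None on timeout."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(name, qid), (server, 53))
        try:
            reply, _addr = sock.recvfrom(512)
        except socket.timeout:
            return None  # nothing came back over UDP/53
    reply_id, rcode, ancount = parse_dns_header(reply)
    if reply_id != qid:
        raise ValueError("reply ID does not match the query ID")
    return rcode, ancount
```

If `query_udp("8.8.8.8", "google.com")` returns None while ping still works, UDP replies are being lost on the way back, which would fit the firewall theory above; a result like `(0, n)` (rcode 0 means no error) would instead point at the resolver configuration inside the VM.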

agent.properties content

libvirt.vif.driver=com.cloud.hypervisor.kvm.resource.OvsVifDriver
cluster=1
openvswitch.dpdk.enabled=true
pod=1
resource=com.cloud.hypervisor.kvm.resource.LibvirtComputingResource
private.network.device=cloudbr0
domr.scripts.dir=scripts/network/domr/kvm
openvswitch.dpdk.ovs.path=/var/run/openvswitch
guest.network.device=cloudbr0
keystore.passphrase=(key exists; removed here for safety)
hypervisor.type=kvm
port=8250
zone=1
public.network.device=cloudbr0
local.storage.uuid=(ID exists; removed here for safety)
host=HOSTIP@static
guid=(GUID exists; removed here for safety)
LibvirtComputingResource.id=1
network.bridge.type=openvswitch
workers=5
iscsi.session.cleanup.enabled=false
vm.migrate.wait=3600
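The agent settings relevant to Open vSwitch and DPDK can be checked mechanically. This sketch parses simple `key=value` lines and compares three keys against values assumed here for an OVS + DPDK KVM host (verify them against the CloudStack documentation for your release). In the file posted above, `libvirt.vif.driver`, `openvswitch.dpdk.enabled`, and `network.bridge.type` already carry these values, which suggests the problem lies elsewhere than these keys. The helper names are hypothetical.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines (a subset of Java .properties: no escapes or continuations)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Assumed values for an OVS + DPDK KVM host; verify against the CloudStack docs for your release.
EXPECTED_DPDK_SETTINGS = {
    "network.bridge.type": "openvswitch",
    "libvirt.vif.driver": "com.cloud.hypervisor.kvm.resource.OvsVifDriver",
    "openvswitch.dpdk.enabled": "true",
}


def dpdk_mismatches(props: dict) -> dict:
    """Return {key: actual_value} for each expected DPDK setting that differs (None if missing)."""
    return {
        key: props.get(key)
        for key, wanted in EXPECTED_DPDK_SETTINGS.items()
        if props.get(key) != wanted
    }
```

Usage: `dpdk_mismatches(parse_properties(open("agent.properties").read()))`, run from the directory holding the file; an empty dict means the three keys match.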

The XML dump is attached as a file (xmldump.txt). Only IPs, VNC info, IDs, and MAC address values were removed.

Output of the command ovs-vsctl show:

    Bridge cloud0
        Port vnet0
            Interface vnet0
        Port vnet3
            Interface vnet3
        Port cloud0
            Interface cloud0
                type: internal
    Bridge cloudbr0
        datapath_type: netdev
        Port vnet4
            Interface vnet4
        Port vnet1
            Interface vnet1
        Port cloudbr0
            Interface cloudbr0
                type: internal
        Port eno2
            Interface eno2
                type: dpdk
                options: {dpdk-devargs="id"}
        Port vnet5
            Interface vnet5
        Port vnet2
            Interface vnet2
    ovs_version: "2.17.3"

weizhouapache commented 1 year ago

@bradsmin Can you test Open vSwitch without DPDK?

bradsmin commented 1 year ago

Yes, I have already done that. Without DPDK, Open vSwitch and CloudStack work fine on a fresh install.

li-liwen commented 6 months ago

I have been experiencing the same issue. In my case (I am using a Linux bridge), I was able to get guest DNS working by disabling UFW on my Ubuntu KVM hosts. However, I haven't found a way to keep UFW enabled while guest DNS works, which would be better for security.

rohityadavcloud commented 2 months ago

@bradsmin Have you defined an internal DNS server that isn't on your management network? I ask because the SSVM agent adds a routing rule that sends traffic for the zone's internal DNS via its management/private network NIC.
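One quick way to see whether this applies is to test whether the configured internal DNS address falls inside the management (pod) subnet. The helper and the subnet in the example are hypothetical, and the interpretation follows the comment above rather than CloudStack source code.

```python
import ipaddress


def dns_outside_mgmt_subnet(dns_ip: str, mgmt_cidr: str) -> bool:
    """True if dns_ip is not inside mgmt_cidr, so traffic to it that is routed out the
    management/private NIC has to go through a gateway rather than straight onto that subnet."""
    # strict=False accepts a host address with a prefix, e.g. "192.168.1.10/24".
    return ipaddress.ip_address(dns_ip) not in ipaddress.ip_network(mgmt_cidr, strict=False)
```

With Google DNS as the internal DNS, as in this issue, `dns_outside_mgmt_subnet("8.8.8.8", "192.168.1.0/24")` returns True, and the same holds for any RFC 1918 management subnet. If the routing rule works as described, the SSVM would then need a working path to 8.8.8.8 through its management NIC's gateway.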

rohityadavcloud commented 2 weeks ago

@bradsmin Can you review the comments and advise? Have you also tried @li-liwen's workaround of disabling ufw (or firewalld)?

bradsmin commented 2 weeks ago

Yes, the internal DNS was defined (set to Google DNS) at configuration time. I also tried disabling the ufw firewall on the KVM host, but the issue still exists.