CentaurusInfra / mizar

Mizar – Experimental, High Scale and High Performance Cloud Network https://mizar.readthedocs.io
GNU General Public License v2.0

Define way to expose VM workload to users #614

Open yb01 opened 2 years ago

yb01 commented 2 years ago

What happened:

In the case below, this is the veth for the VM pod:

veth-a1bfdb1a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000

root@ip-172-31-39-83:~/go/src/k8s.io/arktos# kubectl get pods vmdefault -o wide
NAME        HASHKEY               READY   STATUS    RESTARTS   AGE   IP          NODE              NOMINATED NODE   READINESS GATES
vmdefault   3885105893249453356   1/1     Running   0          24s   21.0.21.5   ip-172-31-39-83   <none>           <none>
root@ip-172-31-39-83:~/go/src/k8s.io/arktos# ssh cirros@21.0.21.5
ssh: connect to host 21.0.21.5 port 22: Connection timed out
root@ip-172-31-39-83:~/go/src/k8s.io/arktos# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:12:ec:ca:59  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 172.31.39.83  netmask 255.255.240.0  broadcast 172.31.47.255
        inet6 fe80::4a5:99ff:fe3e:79d  prefixlen 64  scopeid 0x20<link>
        ether 06:a5:99:3e:07:9d  txqueuelen 1000  (Ethernet)
        RX packets 48858  bytes 68788955 (68.7 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 8245  bytes 769594 (769.5 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth-hostep: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.31.39.83  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::f055:52ff:fedd:e4ae  prefixlen 64  scopeid 0x20<link>
        ether f2:55:52:dd:e4:ae  txqueuelen 1000  (Ethernet)
        RX packets 12  bytes 936 (936.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12  bytes 936 (936.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 177212  bytes 41143162 (41.1 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 177212  bytes 41143162 (41.1 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth-5a235ada: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet6 fe80::dc98:fff:fe3d:7558  prefixlen 64  scopeid 0x20<link>
        ether de:98:0f:3d:75:58  txqueuelen 1000  (Ethernet)
        RX packets 103  bytes 7504 (7.5 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 17  bytes 1146 (1.1 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth-a1bfdb1a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet6 fe80::c8b9:bdff:fe49:351  prefixlen 64  scopeid 0x20<link>
        ether ca:b9:bd:49:03:51  txqueuelen 1000  (Ethernet)
        RX packets 58  bytes 3908 (3.9 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 33  bytes 1838 (1.8 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth-e71105cc: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet6 fe80::5047:caff:fe5e:54ef  prefixlen 64  scopeid 0x20<link>
        ether 52:47:ca:5e:54:ef  txqueuelen 1000  (Ethernet)
        RX packets 404  bytes 28220 (28.2 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 19  bytes 1230 (1.2 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth-hostep: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet6 fe80::5048:94ff:fee9:e7b4  prefixlen 64  scopeid 0x20<link>
        ether 52:48:94:e9:e7:b4  txqueuelen 1000  (Ethernet)
        RX packets 12  bytes 936 (936.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12  bytes 936 (936.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@ip-172-31-39-83:~/go/src/k8s.io/arktos# 

What you expected to happen:

Like with the bridge CNI, the IP should be accessible:

root@ip-172-31-39-83:~/go/src/k8s.io/arktos# kubectl get pods vmdefault -o wide
NAME        HASHKEY               READY   STATUS    RESTARTS   AGE   IP           NODE              NOMINATED NODE   READINESS GATES
vmdefault   5566664455589313951   1/1     Running   0          16s   10.88.0.11   ip-172-31-39-83   <none>           <none>
root@ip-172-31-39-83:~/go/src/k8s.io/arktos# ssh cirros@10.88.0.11
The authenticity of host '10.88.0.11 (10.88.0.11)' can't be established.
ECDSA key fingerprint is SHA256:xlFkonzYp308uzMA+oEirugxa8FGPirTgPQuIM63vq4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.88.0.11' (ECDSA) to the list of known hosts.
cirros@10.88.0.11's password: 
$ 

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: Here is the qemu log:

root@ip-172-31-39-83:/var/log/libvirt/qemu# cat arktosRT--71afacb0-d3f9-vm.log 
2022-02-04 05:58:38.035+0000: starting up libvirt version: 6.5.0, qemu version: 4.0.0, kernel: 5.6.0-rc2, hostname: ip-172-31-39-83
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-1-arktosRT--71afacb0-d \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-arktosRT--71afacb0-d/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-arktosRT--71afacb0-d/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-arktosRT--71afacb0-d/.config \
QEMU_AUDIO_DRV=none \
VIRTLET_EMULATOR=/usr/local/bin/qemu-system-x86_64 \
VIRTLET_NET_KEY=5249bc16-6c53-4f24-81fe-a41d4796c37c \
VIRTLET_CONTAINER_ID=71afacb0-d3f9-5171-5218-c0532a1160f7 \
VIRTLET_CONTAINER_LOG_PATH=/var/log/pods/system_default_vmdefault_5249bc16-6c53-4f24-81fe-a41d4796c37c/vm/0.log \
CGROUP_PARENT=/kubepods/pod5249bc16-6c53-4f24-81fe-a41d4796c37c/71afacb0-d3f9-5171-5218-c0532a1160f7 \
/vmwrapper \
-name guest=arktosRT--71afacb0-d3f9-vm,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-arktosRT--71afacb0-d/master-key.aes \
-machine pc-i440fx-4.0,accel=tcg,usb=off,dump-guest-core=off \
-cpu EPYC,acpi=on,ss=on,hypervisor=on,erms=on,mpx=on,pcommit=on,clwb=on,pku=on,la57=on,3dnowext=on,3dnow=on,npt=on,vme=off,fma=off,avx=off,f16c=off,rdrand=off,avx2=off,rdseed=off,sha-ni=off,xsavec=off,fxsr_opt=off,misalignsse=off,3dnowprefetch=off,osvw=off,topoext=off,nrip-save=off \
-m size=1048576k,slots=16,maxmem=2097152k \
-overcommit mem-lock=off \
-smp 1,maxcpus=2,sockets=2,cores=1,threads=1 \
-numa node,nodeid=0,cpus=0-1,mem=1024 \
-uuid 71afacb0-d3f9-5171-5218-c0532a1160f7 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=22,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 \
-drive file=/var/lib/virtlet/volumes/virtlet_root_71afacb0-d3f9-5171-5218-c0532a1160f7,format=qcow2,if=none,id=drive-scsi0-0-0-0 \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
-drive file=/var/lib/virtlet/config/config-71afacb0-d3f9-5171-5218-c0532a1160f7.iso,format=raw,if=none,id=drive-scsi0-0-0-1,readonly=on \
-device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,device_id=drive-scsi0-0-0-1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 \
-chardev socket,id=charserial0,path=/var/lib/libvirt/streamer.sock,reconnect=1 \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 127.0.0.1:0 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2022-02-04 05:58:38.035+0000: Domain id=1 is tainted: custom-argv
I0204 05:58:38.052155   10120 vmwrapper.go:67] Obtaining PID of the VM container process...
W0204 05:58:38.052973   10120 vmwrapper.go:93] POD cgroupParent /kubepods/pod5249bc16-6c53-4f24-81fe-a41d4796c37c/71afacb0-d3f9-5171-5218-c0532a1160f7 for controller  does not exist
W0204 05:58:38.053001   10120 vmwrapper.go:100] Failed to move pid into cgroup "" path /: open /sys/fs/cgroup/cgroup.procs: read-only file system
W0204 05:58:38.053418   10120 vmwrapper.go:93] POD cgroupParent /kubepods/pod5249bc16-6c53-4f24-81fe-a41d4796c37c/71afacb0-d3f9-5171-5218-c0532a1160f7 for controller rdma does not exist
nsfix reexec: pid 10120: entering the namespaces of target pid 27334
nsfix reexec: dropping privs
root@ip-172-31-39-83:/var/log/libvirt/qemu# 

Environment:

yb01 commented 2 years ago

Actually, when I thought about it a bit more, this behavior might be by design.

The difference is that Mizar, unlike a “traditional” CNI, does not provide a flat container networking mechanism in which all pods can be accessed at the host level. With Mizar, the pod is confined within its VPC/subnet boundary, and that boundary cannot be crossed from the host.

With that in mind, I deployed a container pod under the same tenant/namespace, and the VM can be accessed from that pod since they are in the same VPC/subnet, as shown below.

So what we need to think about with Mizar is the following; these are probably not blockers for the 130 release:

  1. Plan to expose workloads (VMs) with public IPs (floating IPs, as in OpenStack) so they can be accessed from outside the cluster.
  2. Some kind of service that exposes the VM pods so they can be accessed via the service from outside the cluster (see the sketch after the output below).
  3. As shown below, pair a container pod with the VM pod and access the VM from there.
root@ip-172-31-39-83:~/go/src/k8s.io/arktos# crictl ps
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead. 
ERRO[0002] connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded 
WARN[0002] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead. 
ERRO[0004] connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded 
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID
1c2a04f28d013       bfe3a36ebd252       15 seconds ago      Running             coredns             1                   e06658643c73f
744fe34a086d9       4d816efab7b24       41 seconds ago      Running             netctr              0                   18f0e88558c40
3ef846d16c39a       4b2e93f0133d3       2 minutes ago       Running             sidecar             0                   4dc385a408dc7
a41d422057b58       6dc8ef8287d38       2 minutes ago       Running             dnsmasq             0                   4dc385a408dc7
2366d3f80fa65       ebfc28c4ed971       2 minutes ago       Running             mizar-daemon        0                   984c2fba29c0c
ae2d1d17a79e1       6c1b05c02f906       3 minutes ago       Running             vms                 0                   ec2a5750101a5
9ed9d92a776b7       6c1b05c02f906       3 minutes ago       Running             virtlet             0                   ec2a5750101a5
240f4f71c2e27       6c1b05c02f906       3 minutes ago       Running             libvirt             0                   ec2a5750101a5
638614aed03f1       74613191ee383       3 minutes ago       Running             mizar-operator      0                   fc5b957e89ea6

root@ip-172-31-39-83:~/go/src/k8s.io/arktos# crictl exec -it 744fe34a086d9 /bin/bash
WARN[0000] runtime connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead. 
ERRO[0002] connect endpoint 'unix:///var/run/dockershim.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded 
root@netpod1-1:/# ping 21.0.21.4
PING 21.0.21.4 (21.0.21.4) 56(84) bytes of data.
64 bytes from 21.0.21.4: icmp_seq=1 ttl=64 time=3.76 ms
64 bytes from 21.0.21.4: icmp_seq=2 ttl=64 time=0.610 ms
64 bytes from 21.0.21.4: icmp_seq=3 ttl=64 time=0.550 ms
^C
--- 21.0.21.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2015ms
rtt min/avg/max/mdev = 0.550/1.641/3.763/1.500 ms
root@netpod1-1:/# ssh cirros@21.0.21.4
The authenticity of host '21.0.21.4 (21.0.21.4)' can't be established.
ECDSA key fingerprint is SHA256:sya8/VYwhvSG9TqglyTbHcve5Wo40qWz2OLgcmVoTBY.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '21.0.21.4' (ECDSA) to the list of known hosts.
cirros@21.0.21.4's password: 
Permission denied, please try again.
cirros@21.0.21.4's password: 
$
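
For option 2, one possible shape is a NodePort Service in front of the VM pod. The sketch below is untested with Mizar (validating the service path is exactly what remains open here); it assumes the vmdefault pod carries labels that `kubectl expose` can reuse as a selector, and that SSH (port 22) is what we want to reach:

```sh
# Untested sketch: expose SSH (port 22) of the vmdefault VM pod via a NodePort Service.
# Assumes the pod has labels that kubectl expose can turn into a selector.
kubectl expose pod vmdefault --name=vmdefault-ssh --port=22 --target-port=22 --type=NodePort

# Look up the allocated node port, then ssh to <node-ip>:<node-port> from outside the cluster.
kubectl get service vmdefault-ssh
```
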
vinaykul commented 2 years ago

Not 1/30 release blocker.

vinaykul commented 2 years ago

@yb01 Can you please retest? I believe with Phu's latest change that creates a virtual interface and static route for the system-default and user VPCs, you should be able to access the VM as long as there's no IP collision.

Note: In the bridge CNI case, you don't have the concept of VPC isolation that Mizar provides, so it works because you have a flat network. JMHO.
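
For reference, the host-side plumbing described above boils down to giving the host a route into the VPC subnet. A minimal hand-rolled sketch of that idea follows; the 21.0.21.0/24 subnet is inferred from the pod IP earlier in this issue, and the interface name is borrowed from the ifconfig output as a placeholder rather than being the device the actual change creates:

```sh
# Illustration only: route the VM pod's subnet (assumed 21.0.21.0/24) through a
# host-side virtual interface. "veth-hostep" is taken from the ifconfig output
# above and may not be the interface Mizar's change actually uses.
ip route add 21.0.21.0/24 dev veth-hostep

# Check which route the kernel would now pick for the VM pod IP.
ip route get 21.0.21.5
```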

vinaykul commented 2 years ago

Assigned to @yb01 to retest with the latest POC code. Try accessing the VM pod from the API master VM.

yb01 commented 2 years ago

Tried the latest Mizar build on one box, with the master and worker on the same node. Still not able to SSH to the VM, although the expectation is that the VM pod should be accessible from the master.
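
One way to narrow this down is to confirm on the host that a route toward the VM's subnet exists and to watch the VM pod's veth while retrying SSH; a rough diagnostic sketch (the subnet and veth name are taken from the earlier outputs and will differ on a fresh deployment):

```sh
# Does any host route cover the VM pod's subnet (assumed 21.0.21.0/24)?
ip route show | grep "21.0.21" || echo "no route toward the VM subnet"

# Watch the VM pod's veth (name from the original report) while retrying SSH,
# to see whether the SYNs ever reach the VM side.
tcpdump -ni veth-a1bfdb1a tcp port 22 &
ssh -o ConnectTimeout=5 cirros@21.0.21.5
kill %1
```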

yb01 commented 2 years ago

Punt to post-130 release. For now, one has to use option 3 until we have services functioning and verified with option 2.