CentaurusInfra / mizar

Mizar – Experimental, High Scale and High Performance Cloud Network https://mizar.readthedocs.io
GNU General Public License v2.0

[Arktos-Mizar-Integration] case 2 - create new VPC in system tenant, cannot be attached to non system pod #568

Status: Closed (Sindica closed this issue 2 years ago)

Sindica commented 2 years ago

What happened: After creating a new VPC (VPC1), a subnet, a tenant, and a pod in that tenant annotated with VPC1, the pod could not be created because of a network error.

What you expected to happen: The new pod is created successfully with an IP in the VPC1 range.
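
The expectation can be written down as a simple range check. The sketch below uses Python's standard `ipaddress` module with the CIDR values from the Vpc and Subnet specs in the reproduction steps (22.0.0.0/16 and 22.0.0.0/24). The helper name `ip_in_vpc` is hypothetical and is not part of Mizar or Arktos.

```python
import ipaddress

# Values taken from the Vpc and Subnet specs in this report.
VPC1 = ipaddress.ip_network("22.0.0.0/16")
SUBNET1 = ipaddress.ip_network("22.0.0.0/24")


def ip_in_vpc(pod_ip: str, vpc: ipaddress.IPv4Network = VPC1) -> bool:
    """Return True if pod_ip is an assigned address inside the VPC range.

    kubectl prints "<none>" for pods that have no IP yet, so that value
    (and an empty string) is treated as "not in range".
    """
    if pod_ip in ("", "<none>"):
        return False
    return ipaddress.ip_address(pod_ip) in vpc


# The subnet must itself lie inside the VPC for the expectation to make sense.
print(SUBNET1.subnet_of(VPC1))

# A pod stuck in ContainerCreating has no IP, so the expectation is not met.
print(ip_in_vpc("<none>"))
```

With the pod listing in this report, the pods in tenant aaa show `<none>` in the IP column, so the check fails for them.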

How to reproduce it (as minimally and precisely as possible):

  1. Set up an Arktos environment with Mizar integration using the code from https://github.com/CentaurusInfra/arktos/tree/poc-2022-01-30, following steps 1, 2, and 3 in docs/setup-guide/arktos-with-mizar-cni.md
  2. Wait until the default bouncer is provisioned
  3. Create a new VPC, Subnet, Tenant, and Pods one by one using the following specs:
apiVersion: mizar.com/v1
kind: Vpc
metadata:
  name: vpc-tenant-ying
spec:
  ip: "22.0.0.0"
  prefix: "16"
  dividers: 1
  status: "Init"
---
apiVersion: mizar.com/v1
kind: Subnet
metadata:
  name: net-tenant-ying
spec:
  ip: "22.0.0.0"
  prefix: "24"
  bouncers: 1
  vpc: "vpc-tenant-ying"
  status: "Init"
---
apiVersion: v1
kind: Tenant
metadata:
  name: aaa
spec:
  storageClusterId: "1"
---
apiVersion: v1
kind: Pod
metadata:
  name: ying-nginx-aaa-1
  tenant: aaa
  annotations:
    mizar.com/vpc: vpc-tenant-ying
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
      - containerPort: 443
---
apiVersion: v1
kind: Pod
metadata:
  name: ying-nginx-aaa-2
  tenant: aaa
  annotations:
    mizar.com/vpc: vpc-tenant-ying
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
      - containerPort: 443
  4. The corresponding VPC, subnet, divider, and bouncer were provisioned, but the pods are stuck in the ContainerCreating phase, with kubelet reporting a network error:
$ ./cluster/kubectl.sh get pods -o wide -AT
TENANT   NAMESPACE     NAME                               HASHKEY               READY   STATUS              RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES
aaa      default       ying-nginx-aaa-1                   4949857460619744911   0/1     ContainerCreating   0          4h23m   <none>        ip-172-30-0-41   <none>           <none>
aaa      default       ying-nginx-aaa-2                   7154573955723019547   0/1     ContainerCreating   0          4h23m   <none>        ip-172-30-0-41   <none>           <none>
aaa      kube-system   coredns-default-75d7fb94bd-jhgtb   416127249812135661    0/1     ContainerCreating   0          4h23m   <none>        ip-172-30-0-41   <none>           <none>
system   a-ns          ying-nginx-1                       5304290718153948242   1/1     Running             0          3h38m   21.0.0.9      ip-172-30-0-41   <none>           <none>
system   a-ns          ying-nginx-2                       3639323320079932476   1/1     Running             0          3h38m   21.0.0.5      ip-172-30-0-41   <none>           <none>
system   default       mizar-daemon-8jzwm                 5975706721484110874   1/1     Running             0          4h29m   172.30.0.41   ip-172-30-0-41   <none>           <none>
system   default       mizar-operator-b445854c4-874mt     8718960087077122294   1/1     Running             0          4h29m   172.30.0.41   ip-172-30-0-41   <none>           <none>
system   default       netpod1                            3921738283652515916   1/1     Running             0          4h24m   20.0.0.30     ip-172-30-0-41   <none>           <none>
system   default       netpod2                            489742900531745830    1/1     Running             0          4h24m   20.0.0.18     ip-172-30-0-41   <none>           <none>
system   default       ying-nginx-1                       3913157028521443041   1/1     Running             0          4h24m   21.0.0.6      ip-172-30-0-41   <none>           <none>
system   default       ying-nginx-2                       8576502358070453327   1/1     Running             0          4h24m   21.0.0.10     ip-172-30-0-41   <none>           <none>
system   kube-system   coredns-default-75d7fb94bd-mtmfn   8778436291382290915   1/1     Running             0          4h29m   20.0.0.22     ip-172-30-0-41   <none>           <none>
system   kube-system   kube-dns-554c5866fc-lpk6n          6595492347156762531   3/3     Running             0          4h29m   20.0.0.6      ip-172-30-0-41   <none>           <none>
system   kube-system   virtlet-dj4m6                      726320756283966583    3/3     Running             0          4h29m   172.30.0.41   ip-172-30-0-41   <none>           <none>
$ cat /tmp/kubelet.log | grep ying-nginx-aaa-1
...
E1118 22:22:43.060480   29937 pod_workers.go:196] Error syncing pod 21da62af-d6bd-4ede-bc3d-d4eaf579998e ("ying-nginx-aaa-1_default_aaa(21da62af-d6bd-4ede-bc3d-d4eaf579998e)"), skipping: failed to "CreatePodSandbox" for "ying-nginx-aaa-1_default_aaa(21da62af-d6bd-4ede-bc3d-d4eaf579998e)" with CreatePodSandboxError: "CreatePodSandbox for pod \"ying-nginx-aaa-1_default_aaa(21da62af-d6bd-4ede-bc3d-d4eaf579998e)\" failed: rpc error: code = Unknown desc = failed to setup network for sandbox \"6553e83934fa8379d99de9e91ed9be9b755670ff5bf85013ee018006f390e9ec\": rpc error: code = DeadlineExceeded desc = Deadline Exceeded"
...
$ cat /tmp/kube-controller-manager.log | grep ying-nginx-aaa-1
...
I1118 22:23:56.132573   29638 mizar-pod-controller.go:170] Entering handling for mizar_pod. key aaa/default/ying-nginx-aaa-1, eventType Create
I1118 22:23:56.133134   29638 util.go:146] Pod Name: ying-nginx-aaa-1, HostIP: 172.30.0.41, Namespace: default, Tenant: aaa, Labels: , Arktos network:
W1118 22:23:56.134246   29638 mizar-pod-controller.go:220] Mizar hit temporary error for mizar_pod. key aaa/default/ying-nginx-aaa-1. Grpc call failed: rpc error: code = Internal desc = Exception deserializing request!, eventType Create
I1118 22:23:56.432590   29638 mizar-pod-controller.go:170] Entering handling for mizar_pod. key aaa/default/ying-nginx-aaa-1, eventType Update
I1118 22:23:56.433226   29638 util.go:146] Pod Name: ying-nginx-aaa-1, HostIP: 172.30.0.41, Namespace: default, Tenant: aaa, Labels: , Arktos network:
W1118 22:23:56.434340   29638 mizar-pod-controller.go:220] Mizar hit temporary error for mizar_pod. key aaa/default/ying-nginx-aaa-1. Grpc call failed: rpc error: code = Internal desc = Exception deserializing request!, eventType Update
...
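
To summarize the repeated controller warnings, the following minimal sketch pulls the tenant, namespace, pod name, gRPC status code, and event type out of a warning line. The regular expression is modeled only on the two W-level lines quoted above and is an assumption about their format, not a documented log schema. The names `PATTERN` and `parse_failure` are hypothetical.

```python
import re
from typing import Dict, Optional

# Pattern modeled on the warning lines from mizar-pod-controller.go quoted
# in this report; other log formats will simply not match.
PATTERN = re.compile(
    r"key (?P<tenant>[^/\s]+)/(?P<namespace>[^/\s]+)/(?P<pod>\S+?)\. "
    r"Grpc call failed: rpc error: code = (?P<code>\w+) desc = (?P<desc>.*), "
    r"eventType (?P<event>\w+)$"
)


def parse_failure(line: str) -> Optional[Dict[str, str]]:
    """Return the parsed fields of a gRPC failure line, or None if no match."""
    match = PATTERN.search(line.strip())
    return match.groupdict() if match else None


# Sample copied from the kube-controller-manager log in this report.
SAMPLE = (
    "W1118 22:23:56.134246   29638 mizar-pod-controller.go:220] "
    "Mizar hit temporary error for mizar_pod. key aaa/default/ying-nginx-aaa-1. "
    "Grpc call failed: rpc error: code = Internal desc = "
    "Exception deserializing request!, eventType Create"
)

fields = parse_failure(SAMPLE)
print(fields)
```

For the sample line, the parsed code is `Internal` and the description is `Exception deserializing request!`, which matches the failure reported for both the Create and Update events.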
phudtran commented 2 years ago

Fixed