Closed: er1cthe0ne closed this issue 3 years ago
@Zqy11 - Please provide an update before our next open source meeting. Thanks.
First integration goal in Lab environment:
1. Zeta setup on server x (Ubuntu 18.04 LTS + python3 + pip3 + ansible + docker + kind + kubectl)
2. Follow the ACA instructions to set up ACA on another 2 compute nodes
3. On the 4th node (we use it as the controller), write a script to set up Zeta and run tests against the two ACAs (initiator/responder)
Sample script code to issue a POST request to Zeta:
response=$(curl -H 'Content-Type: application/json' -X POST \
-d '{"name":"zgc0",
"description":"zgc0",
"ip_start":"20.0.0.1",
"ip_end":"20.0.0.15",
"port_ibo":"8300"}' \
172.16.62.247:8080/zgcs)
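To complete the sketch, the script can also check what came back. Below is a minimal, hedged example: the extract_zgc_id helper is hypothetical, and the flat JSON shape (including the zgc_id field) is assumed from the curl output shown later in this thread.

```shell
# Hypothetical helper: pull the "zgc_id" field out of the JSON reply.
# sed-based so it needs no jq; assumes the simple flat JSON shown later
# in this thread, not a general JSON parser.
extract_zgc_id() {
  sed -n 's/.*"zgc_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example with a canned response; in the real script, $sample would be
# the $response captured from the curl call above.
sample='{"name":"zgc0","zgc_id":"5b2e21d3-9418-4468-8d51-c513861bfdf5"}'
zgc_id=$(printf '%s' "$sample" | extract_zgc_id)
echo "created ZGC: $zgc_id"
```

The extracted id can then be used in follow-up requests against the Zeta NBI.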
I installed ACA on a physical machine, but when I run "./build/bin/AlcorControlAgent", it reports "Segmentation fault (core dumped)". Will it affect subsequent tests?
@HuaqingTu
Before Eric jumps in, can you provide some additional info: What OS is it, Ubuntu 18? What steps did you take from the beginning? Are you following the ACA build procedure? If it's one of the lab servers, please let us know its IP. Thanks,
Bin
In 2 above, you mean 19 and 20, right? On 19 & 20, do you have OVS installed? Running alcor-control-agent and the tests requires it, so install OVS on Ubuntu (18.04) if needed.
If you start a new container, you may need the steps below after installing OVS.
Then follow the build and test procedure in the getting started guide.
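The exact commands aren't quoted above, but a typical OVS setup on Ubuntu 18.04 looks roughly like this. This is a sketch: the package name and ovs-ctl path are the standard Ubuntu/OVS ones, not taken from the ACA guide, so verify against that guide.

```shell
# Install Open vSwitch from the Ubuntu repositories (standard package name).
sudo apt-get update
sudo apt-get install -y openvswitch-switch

# Inside a fresh container the OVS daemons are usually not started by the
# package scripts, so something like this may be needed after installing:
sudo /usr/share/openvswitch/scripts/ovs-ctl start

# Sanity check: should print the local OVS configuration without errors.
sudo ovs-vsctl show
```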
@HuaqingTu - after following the getting started guide and setting up OVS, are you able to run ./build/bin/AlcorControlAgent and ./build/tests/aca_tests now?
It worked!
When I installed Zeta, I executed the "./deploy/full_deploy.sh -d kind" command, and an error occurred while creating the K8s cluster with kind. The error message is as follows:
TASK [Setting up Kind cluster] ***********************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "../kind/create_cluster.sh development 2 3 &>> /tmp/ansible_debug.log", "delta": "0:00:00.001796", "end": "2020-11-25 21:29:50.701291", "msg": "non-zero return code", "rc": 126, "start": "2020-11-25 21:29:50.699495", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
can you do this cat /tmp/ansible_debug.log
Hello, the following problem occurred when I installed Zeta:
TASK [Deploy zeta-manager service] ************************************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "../install/deploy_zeta_manager.sh &>>/tmp/ansible_debug.log", "delta": "0:05:53.159439", "end": "2020-11-27 16:52:04.018989", "msg": "non-zero return code", "rc": 1, "start": "2020-11-27 16:46:10.859550", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
PLAY RECAP ************************************************************************
localhost : ok=2 changed=1 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
and from cat /tmp/ansible_debug.log, the log information is as follows:
Deleting cluster "kind" ...
Deleting existing zeta-node containers
Rebuild and publish zeta_node image to localhost:5000...
Rebuild and publish zeta_droplet image to localhost:5000...
Creating zeta-node-1
44ea45b3388c3e97f8b5aa0cd0f64b7deb2ff06515a4f9e416e59e47b525b01b
Creating zeta-node-2
e624b0db694b62e88fbde6e52e684e63a7013292f53886576349c9b059bf2d3f
Creating zeta-node-3
bf83a6223f0a37f546fa5abe718185c5120affb75ea02f82fc2062170003dc45
Creating cluster "kind" ...
• Ensuring node image (localhost:5000/zeta_node:latest) 🖼 ...
✓ Ensuring node image (localhost:5000/zeta_node:latest) 🖼
• Preparing nodes 📦 ...
✓ Preparing nodes 📦
• Writing configuration 📜 ...
✓ Writing configuration 📜
• Starting control-plane 🕹️ ...
✓ Starting control-plane 🕹️
• Installing CNI 🔌 ...
✓ Installing CNI 🔌
• Installing StorageClass 💾 ...
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind --kubeconfig /root/.kube/config.kind
Not sure what to do next? 😅 Check out https://kind.sigs.k8s.io/docs/user/quick-start/
configmap/local-registry-hosting created
Rebuild zeta-operator image...
Rebuild zeta-manager image...
customresourcedefinition.apiextensions.k8s.io/chains.zeta.com created
customresourcedefinition.apiextensions.k8s.io/dfts.zeta.com created
customresourcedefinition.apiextensions.k8s.io/droplets.zeta.com created
customresourcedefinition.apiextensions.k8s.io/ftns.zeta.com created
customresourcedefinition.apiextensions.k8s.io/fwds.zeta.com created
Creating the zeta-operator deployment and pod...
serviceaccount/zeta-operator created
clusterrolebinding.rbac.authorization.k8s.io/zeta-operator created
deployment.apps/zeta-operator created
Creating the zeta-manager deployment and service...
deployment.apps/zeta-manager created
service/zeta-manager created
pod/zeta-manager-8d97bc4dc-cl8r2 condition met
Waiting for postgres service ready for connection......................
...............................................Time out after 300s
Are you using 172.16.62.247 and 172.16.62.248? I can't access them. 249 & 250 seem to be for ACA only.
Yes, I installed Zeta on 247 and 248, but I am not sure whether Zeta was installed successfully. After running ./deploy/full_deploy.sh -d kind, the problems above were printed out. A POST to http://172.16.62.247:8080/zgcs got no response. I copied down some of the information after I ran full_deploy.sh.
The containers running on host 247 are as follows:
IMAGE NAMES PORTS
localhost:5000/zeta_node:latest kind-control-plane 0.0.0.0:443->443/tcp, 0.0.0.0:8080->80/tcp, 127.0.0.1:45417->6443/tcp
localhost:5000/zeta_droplet:latest zeta-node-3
localhost:5000/zeta_droplet:latest zeta-node-2
localhost:5000/zeta_droplet:latest zeta-node-1
registry:2 local-kind-registry 0.0.0.0:5000->5000/tcp
zeta_build:latest zb
Images on host 247:
REPOSITORY TAG SIZE
localhost:5000/zeta_opr latest 1.11GB
localhost:5000/zeta_droplet latest 1.98GB
localhost:5000/zeta_node latest 1.75GB
localhost:5000/zeta_manager latest 247MB
zeta_build latest 1.92GB
fwnetworking/zeta_dev latest 1.92GB
Output of running lsof -i:8080:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
docker-pr 452036 root 4u IPv6 6083581 0t0 TCP *:http-alt (LISTEN)
I think the installation script ran up to deploy_zeta_manager.sh and stopped at the code below:
REGISTRY="$REG" \
envsubst '$REGISTRY' < $DEPLOYMENTS_PATH/zeta-manager-deployment.yml > $DEPLOYMENTS_PATH/.zeta-manager-deployment.yml
kubectl apply -f $DEPLOYMENTS_PATH/.zeta-manager-deployment.yml
kubectl apply -f $DEPLOYMENTS_PATH/zeta-manager-service.yml
kubectl wait --for=condition=ready pod -l app=zeta-manager --timeout=300s
echo -n "Waiting for postgres service ready for connection..."
POD_ZM="$(kubectl get pod --field-selector status.phase=Running -l app=zeta-manager -o jsonpath='{.items[0].metadata.name}')"
end=$((SECONDS + 300))
ready="Not Ready"
while [[ $SECONDS -lt $end ]]; do
    ready="$(kubectl exec $POD_ZM -- cat /tmp/healthy 2>&1 | head -n1)"
    if [ -z "$ready" ]; then
        ready="ready"
        break
    fi
    echo -n "."
    sleep 2
done
if [ "$ready" != "ready" ]; then
    echo "Time out after 300s"
    exit 1
fi
So what should I do next? Or how do I make sure that Zeta is installed?
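One way to see how far the deployment got is with standard kubectl/curl checks. This is only a sketch: the app=zeta-manager label comes from the script above, and the port 8080 endpoint from the earlier curl example.

```shell
# Overall pod status across namespaces: are postgres and zeta-manager Running?
kubectl get pods -A

# Events for the manager pod: stuck on image pull? crash looping?
kubectl describe pod -l app=zeta-manager

# Recent manager logs, if the container started at all.
kubectl logs -l app=zeta-manager --tail=50

# Once the manager is up, the NBI should answer on port 8080.
curl -s http://localhost:8080/zgcs
```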
Please check why the remote connection to 247 and 248 is not working; I need to ssh onto these two: 39.98.115.249:8247 and 39.98.115.249:8248. SSH access is not working according to the instructions sent to me before. Please don't send the username/password here; if it changed, send it to me through email. Also, the problem seems to be on 39.98.115.249: connections to 8247 and 8248 are rejected.
Sorry, some wrong operations on my part caused the remote login failure. 8247 should now be restored; 8248 is still unavailable.
@PikaPikaW There are a few issues in 247 environment:
default pod/postgres-7875689b5-q4cpz     0/1 ContainerCreating 0 8m55s
default pod/zeta-manager-8d97bc4dc-qlkql 0/1 ContainerCreating 0 8m48s
Is this related to the issue of downloading from the mirror site?
I did a manual image pull for postgres; it's super slow, so the deployment will certainly fail. Since all Zeta services are locally built, they will load fast, but the postgres and ingress-nginx pulls need to be accelerated: maybe manually pull them, then push to the local registry (the existing localhost:5000). Then the yaml files need to be modified to point the image to where you pushed the two images; see deploy/install/deploy_postgres.sh and deploy_ingress_nginx.sh.
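As a sketch of that pull/tag/push flow: the local_name and mirror helpers below are hypothetical, localhost:5000 is the existing local registry mentioned above, and postgres:12.1-alpine is the image named later in this thread.

```shell
# local_name maps a public image reference to its local-registry name,
# e.g. postgres:12.1-alpine -> localhost:5000/postgres:latest.
# Note: it keeps only the last path component, so long names like
# "ingress-nginx/controller" become just "controller"; rename by hand
# if you want something more descriptive.
local_name() {
  local img=$1
  local base=${img##*/}                   # strip registry/path prefix
  printf 'localhost:5000/%s:latest' "${base%%:*}"  # strip tag/digest
}

# Mirror one image into the local registry (requires docker and the
# localhost:5000 registry container to be running).
mirror() {
  docker pull "$1"
  docker tag "$1" "$(local_name "$1")"
  docker push "$(local_name "$1")"
}

# Example (commented out since it needs network + docker):
# mirror postgres:12.1-alpine
```

After mirroring, point the image: field in the deployment yaml at the localhost:5000 name.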
OK. Therefore, I need to pull these images and push them to the local registry, then change the scripts to use the locally pushed images, and not use root to build and deploy.
Some doubts: What specific images do I need to pull locally? I found postgres:12.1-alpine, but the other one wasn't found in the deploy_ingress_nginx.sh script. There is a website, but I can't open it; do I need a VPN?
# deploy_ingress_nginx.sh
echo "Create Nginx Ingress Controller..."
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/kind/deploy.yaml >/dev/null
# Remove unnecessary validation through webhooks when new ingresses are added
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission &>/dev/null
Are the Zeta service images already pushed to the local registry, and is that why they load quickly? Can I lengthen the 300s to ensure the images load successfully? I don't know how to change the YAML files, so I wonder if it is OK to change just this timeout?
I found the mirror site for above yaml (from here https://blog.csdn.net/networken/article/details/105122778): https://raw.sevencdn.com/kubernetes/ingress-nginx/master/deploy/static/provider/kind/deploy.yaml
You can try extending the timeout in deploy_zeta_manager.sh, but that may not address the issue if the deployment yaml or image is not accessible.
Hello, in deploy_postgres.sh, among all the YAML files used I found only one image, postgres:12.1-alpine, that needs to be pulled (in the postgres deployment). So I pulled the postgres image locally, tagged it localhost:5000/postgres, and pushed it so that it should pull locally. And in postgres-deployment.yml I changed the image to point to localhost:5000/postgres:latest:
spec:
containers:
- name: postgres
image: localhost:5000/postgres:latest
env:
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-credentials
key: user
- name: POSTGRES_PASSWORD
valueFrom:
Similarly, for deploy_ingress_nginx.sh, I saved the YAML file from the URL locally in ./deploy/etc/deployments, named ingress-nginx-deployment.yml, then registered locally the images the file needs to pull, as follows:
k8s.gcr.io/ingress-nginx/controller:v0.41.2@sha256:1f4f402b9c14f3ae92b11ada1dfe9893a88f0faeb0b2f4b903e2c67a0c3bf0de
registered as
localhost:5000/ingress-nginx-controller:latest
docker.io/jettech/kube-webhook-certgen:v1.5.0
registered as
localhost:5000/kube-webhook-certgen:latest
Then I changed the image references in the file. But at the end, the installation still printed the timeout. I don't know if I'm doing this right?
I checked 247; the problem is only postgres now. The pod is not there because there is a small error in postgres-deployment.yml: the indentation of "containers" was changed incorrectly, causing:
sdn@computer17:~/Zeta/zeta$ ./deploy/install/deploy_postgres.sh
Creating the volume...
persistentvolume/postgres-pv unchanged
persistentvolumeclaim/postgres-pvc unchanged
Creating the database credentials...
secret/postgres-credentials unchanged
Creating the postgres deployment and service...
error: error parsing /home/sdn/Zeta/zeta/deploy/install/../etc/deployments/postgres-deployment.yml: error converting YAML to JSON: yaml: line 28: did not find expected key
service/postgres unchanged
I fixed this part and deployed again; zeta-manager is still not up, and checking the log shows:
sdn@computer17:~/Zeta/zeta$ kubectl logs zeta-manager-8d97bc4dc-gc2r4
standard_init_linux.go:211: exec user process caused "no such file or directory"
Since we had never hit this kind of error, I checked the diff in your repo and noticed all files were modified with Windows-style line endings. I will check which one caused the problem.
It seems all files were affected by Windows-style line endings. I fixed it with
find . -type f -print0 | xargs -0 dos2unix --
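Before redeploying, affected files can be spotted with plain grep. A small sketch (the crlf_files helper is hypothetical; grep -I skips binary files):

```shell
# List text files under a directory that contain CRLF (Windows) line
# endings. The pattern is a single literal carriage-return character.
crlf_files() {
  grep -rlI "$(printf '\r')" "$1"
}

# Example: crlf_files . before running the deploy scripts; an empty
# result means no files need dos2unix.
```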
and did a full deploy; it deploys successfully now. You can access the Zeta NBI API through port 8080:
sdn@computer17:~/Zeta/zeta$ curl http://localhost:8080/zgcs
[
  {
    "description": "zgc0",
    "id": 1,
    "ip_end": "20.0.0.255",
    "ip_start": "20.0.0.0",
    "name": "zgc0",
    "nodes": [],
    "overlay_type": "vxlan",
    "port_ibo": 8300,
    "vpcs": [],
    "zgc_id": "5b2e21d3-9418-4468-8d51-c513861bfdf5"
  }
]
So mainly in 247 there were three issues deploying in your environment:
1. The postgres and ingress-nginx images pull very slowly from the public registries and need to be mirrored locally.
2. A wrong indentation of "containers" in postgres-deployment.yml.
3. Windows-style line endings on the repo files.
Now that Zeta has been installed and Zeta's interface is available, I think I need to read the ACA gtest and learn how the RPC works. Is there any RPC script already written to access ACA? Can you send a link, please?
@PikaPikaW - great progress to have Zeta installed. For ACA gtest, you can take a look at /test/gtest/aca_test_ovs_l2.cpp, DISABLED_2_ports_CREATE_test_traffic_PARENT and DISABLED_2_ports_CREATE_test_traffic_CHILD to see how to setup the goal state and do traffic testing. Execution instruction is on top of the file or https://github.com/futurewei-cloud/alcor-control-agent/wiki/How-to-run-the-full-suite-of-aca_tests @zhangml started modifying and running the gtest already.
current documentation has been merged with #173
As we are moving into the next phase of the project, we need to design and set up an environment for Zeta+ACA validation. The request is to create a document which includes the following:
The plan is to have automated test running in this environment based on the current ACA testing framework.