sbarbari opened this issue 6 years ago
This issue has not yet been acknowledged. Can anyone provide recommendations?
Can you provide more information about the underlying hardware of the system that hosts the libvirt machines?
The logs explicitly point to disk performance issues: apply entries took too long
Also, putting k8s, glusterfs and etcd together on 2-core/6GB VMs backed by qcow2 will generally lead to poor performance. Things to try:
But first of all, don't use gluster for etcd backend storage.
We have been deploying an etcd cluster (etcd-operator 0.9.2 / etcd 3.2.18) in our project opencord/voltha and have failed to keep a reliable cluster running.
Brief description of the problem
While doing some performance tests with our deployed system, we noticed that the etcd cluster started showing reliability issues once the system reached a certain load.
Our test scenario consists of loading our system to 80% of its saturation point. Once the system reaches that state, we delete one component from our system to force a recovery. A new instance is spawned to replace the deleted one, and that instance starts retrieving information from the etcd cluster to restore its internal state. At this point, we start seeing degradation in the etcd cluster. The etcd-operator detects the issue, removes the problematic member, and spawns a new etcd instance, which then tries to restore the data from the previous member. Occasionally this restore phase succeeds and the cluster comes back to normal, but most of the time it fails, the instance gets terminated, other instances follow the same path, and the operator eventually declares the cluster dead.
Reproducing the problem
I simplified my deployment and was able to reproduce the problem by running only the etcd cluster along with a cluster of test containers. The following sections explain how to deploy my environment.
Environment
Our system is deployed in a virtual environment hosted on a physical server.
Physical server:
Virtual environment:
Configuration and Installation
Kubernetes cluster installation
The kubernetes cluster was deployed using the voltha-k8s-playground project with some modifications to run in a libvirt-based environment.
To use the bento box mentioned above, you will need to run the following commands.
Install some plugins
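The original commands were not preserved; a minimal sketch, assuming the vagrant-libvirt provider and the vagrant-mutate box-conversion plugin:

```shell
vagrant plugin install vagrant-libvirt
vagrant plugin install vagrant-mutate
```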
Download and convert vagrant box
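A sketch, assuming the bento/ubuntu-16.04 box is fetched in its VirtualBox flavour and then converted for libvirt with vagrant-mutate:

```shell
# Fetch the VirtualBox flavour of the box, then convert it for libvirt
vagrant box add bento/ubuntu-16.04 --provider virtualbox
vagrant mutate bento/ubuntu-16.04 libvirt
```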
The following Vagrantfile was used
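The original file was not preserved; below is a minimal sketch of what a libvirt Vagrantfile for this setup could look like. The node names (k8s1..k8s3), the sizing, and the extra brick disk are assumptions based on the 2-core/6GB VMs discussed above:

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-16.04"

  (1..3).each do |i|
    config.vm.define "k8s#{i}" do |node|
      node.vm.hostname = "k8s#{i}"
      node.vm.provider :libvirt do |lv|
        lv.cpus = 2
        lv.memory = 6144
        # Extra disk used later as the GlusterFS brick (assumption)
        lv.storage :file, :size => '20G'
      end
    end
  end
end
```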
GlusterFS installation
Login
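Assuming the node names from the sketch above (the wipe and install steps below are repeated on each node):

```shell
vagrant ssh k8s1
```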
Wipe storage and load module
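A sketch of preparing the brick device; /dev/vdb is an assumption (check with lsblk), and the device-mapper modules are the ones gluster-kubernetes checks for:

```shell
sudo wipefs -a /dev/vdb
sudo modprobe -a dm_thin_pool dm_snapshot dm_mirror
```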
Install glusterfs client and dependencies
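On Ubuntu 16.04 this is typically:

```shell
sudo apt-get update
sudo apt-get install -y glusterfs-client
```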
Clone gluster-kubernetes framework
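```shell
git clone https://github.com/gluster/gluster-kubernetes.git
cd gluster-kubernetes/deploy
```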
Create a topology
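A sketch of deploy/topology.json; the hostnames, IPs and device path are assumptions matching the environment above (repeat the node entry for k8s2 and k8s3):

```json
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": ["k8s1"],
              "storage": ["172.42.42.101"]
            },
            "zone": 1
          },
          "devices": ["/dev/vdb"]
        }
      ]
    }
  ]
}
```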
Install GlusterFS server cluster
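Run from the deploy directory; -g runs GlusterFS itself as pods on the nodes, and the script prints the heketi REST endpoint needed for the storage class below:

```shell
./gk-deploy -g -y topology.json
```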
Setup kubernetes manifests
Create storage class manifest (see the sketch after these steps). Update the resturl based on what was returned in the previous command.
Create cluster role manifests
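A sketch of the storage class, assuming the name glusterfs-storage; the resturl value is a placeholder for the heketi endpoint printed by gk-deploy:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-storage
provisioner: kubernetes.io/glusterfs
parameters:
  # Replace with the heketi endpoint reported by gk-deploy
  resturl: "http://<heketi-endpoint>:8080"
```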
Create operator manifest
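A sketch based on the example deployment shipped with etcd-operator, pinned to the 0.9.2 image used here; the RBAC bindings from the cluster role step above are assumed to be in place:

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: etcd-operator
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: etcd-operator
    spec:
      containers:
      - name: etcd-operator
        image: quay.io/coreos/etcd-operator:v0.9.2
        command:
        - etcd-operator
        env:
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
```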
Create etcd cluster manifest
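A sketch of a 3-member cluster on etcd 3.2.18 with its data on the gluster storage class, assuming the persistent-volume support of operator 0.9.x; the cluster name and volume size are assumptions:

```yaml
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "etcd-cluster"
spec:
  size: 3
  version: "3.2.18"
  pod:
    persistentVolumeClaimSpec:
      storageClassName: glusterfs-storage
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
```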
Create etcd tester manifests
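The etcd-tester client is specific to the voltha project and its manifest was not preserved; the sketch below is purely illustrative of its shape. The image name and labels are hypothetical; etcd-cluster-client is the client service the operator creates for the cluster above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-tester
spec:
  replicas: 3
  selector:
    matchLabels:
      app: etcd-tester
  template:
    metadata:
      labels:
        app: etcd-tester
    spec:
      containers:
      - name: etcd-tester
        # Hypothetical placeholder for the voltha test client image
        image: <etcd-tester-image>
        env:
        - name: ETCD_ENDPOINT
          value: "http://etcd-cluster-client:2379"
```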
Deployment
Start the etcd cluster
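Assuming the manifest file names from the sketches above:

```shell
kubectl create -f etcd-operator.yaml
# Wait for the operator pod to come up, then create the cluster
kubectl create -f etcd-cluster.yaml
kubectl get pods -w
```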
Start the etcd-tester cluster. The etcd-tester cluster is an etcd client that simulates the leadership functionality of our system.
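Again assuming the file name from the hypothetical sketch above:

```shell
kubectl create -f etcd-tester.yaml
```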
Execute a script to produce 1500 keys in the etcd cluster. This script is executed within an etcd instance.
Enter the etcd instance
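```shell
# Substitute one of the etcd-cluster pod names reported by `kubectl get pods`
kubectl exec -it <etcd-cluster-pod> -- sh
```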
Copy the following script into the etcd instance session to execute it.
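The original script was not preserved; this is a minimal sketch that writes 1500 keys with roughly 1 KB values through the local member (the key prefix and value size are assumptions):

```sh
export ETCDCTL_API=3
# Roughly 1 KB of payload per key (assumption)
VALUE=$(head -c 1024 /dev/urandom | base64 | tr -d '\n')
for i in $(seq 1 1500); do
  etcdctl put "load-test/key-$i" "$VALUE" > /dev/null
done
echo "done"
```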
Trigger Cluster Failure
Once the key generation script has completed, you can issue the following command to try to trigger the cluster failure.
Delete one of the etcd-tester instances
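Assuming the hypothetical app=etcd-tester label from the sketch above; the deployment immediately replaces the deleted pod:

```shell
kubectl delete $(kubectl get pods -l app=etcd-tester -o name | head -1)
```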
When a new etcd-tester instance is spawned to replace the deleted one, it starts loading the data from the etcd cluster. The etcd instances will show a ready status of 0/1 and the logs will show connection errors. The operator usually re-spawns a new instance, but it will eventually fail. The cluster failure may occur on the first try, or you may need to issue the command again.
Logs
Etcd Operator
Etcd