GoogleCloudPlatform / distributed-load-testing-using-kubernetes

Distributed load testing using Kubernetes on Google Container Engine
http://cloud.google.com/solutions/distributed-load-testing-using-kubernetes
Apache License 2.0

locust master / slave sync #12

Open razghe opened 7 years ago

razghe commented 7 years ago

Hi,

I am using the locust deployment on a Kubernetes cluster with 300 slaves. Every time my cluster crashes, Kubernetes re-creates the machines, which also changes the IP address of the locust master.

I tried to assign the same IP address to the locust master and re-create it, but none of the slaves sync back to the Locust UI.

What is the best way to overcome this issue?

razghe commented 7 years ago

I have tried deploying the workers in two different ways.

One is through the worker YAML file from the git repo. It creates the pods in Error status, and none of them reaches the Running state because of the following error:

/usr/local/bin/locust -f /locust-tasks/tasks.py --host=http://workload-simulation-webapp.appspot.com --slave --master-host=locust-master
[2016-11-25 14:03:08,900] locust-3464956088-f3fw7/ERROR/stderr: Traceback (most recent call last):
[2016-11-25 14:03:08,902] locust-3464956088-f3fw7/ERROR/stderr: File "/usr/local/bin/locust", line 9, in <module>
[2016-11-25 14:03:08,903] locust-3464956088-f3fw7/ERROR/stderr: 
[2016-11-25 14:03:08,904] locust-3464956088-f3fw7/ERROR/stderr: load_entry_point('locustio==0.7.2', 'console_scripts', 'locust')()
[2016-11-25 14:03:08,905] locust-3464956088-f3fw7/ERROR/stderr: File "/usr/local/lib/python2.7/site-packages/locust/main.py", line 410, in main
[2016-11-25 14:03:08,906] locust-3464956088-f3fw7/ERROR/stderr: 
[2016-11-25 14:03:08,907] locust-3464956088-f3fw7/ERROR/stderr: runners.locust_runner = SlaveLocustRunner(locust_classes, options)
[2016-11-25 14:03:08,908] locust-3464956088-f3fw7/ERROR/stderr: File "/usr/local/lib/python2.7/site-packages/locust/runners.py", line 353, in __init__
[2016-11-25 14:03:08,909] locust-3464956088-f3fw7/ERROR/stderr: 
[2016-11-25 14:03:08,910] locust-3464956088-f3fw7/ERROR/stderr: self.client = rpc.Client(self.master_host, self.master_port)
[2016-11-25 14:03:08,911] locust-3464956088-f3fw7/ERROR/stderr: File "/usr/local/lib/python2.7/site-packages/locust/rpc/zmqrpc.py", line 28, in __init__
[2016-11-25 14:03:08,912] locust-3464956088-f3fw7/ERROR/stderr: 
[2016-11-25 14:03:08,913] locust-3464956088-f3fw7/ERROR/stderr: self.receiver.connect("tcp://%s:%i" % (host, port+1))
[2016-11-25 14:03:08,913] locust-3464956088-f3fw7/ERROR/stderr: File "zmq/backend/cython/socket.pyx", line 471, in zmq.backend.cython.socket.Socket.connect (zmq/backend/cython/socket.c:4295)
[2016-11-25 14:03:08,913] locust-3464956088-f3fw7/ERROR/stderr: zmq.error
[2016-11-25 14:03:08,914] locust-3464956088-f3fw7/ERROR/stderr: .
[2016-11-25 14:03:08,914] locust-3464956088-f3fw7/ERROR/stderr: ZMQError
[2016-11-25 14:03:08,914] locust-3464956088-f3fw7/ERROR/stderr: :
[2016-11-25 14:03:08,915] locust-3464956088-f3fw7/ERROR/stderr: Invalid argument
[2016-11-25 14:03:08,915] locust-3464956088-f3fw7/ERROR/stderr: 
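The traceback shows the slave building a tcp://host:port endpoint from --master-host and passing it straight to ZeroMQ's connect(). As far as I know, libzmq reports a hostname it cannot resolve as EINVAL, which would surface as exactly this "ZMQError: Invalid argument". The minimal sketch below (Python 3, standard library only) lists the endpoints a slave would connect to and checks whether the master name resolves. The default master port 5557 and the assumption that the slave connects to both port and port + 1 are mine; the log itself only shows the port + 1 call.

```python
import socket
from typing import List, Optional

# Assumption: 5557 is Locust's default --master-port. The traceback shows the
# receiver socket connecting to port + 1; the sender is assumed to use port.
DEFAULT_MASTER_PORT = 5557


def zmq_endpoints(host: str, port: int = DEFAULT_MASTER_PORT) -> List[str]:
    """Return the tcp:// endpoints a slave would connect to (port and port + 1)."""
    # Same format string as the zmqrpc.py line in the traceback.
    return ["tcp://%s:%i" % (host, p) for p in (port, port + 1)]


def resolve_or_none(host: str) -> Optional[str]:
    """Resolve host to an IPv4 address, or return None if the lookup fails."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except OSError:
        # socket.gaierror (name not found) is a subclass of OSError.
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return infos[0][4][0] if infos else None
```

Calling resolve_or_none("locust-master") from a running pod in the same namespace (for example through kubectl exec into the master pod) should show quickly whether the failure is name resolution rather than Locust itself. If it returns None, the master Service must also expose both endpoint ports once the name does resolve.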

Two is deploying the workers manually with kubectl run. If the worker is given the direct IP address of the master, for example LOCUST_MASTER=172.20.140.2 (the current IP of my master container), everything works fine.

If I deploy the worker with this command:

kubectl run locust --image=gcr.io/cloud-solutions-images/locust-tasks --env LOCUST_MODE=worker --env LOCUST_MASTER=locust-master --env TARGET_HOST=http://workload-simulation-webapp.appspot.com

kubectl get pods shows the worker pod in Error status:

host-44-11-1-22:~/pula/distributed-load-testing-using-kubernetes/kubernetes-config # kubectl get pods -o wide
NAME                        READY     STATUS    RESTARTS   AGE       IP             NODE
locust-3464956088-f3fw7     0/1       Error     5          5m        172.20.145.3   razvan-kube-minion0.openstack.local
locust-master-udl8f         1/1       Running   0          41m       172.20.140.2   razvan-kube-minion2.openstack.local
my-nginx-2494149703-6r4dx   1/1       Running   0          1h        172.20.145.2   razvan-kube-minion0.openstack.local
my-nginx-2494149703-gljl2   1/1       Running   0          1h        172.20.52.4    razvan-kube-minion1.openstack.local
host-44-11-1-22:~/pula/distributed-load-testing-using-kubernetes/kubernetes-config # kubectl logs locust-3464956088-f3fw7
/usr/local/bin/locust -f /locust-tasks/tasks.py --host=http://workload-simulation-webapp.appspot.com --slave --master-host=locust-master
[2016-11-25 14:03:08,900] locust-3464956088-f3fw7/ERROR/stderr: Traceback (most recent call last):
[2016-11-25 14:03:08,902] locust-3464956088-f3fw7/ERROR/stderr: File "/usr/local/bin/locust", line 9, in <module>
[2016-11-25 14:03:08,903] locust-3464956088-f3fw7/ERROR/stderr: 
[2016-11-25 14:03:08,904] locust-3464956088-f3fw7/ERROR/stderr: load_entry_point('locustio==0.7.2', 'console_scripts', 'locust')()
[2016-11-25 14:03:08,905] locust-3464956088-f3fw7/ERROR/stderr: File "/usr/local/lib/python2.7/site-packages/locust/main.py", line 410, in main
[2016-11-25 14:03:08,906] locust-3464956088-f3fw7/ERROR/stderr: 
[2016-11-25 14:03:08,907] locust-3464956088-f3fw7/ERROR/stderr: runners.locust_runner = SlaveLocustRunner(locust_classes, options)
[2016-11-25 14:03:08,908] locust-3464956088-f3fw7/ERROR/stderr: File "/usr/local/lib/python2.7/site-packages/locust/runners.py", line 353, in __init__
[2016-11-25 14:03:08,909] locust-3464956088-f3fw7/ERROR/stderr: 
[2016-11-25 14:03:08,910] locust-3464956088-f3fw7/ERROR/stderr: self.client = rpc.Client(self.master_host, self.master_port)
[2016-11-25 14:03:08,911] locust-3464956088-f3fw7/ERROR/stderr: File "/usr/local/lib/python2.7/site-packages/locust/rpc/zmqrpc.py", line 28, in __init__
[2016-11-25 14:03:08,912] locust-3464956088-f3fw7/ERROR/stderr: 
[2016-11-25 14:03:08,913] locust-3464956088-f3fw7/ERROR/stderr: self.receiver.connect("tcp://%s:%i" % (host, port+1))
[2016-11-25 14:03:08,913] locust-3464956088-f3fw7/ERROR/stderr: File "zmq/backend/cython/socket.pyx", line 471, in zmq.backend.cython.socket.Socket.connect (zmq/backend/cython/socket.c:4295)
[2016-11-25 14:03:08,913] locust-3464956088-f3fw7/ERROR/stderr: zmq.error
[2016-11-25 14:03:08,914] locust-3464956088-f3fw7/ERROR/stderr: .
[2016-11-25 14:03:08,914] locust-3464956088-f3fw7/ERROR/stderr: ZMQError
[2016-11-25 14:03:08,914] locust-3464956088-f3fw7/ERROR/stderr: :
[2016-11-25 14:03:08,915] locust-3464956088-f3fw7/ERROR/stderr: Invalid argument
[2016-11-25 14:03:08,915] locust-3464956088-f3fw7/ERROR/stderr: 

When creating the worker with the direct IP address of the master:

kubectl run locust --image=gcr.io/cloud-solutions-images/locust-tasks --env LOCUST_MODE=worker --env LOCUST_MASTER=172.20.140.2 --env TARGET_HOST=http://workload-simulation-webapp.appspot.com

host-44-11-1-22:~/pula/distributed-load-testing-using-kubernetes/kubernetes-config # kubectl get pods -o wide
NAME                        READY     STATUS              RESTARTS   AGE       IP             NODE
locust-2195719602-cnyzj     1/1       Running             0          25m       172.20.140.3   razvan-kube-minion2.openstack.local
locust-master-udl8f         1/1       Running             0          29m       172.20.140.2   razvan-kube-minion2.openstack.local
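Since the same image works when LOCUST_MASTER is an IP and fails when it is the name locust-master, one possible workaround is to resolve the name once at container start and hand Locust the resulting address. The sketch below is a hypothetical entrypoint helper, not part of the repo: the function name is made up, the binary and tasks paths are copied from the logged command line, and LOCUST_MASTER and TARGET_HOST are the env vars used above. It only helps when the name resolves inside the pod; otherwise it fails early with a readable message instead of the ZMQError traceback.

```python
import socket
from typing import Callable, List, Mapping

# Hypothetical worker entrypoint helper. Paths are taken from the logged command line.
LOCUST_BIN = "/usr/local/bin/locust"
TASKS_FILE = "/locust-tasks/tasks.py"


def build_slave_argv(env: Mapping[str, str],
                     resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Build the Locust slave command line with LOCUST_MASTER resolved to an IP."""
    master = env["LOCUST_MASTER"]
    try:
        # gethostbyname returns a numeric IPv4 address unchanged, so an IP still works.
        master_ip = resolve(master)
    except OSError as exc:
        # Fail early with a clear message rather than ZeroMQ's "Invalid argument".
        raise SystemExit("cannot resolve LOCUST_MASTER=%r: %s" % (master, exc)) from exc
    return [
        LOCUST_BIN,
        "-f", TASKS_FILE,
        "--host=" + env["TARGET_HOST"],
        "--slave",
        "--master-host=" + master_ip,
    ]
```

In a real entrypoint the result would be passed to os.execv(argv[0], argv). Because the address is looked up at every container start, a restarted worker picks up whatever the name points to at that moment; pointing the name at a Kubernetes Service rather than at the master pod is what keeps that address stable across master re-creation.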

How can I overcome this issue?

kasturichavan commented 4 years ago

@razghe How were you able to resolve this? I'm facing exactly the same problem.

ocervell commented 4 years ago

I have the same issue. Currently I have to deploy the master controller and service first, get the external LB IP of the master, and set LOCUST_MASTER to this IP in the slave YAML. It seems like DNS resolution is not working: when I set this env variable to locust-master, the slaves are not recognized. Any information on this?
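To narrow down where DNS resolution breaks, it can help to try the forms under which a Kubernetes Service is normally reachable: the short name, name.namespace, and the fully qualified name.namespace.svc.cluster-domain. The sketch below assumes the Service lives in the default namespace and the cluster domain is cluster.local; both are assumptions, so adjust them for your cluster. The helper names are hypothetical.

```python
import socket
from typing import Callable, Iterable, List, Optional, Tuple


def candidate_names(service: str, namespace: str = "default",
                    domain: str = "cluster.local") -> List[str]:
    """Names a Service is usually reachable under: short, namespaced, fully qualified."""
    return [
        service,
        "%s.%s" % (service, namespace),
        "%s.%s.svc.%s" % (service, namespace, domain),
    ]


def first_resolvable(names: Iterable[str],
                     resolve: Callable[[str], str] = socket.gethostbyname
                     ) -> Optional[Tuple[str, str]]:
    """Return (name, ip) for the first name that resolves, or None if none do."""
    for name in names:
        try:
            return name, resolve(name)
        except OSError:
            # socket.gaierror (lookup failed) is a subclass of OSError; try the next form.
            continue
    return None
```

If only the fully qualified name resolves, the pod's DNS search path is the likely suspect. If none of them resolves, the cluster DNS add-on is probably missing or not reachable from the pods, or the Service does not exist in that namespace.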