Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy. This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License

Stresstest ASL Signbank #482

Open Woseseltops opened 6 years ago

Woseseltops commented 6 years ago

Judging from the interest on Instagram, there might be a lot of interest in the ASL Signbank when it launches publicly. We'd better simulate what happens before it actually happens.

Woseseltops commented 5 years ago

I've been looking into this. I've written a simple script that works with Selenium (browser automation), but as noted in many places on the internet, this is not optimal because a whole browser has to be opened and closed for each test. As an alternative, I'm looking into Apache JMeter.

Woseseltops commented 5 years ago

Okay, I have been working with JMeter for many, many hours now, and I've discovered that (1) it is what we need, (2) JMeter is not easy to use, and (3) logging into Django applications from code is HARD: you need the right cookie and the right CSRF token at the right times in the right header, etc.

But I've succeeded! I now have a so-called test plan that I can have x virtual users execute, spread out over a time frame of y minutes (to prevent all users from doing the same action at the exact same time):

All stylesheets and pictures etc are also downloaded for each page... except, for some reason, the actual video.
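
The "x virtual users spread out over y minutes" idea can be sketched outside JMeter as well. This is only a minimal illustration, not the actual test plan: `start_offsets` and `run_virtual_users` are hypothetical names, the even spacing of start times is an assumption about how the ramp-up is distributed, and `action` stands in for whatever requests one virtual user would make.

```python
import threading
import time


def start_offsets(users, ramp_up_seconds):
    """Evenly spaced start delays (in seconds), one per virtual user.

    Assumption: users are spread uniformly over the ramp-up window.
    """
    if users < 1:
        raise ValueError("users must be at least 1")
    step = ramp_up_seconds / users
    return [i * step for i in range(users)]


def run_virtual_users(action, users, ramp_up_seconds):
    """Run action(user_index) once per virtual user, each in its own thread.

    Returns the elapsed seconds per user, in user order.
    """
    durations = [None] * users

    def worker(index, delay):
        time.sleep(delay)  # stagger the start so users do not act simultaneously
        started = time.perf_counter()
        action(index)
        durations[index] = time.perf_counter() - started

    threads = [
        threading.Thread(target=worker, args=(i, delay))
        for i, delay in enumerate(start_offsets(users, ramp_up_seconds))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return durations
```

Unlike Selenium, nothing here opens a browser; each virtual user is just a thread, which is roughly the trade-off that makes a tool like JMeter attractive for this kind of test.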

Something I've noticed so far is that Signbank slows down exponentially as you add more users. These are the average page load times of the Dutch Signbank, for n users doing the actions above spread out over 10 seconds. At 7 simultaneous users I also started getting timeouts, so I stopped there.

image
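
One way to check whether measurements like these really grow exponentially is to fit a straight line through the logarithm of the load times. The sketch below is an assumption about how one might do that, not something used in this experiment: `fit_exponential` is a hypothetical helper, and it expects the measured user counts and average load times as input.

```python
import math


def fit_exponential(user_counts, load_times):
    """Least-squares fit of load_time ~ a * b**users.

    Fits a line through log(load_time) and returns (a, b), where b is the
    factor by which the load time grows for each additional user.
    """
    n = len(user_counts)
    if n < 2 or n != len(load_times):
        raise ValueError("need at least two matching measurements")
    logs = [math.log(t) for t in load_times]
    mean_x = sum(user_counts) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in user_counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(user_counts, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)
```

If the growth is exponential, the points lie close to a line on a log scale and `b` is clearly above 1; if the load time simply grows by a fixed amount per user, the fit will be visibly poor.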

Curious for your opinions on this experiment, @susanodd @ocrasborn @vanlummelhuizen

Woseseltops commented 5 years ago

Maybe it's also good to write down the technical details of how I got this working. I'll write it down like it's all simple and straightforward, but it actually took me many hours to figure this out.

What you need to log into a Django application:

  1. Make a GET request to the page with the login form.
  2. Parse the response, and take the CSRF token from the header.
  3. Make a POST request to the login URL, and include the CSRF token both (1) as a parameter and (2) in the header, as if it were coming from a cookie. I'm doing the same thing for the redirect URL I get as a response, but I'm not 100% sure this is necessary.
  4. Parse the response, and take a new (!!!) session ID from the header.
  5. For all upcoming requests, include the session ID in the header as if it was coming from a cookie.

For all of these requests, it's also important that the 'referer' variable in the header is referring to the previous page.
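
The steps above can be sketched in plain Python, without sending anything over the network. This is a minimal illustration under stated assumptions, not the JMeter setup itself: the cookie names `csrftoken` and `sessionid` and the form field `csrfmiddlewaretoken` are Django's defaults (they can be changed in the settings), the `username`/`password` field names are an assumption about the login form, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def cookie_value(set_cookie_headers, name):
    """Return the value of cookie `name` from a list of Set-Cookie header values."""
    for header in set_cookie_headers:
        first_pair = header.split(";", 1)[0]  # "name=value" comes before the attributes
        key, _, value = first_pair.partition("=")
        if key.strip() == name:
            return value.strip()
    return None


def build_login_request(login_url, referer, csrf_token, username, password):
    """Step 3: POST with the CSRF token both as a form parameter and as a cookie."""
    body = urlencode({
        "csrfmiddlewaretoken": csrf_token,  # Django's default form field name
        "username": username,               # assumed field name
        "password": password,               # assumed field name
    }).encode("ascii")
    return Request(login_url, data=body, method="POST", headers={
        "Cookie": f"csrftoken={csrf_token}",
        "Referer": referer,  # should point at the previous page
    })


def build_authenticated_request(url, referer, session_id):
    """Step 5: any later request carries the new session ID as a cookie."""
    return Request(url, headers={
        "Cookie": f"sessionid={session_id}",
        "Referer": referer,
    })
```

In use, `cookie_value` would be applied to the `Set-Cookie` headers of the GET response (step 2) and again to those of the login response (step 4), and the resulting `Request` objects would be passed to `urllib.request.urlopen`.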

How to achieve all of this in JMeter:

Woseseltops commented 5 years ago

I did a not-logged-in test for ASL Signbank:

Surprisingly, this shows a really different pattern:

image

This raised two more questions, for which I collected more data:

image

image

But the big question of course is: why is the Global Signbank so much slower? I have three hypotheses:

  1. There is much more data to iterate over in the Global Signbank, making all queries to the database much slower.
  2. We have configured the Global Signbank in such a way that it does not make use of all the hardware it has.
  3. There is some kind of DDoS protection on the RU network.

We could test option 1 by using a temporary, nearly empty database for the Global Signbank, and doing another stress test.

vanlummelhuizen commented 5 years ago

@Woseseltops Do you have an idea as to where to (stress) test an almost empty database? Do you want to do it in a separate instance on the same server or somewhere else?