Improve Performance for Simultaneous Users and Average Response Time

kaladay commented 1 month ago

Image performance is currently slow compared to other institutions using Cantaloupe. At a minimum, I think we should be targeting:

10 requests per second from 12 users (a little under one request per second per user)
The slowest requests taking less than 40 seconds to respond

Cantaloupe handles various tasks, but my primary focus is on image tiling and initial image responses. This is crucial because many of our images, such as maps, first load a lower-resolution version of the full image. The system then retrieves high-resolution tiles as the user zooms in on specific regions of the initial image.
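As a rough illustration of that request pattern, here is a sketch using IIIF Image API URLs, which take the form {identifier}/{region}/{size}/{rotation}/{quality}.{format}. The base URL, identifier, image dimensions, and 512-pixel tile size are placeholders, not values from our Cantaloupe setup, and real viewers choose tile sizes from the image's info.json.

```python
from math import ceil
from urllib.parse import quote


def overview_url(base, identifier, width=512):
    """URL for the whole image scaled down to `width` pixels wide."""
    return f"{base}/{quote(identifier, safe='')}/full/{width},/0/default.jpg"


def tile_urls(base, identifier, img_w, img_h, tile=512, scale=1):
    """URLs for every tile covering the image at one scale factor.

    scale=1 is full resolution; larger values are zoomed-out levels.
    """
    span = tile * scale  # region size in full-resolution pixels
    ident = quote(identifier, safe="")
    urls = []
    for y in range(0, img_h, span):
        for x in range(0, img_w, span):
            # Tiles on the right and bottom edges may be smaller.
            w = min(span, img_w - x)
            h = min(span, img_h - y)
            out_w = ceil(w / scale)  # width of the returned tile
            urls.append(f"{base}/{ident}/{x},{y},{w},{h}/{out_w},/0/default.jpg")
    return urls


# Placeholder server and image (assumptions for illustration only).
BASE = "https://example.org/iiif/2"
first_view = overview_url(BASE, "map")
zoomed_in = tile_urls(BASE, "map", img_w=1000, img_h=800)
```

Each tile is a separate request that the server has to crop and scale, so a single zoom can produce many requests at once.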

To aid in benchmarking and load testing this problem, I added to some work done by a colleague at another institution here:

https://github.com/markpbaggett/iiif-loadtest

This uses Locust to simulate high traffic to a web application to see how it handles many simultaneous users. Locust allows you to define different user behaviors, simulate them, and test the performance of your web application. Two scenarios in the code above have been written to mimic what a user might do in a client viewer: virtualReading and zoomToPoint. Essentially, they process the Image API response and then ask Cantaloupe to do what an image viewer would do when certain regions of the canvas are requested.
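To make the idea concrete, here is a minimal sketch of how a zoom-to-point style scenario could turn an Image API info.json response into a series of requests. This is not the code from the iiif-loadtest repository; the function name, the choice of regions, and the 512-pixel output width are assumptions for illustration. It only relies on the standard width, height, and tiles/scaleFactors fields of info.json.

```python
def zoom_to_point_paths(info, fx, fy, out_w=512):
    """Return IIIF request paths that zoom toward a point.

    `info` is a parsed info.json dict; (fx, fy) is the target point given as
    fractions (0.0 to 1.0) of the image width and height.
    """
    width, height = info["width"], info["height"]
    # Use the advertised scale factors, coarsest first; fall back to [1].
    tiles = info.get("tiles") or [{"scaleFactors": [1]}]
    factors = sorted(tiles[0].get("scaleFactors", [1]), reverse=True)
    largest = factors[0]
    cx, cy = int(fx * width), int(fy * height)

    paths = []
    for s in factors:
        # The requested region shrinks as the zoom gets deeper.
        w = max(1, width * s // largest)
        h = max(1, height * s // largest)
        # Centre the region on the point, clamped to the image bounds.
        x = min(max(cx - w // 2, 0), width - w)
        y = min(max(cy - h // 2, 0), height - h)
        paths.append(f"{x},{y},{w},{h}/{min(out_w, w)},/0/default.jpg")
    return paths


# Hypothetical info.json content for a 4000x2000 image.
example_info = {"width": 4000, "height": 2000,
                "tiles": [{"width": 512, "scaleFactors": [1, 2, 4, 8]}]}
example_paths = zoom_to_point_paths(example_info, 0.5, 0.5)
```

In a load test, each path would be appended to the image's base URL and requested in order, so the server sees the same progression from full image to small region that a viewer would produce.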

Using the code above and these scenarios, we can pass it a list of images from one of our exhibits. I've attached 5 different lists of images from SAGE collections. If useful, here is a list of all images in SAGE collections.

Using one of the lists, we can run the scenarios like so:

locust -f imagesrv/locustfile.py --url-list sage_4.txt -H https://api.library.tamu.edu/ --log-file sage_4.log --log-level WARNING --tasks virtualReading,zoomToPoint -u 12

This tells Locust to randomly pick images from the attached sage_4.txt file and mimic 12 users zooming in and out on them. It also logs any slow responses or failures to sage_4.log.

I've attached here a report showing Cantaloupe struggling to do this for all the attached lists.

After this work is complete, I'd like our image server to be able to process at least 10 requests per second and to stop failing with 500 timeouts as often as it does in the attached reports (.html files).

Attachments

SAGE images in batches of 5000
SAGE log for slow requests
Locust Reports after testing scenarios for 10 minutes

Acceptance Criteria

Cantaloupe can support 10 simultaneous users.
The slowest requests take less than 40 seconds to respond. (I think we could live with 60 seconds)
Timeouts are eliminated: no responses take 120 seconds or more, and no 5XX responses are returned for timeouts.
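The criteria above could be checked mechanically after a test run. The sketch below assumes the numbers have already been read off a Locust report by hand or by script; RunSummary and unmet_criteria are hypothetical names, not part of Locust or the load-test repository.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Figures taken from one load-test run (hypothetical structure)."""
    users: int                   # simultaneous simulated users
    max_response_seconds: float  # slowest response in the run
    server_error_count: int      # number of 5XX responses


def unmet_criteria(run, min_users=10, max_seconds=40.0):
    """Return a description of each acceptance criterion the run misses."""
    unmet = []
    if run.users < min_users:
        unmet.append(f"only {run.users} simultaneous users, need {min_users}")
    # A slowest response under the limit also rules out 120+ second responses.
    if run.max_response_seconds >= max_seconds:
        unmet.append(f"slowest response took {run.max_response_seconds:.0f}s, "
                     f"limit is {max_seconds:.0f}s")
    if run.server_error_count > 0:
        unmet.append(f"{run.server_error_count} 5XX responses")
    return unmet


# Example with made-up numbers: a run that times out.
problems = unmet_criteria(RunSummary(users=12, max_response_seconds=130.0,
                                     server_error_count=3))
```

Passing max_seconds=60.0 applies the looser 60-second limit mentioned in the criteria.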

kaladay commented 1 month ago

Some baseline tests using DEV and Locust.

(I renamed the extensions to .txt to make GitHub happy.)

graph-8315ce91-097c-32b7-9958-1acae46553ff-dev-2024_09_23-1.html.txt
graph-eac40814-fa1c-357d-890d-767a0c325cb2-dev-2024_09_23-1.html.txt
locustfile-8315ce91-097c-32b7-9958-1acae46553ff-dev-2024_09_23-1.py.txt
locustfile-eac40814-fa1c-357d-890d-767a0c325cb2-dev-2024_09_23-1.py.txt

kaladay commented 1 month ago

Some additional tests, some going through IRIIIF and some bypassing IRIIIF and using Fedora directly.

Only a single image is tested, using multiple methods. The files named simple use simple, hardcoded URLs. The files named complex use dynamically and randomly generated URLs for that one image to perform different kinds of transforms.

graph-eac40814-fa1c-357d-890d-767a0c325cb2-dev-2024_09_25-no_cache-fedora-complex-1.html.txt
graph-eac40814-fa1c-357d-890d-767a0c325cb2-dev-2024_09_25-no_cache-fedora-simple-1.html.txt
graph-eac40814-fa1c-357d-890d-767a0c325cb2-dev-2024_09_25-no_cache-ir-complex-1.html.txt
graph-eac40814-fa1c-357d-890d-767a0c325cb2-dev-2024_09_25-no_cache-ir-simple-1.html.txt
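For readers without the attached locustfiles, the difference between the simple and complex cases might look roughly like the sketch below. This is not the attached code: the function names, the fixed 512-pixel overview, the 1024-pixel output cap, and the rotation and quality choices are assumptions, and the server is assumed to support those rotations and qualities.

```python
import random

ROTATIONS = (0, 90, 180, 270)
QUALITIES = ("default", "color", "gray")


def simple_path():
    """A fixed request: the whole image scaled to 512 pixels wide."""
    return "full/512,/0/default.jpg"


def complex_path(width, height, rng):
    """A random crop, scale, rotation, and quality for one image.

    `rng` is a random.Random instance so runs can be seeded and repeated.
    """
    w = rng.randint(1, width)
    h = rng.randint(1, height)
    x = rng.randint(0, width - w)    # keep the region inside the image
    y = rng.randint(0, height - h)
    out_w = rng.randint(1, min(w, 1024))  # never upscale
    rotation = rng.choice(ROTATIONS)
    quality = rng.choice(QUALITIES)
    return f"{x},{y},{w},{h}/{out_w},/{rotation}/{quality}.jpg"


# Seeded generator so the same sequence of requests can be replayed.
rng = random.Random(2024)
sample = [complex_path(4000, 2000, rng) for _ in range(5)]
```

Because nearly every complex request is unique, a run like this mostly measures uncached processing, whereas the simple request can be served repeatedly from a cache when one is enabled.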

TAMULib / cantaloupe

Improve Performance for Simultaneous Users and Average Response Time #3
