OregonDigital / OD2

Next generation of the Oregon Digital (https://oregondigital.org) digital collections platform, built on Samvera Hyrax (https://github.com/samvera/hyrax/)

Load test the stack #930

Open jechols opened 4 years ago

jechols commented 4 years ago

Descriptive summary

We have no idea how the new stack will perform under stress. The tech has changed significantly since the days of our original CDM -> Hydra migration, we have more data, and we're planning to promote OD a lot more than we did upon the Hydra-based OD release.

We should get a good deal of data in before we do this, because the new stack seems to add a lot more overhead per item than the current stack. This could mean that a load test against 100 items works out just fine while 100,000 items falls over.

By the time we have 100,000 items we'll probably be mostly done with the migration, and we want data sooner than that. I suggest a middle-ground test of around 10,000 items where we simulate 20-30 simultaneous users, such as what we might see from a smallish class using the site for something.
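For illustration only, here is a minimal sketch of what "20-30 simultaneous users" could look like as a script, using nothing but the Python standard library. The function names, the default of 25 users, and the URLs in the usage comment are all assumptions made for this example; nothing here is an existing OD2 tool, and a dedicated load-testing tool would do the same job with more features.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=30):
    """Fetch one URL and return (HTTP status, elapsed seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        status = resp.status
    return status, time.perf_counter() - start


def simulate_user(urls, fetch_fn=fetch):
    """One simulated user visits each URL in order.

    Failures (timeouts, HTTP errors, refused connections) are recorded
    as (None, None) instead of being raised, so one bad request does
    not abort the whole run.
    """
    results = []
    for url in urls:
        try:
            results.append(fetch_fn(url))
        except Exception:
            results.append((None, None))
    return results


def run_load_test(urls, users=25, fetch_fn=fetch):
    """Run `users` simulated users concurrently and summarize the timings."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(simulate_user, urls, fetch_fn) for _ in range(users)]
        samples = [sample for f in futures for sample in f.result()]
    timings = [elapsed for status, elapsed in samples if status == 200]
    return {
        "requests": len(samples),
        "ok": len(timings),
        "failed": len(samples) - len(timings),
        "median_s": statistics.median(timings) if timings else None,
        "max_s": max(timings) if timings else None,
    }


# Hypothetical usage against a local test instance (host, port and paths
# are assumptions, not confirmed OD2 routes):
#   base = "http://localhost:3000"
#   print(run_load_test([base + "/", base + "/catalog?q=oregon"], users=25))
```

Passing the fetch function in as a parameter keeps the sketch testable without a running site, and would also let a scripted session (login, search, view an item) replace the plain GET.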

Expected behavior

I don't know that we have any specific expectations. The goal is to gather data on how the stack behaves under load and, if it struggles, to figure out what to do about it before we go live.

Related work

Similar ticket for OD1: https://github.com/OregonDigital/oregondigital/issues/324

jechols commented 4 years ago

It's worth noting that today we ran the reindexing process on roughly 2600 assets. Timing:

straleyb commented 3 years ago

tesseract ticket

jechols commented 3 years ago

Same concern here as with the profiler issue (#936): this isn't related to tesseract. Have we done load testing beyond the reindex? If so, let's update this ticket and close it. If not, let's keep this open or else explicitly state we're not going to load test.

raybrarian commented 3 years ago

I'm at c4l, and a session on locust.io (Python) makes me wonder if it might work for this: https://locust.io/. Wanted to park it here in case it's helpful.

jechols commented 3 years ago

I probably wouldn’t waste time on finding a tool with a “pretty” UI. There are a ton of tools for load testing, but simpler ones like siege can be as effective as anything with a UI without getting in the way of just blasting a ton of users at the front-end. But the thing is, all these tools are just front-end tests for HTTP requests.

With OD, load testing can mean a whole lot more than just HTTP, though. User traffic is definitely one aspect, but there’s also: ingesting too many things at once, indexing the full database, overloading workers on purpose to see what happens, etc. These are important. Possibly more important than just front-end traffic. Then there are “everything” tests – what happens if we have a bulk ingest of AVI files (very heavy on workers) while the site has a lot of users? Unfortunately, generic HTTP load tests can’t automate this kind of thing, because we have to identify what’s expensive on the back-end and script something that will force that to occur.

My bigger concern is that we haven’t really even thought through possible scenarios, much less how we could test them.
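One possible way to start thinking through scenarios is to write them down as a matrix of front-end traffic levels crossed with back-end jobs. The sketch below does only that: it enumerates combinations and does not run anything. The user levels and the job names are placeholders taken loosely from the examples above, not an agreed test plan.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical front-end traffic levels (simultaneous users).
FRONTEND_USERS = [0, 25, 100]

# Back-end activities mentioned in this thread; "none" means front-end only.
BACKEND_JOBS = ["none", "bulk ingest (AVI)", "full reindex", "worker overload"]


@dataclass(frozen=True)
class Scenario:
    users: int
    backend_job: str

    @property
    def name(self):
        """Human-readable label for a test plan or ticket checklist."""
        return f"{self.users} users + {self.backend_job}"


def build_scenarios(user_levels, backend_jobs):
    """Cross every traffic level with every back-end job.

    The idle combination (0 users, no back-end job) is skipped because
    there is nothing to measure.
    """
    return [
        Scenario(users, job)
        for users, job in product(user_levels, backend_jobs)
        if not (users == 0 and job == "none")
    ]


if __name__ == "__main__":
    for scenario in build_scenarios(FRONTEND_USERS, BACKEND_JOBS):
        print(scenario.name)
```

Each row would still need its own script to force the expensive back-end work to happen; the matrix only makes the gaps visible.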

(Sent by email in reply to Ray Henry's comment above, Wednesday, March 24, 2021 12:06.)

raybrarian commented 3 years ago

@jechols - definitely not looking for a pretty UI! Small, Python, flexible user test scenarios. Not just "non-dev manager at a conference wants to do a thing manager saw at the conference." :) And yeah, I agree re: just http. Definitely need the other load/stress testing as well, and if that's a list you think the devs can come up with, we should probably consider doing that this workcycle (WC 16).

decimalator commented 2 years ago

Unless there are specific things left we want to stress test, I think we've pretty well stress tested the backend! Do we need to do frontend load testing?

Fedora, Postgres, Redis, Blazegraph, Sidekiq, and Solr have been through the equivalent of the Battle of Thermopylae with the migration from OD1, and they're handling it quite well. And the cluster is still able to run OSU's other production workloads just fine.

I don't think we've done "load testing" per se of the Rails frontend. We've fixed a few scaling issues:

We're currently running 2 replicas for web and web-admin, and we can add more on-demand. Anyone with cluster access can scale the deployments up as needed. I can also put together an autoscaling automation to scale the deployments up and down as needed.
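As a rough illustration of what an autoscaling rule might compute, the sketch below follows the ratio-based rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current replicas × current metric / target metric), clamped to a minimum and maximum. It assumes the cluster is Kubernetes, which this thread does not state outright. The floor of 2 matches the replica count mentioned above; the ceiling of 6 and the metric values are made up for the example.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=2, max_replicas=6):
    """Ratio-based scaling rule, clamped to [min_replicas, max_replicas].

    current_metric and target_metric are in the same unit, for example
    average CPU utilization in percent across the running replicas.
    max_replicas=6 is a hypothetical ceiling, not a project setting.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 2 replicas averaging 90% CPU against a 60% target.
    print(desired_replicas(2, 90, 60))
```

A real autoscaler also applies a tolerance band and cooldown periods so that replica counts do not flap; those are left out here to keep the arithmetic visible.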