Open def- opened 5 days ago
One thing that came to mind: @antiguru mentioned that the query history currently doesn't support running many queries against Materialize (because it eventually OOMs). Maybe that's related? I'm not even able to load the query history anymore.
The mz_catalog_server usage keeps going up, which fits with the query history being responsible. For OLTP-type workloads this is probably a blocker (and it also blocks me from enabling this test).
Could we run this against a staging environment that has statement logging disabled? This way we could unblock the test, which is important to have on its own.
Yes, that will be my next step. I was hoping not to have to do this so early, because of the cost and the inconvenience of having to recreate sources etc.
What environment is the test running against now? We could disable statement logging against that environment instead.
Oh, just saw the conversation on Slack. Never mind.
What version of Materialize are you using?
v0.116.0
What is the issue?
Seen in https://buildkite.com/materialize/qa-canary/builds/228#0191eb33-4b0c-406a-af5e-1e5bf13c4413 on my PR introducing that test: https://github.com/MaterializeInc/materialize/pull/29524
This is running a few simple SELECT queries against a cluster. Only the SELECT 1 is open loop with 100 queries per second (not affected), while the rest are closed loop (and strict serializable) and are gradually getting slower over time.
The workload is running against the Materialize Production Sandbox (maybe I should move it to a dedicated staging env to be more isolated from other noise?), and since the first attempt was only 10 minutes I'm now retrying with 1 hour: https://buildkite.com/materialize/qa-canary/builds/229
The cluster itself (200cc, https://console.materialize.com/regions/aws-us-east-1/clusters/u3/qa_canary_environment_compute?timePeriod=180) always stayed at <=50% CPU usage. Since it's not overloaded, I expected the queries' performance to stay consistent over time.
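For anyone unfamiliar with the open loop / closed loop distinction, here is a minimal sketch of the two modes. This is not the actual test code from the PR: `run_query`, `closed_loop`, and `open_loop` are hypothetical names, and `run_query` stands in for executing one statement against the cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def closed_loop(run_query, num_queries):
    """Issue the next query only after the previous one has returned."""
    latencies = []
    for _ in range(num_queries):
        start = time.monotonic()
        run_query()  # hypothetical stand-in for one SELECT
        latencies.append(time.monotonic() - start)
    return latencies


def open_loop(run_query, rate_per_second, num_queries, max_workers=32):
    """Issue queries on a fixed schedule, without waiting for earlier ones."""
    interval = 1.0 / rate_per_second

    def timed():
        start = time.monotonic()
        run_query()
        return time.monotonic() - start

    begin = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for i in range(num_queries):
            # Sleep until the scheduled send time of query i.
            delay = begin + i * interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            futures.append(pool.submit(timed))
        # Latency here is measured from when a worker starts the query.
        return [f.result() for f in futures]
```

In the closed loop case, slower queries directly reduce the number of queries issued per second, so a gradual slowdown shows up as both higher latency and lower throughput. In the open loop case the send rate stays fixed regardless of how long each query takes.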