Closed: scottbell closed this issue 11 months ago.
~@akhenry @unlikelyzero It looks like the YAMCS Quickstart only fires at 1Hz. Do you guys have another lead on 10Hz data?~ @unlikelyzero had a smart idea of just running `simulator.py` 10 times, which gives us 10Hz. It works!

@unlikelyzero says to run for an hour before shutting down.
I've for sure had luck simulating 10Hz data by modifying this line to be `sleep(0.1)`.
Ah, duh. I've done this before too and had forgotten there's a separate `sleep` above. Thanks for the info!
Adding a 2s "digestion delay" for the WebSocket callback absolutely kills YAMCS, and the memory stays pretty high too, even after the client disconnects, though restarting YAMCS resolves it.
If one comments out this:

```yaml
webSocket:
    writeBufferWaterMark: { low: 32768, high: 160000000 }
```
it causes a great deal of these messages:

```
16:05:21.647 _global [45] WebSocketServerMessageHandler Channel full, cannot write message with priority=NORMAL (slow network?). Closing connection.
```

but the CPU/memory consumption of YAMCS remains constant. So perhaps `writeBufferWaterMark` is tuned too high for the YAMCS server?
The defaults are:

```
{ low: 32768, high: 131072 }
```

> Water marks for the write buffer of each WebSocket connection. When the buffer is full, messages are dropped. High values lead to increased memory use, but connections will be more resilient against unstable networks (i.e. high jitter). Increasing the values also help if a large number of messages are generated in bursts. The map requires keys low and high indicating the low/high water mark in bytes.
@scottbell Beautiful! These are great findings. The error message is a symptom of a self-defense mechanism against slow clients, which we are effectively disabling by using arbitrarily large write buffers. Open MCT can handle being dropped; it will just reconnect. Also, if we get dropped for being too slow, that's useful feedback that allows us to direct our optimization efforts. There's stuff we can do in Open MCT to make it process WebSocket messages quicker so the buffer doesn't back up, such as taking WebSocket handling off the UI thread.
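For reference, the general shape of "WebSocket handling off the UI thread" is something like the sketch below. This is not Open MCT's actual code; the worker file name, batching interval, message handler, and YAMCS URL are placeholders.

```js
// websocket-worker.js -- the socket lives in a Web Worker, so parsing and
// batching happen off the UI thread; the main thread only sees periodic batches.
let socket;
let batch = [];

self.onmessage = (event) => {
    if (event.data.type === 'connect') {
        socket = new WebSocket(event.data.url);
        socket.onmessage = (message) => batch.push(message.data);
        // Flush accumulated messages to the UI thread on a fixed cadence
        // so it isn't woken up for every single telemetry update.
        setInterval(() => {
            if (batch.length > 0) {
                self.postMessage(batch);
                batch = [];
            }
        }, 100);
    }
};
```

```js
// Main thread: hand the connection off to the worker and consume batches.
const worker = new Worker('websocket-worker.js');
worker.onmessage = (event) => handleTelemetryBatch(event.data); // hypothetical handler
worker.postMessage({ type: 'connect', url: 'ws://localhost:8090/api/websocket' }); // placeholder URL
```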
@scottbell @unlikelyzero Can we use K6 to load real Open MCT clients?
I think the next step is to reproduce this with real Open MCT clients so that we have a test bed for measuring Open MCT changes.
We will need a sufficiently complex Open MCT display with a bunch of plots, LAD Tables, alphanumerics, and condition sets / widgets. @charlesh88 has some scripting to automate building these I believe.
I think it's worth trying to build a real repro of this in Quickstart for a couple reasons:
@unlikelyzero @akhenry There's a K6 browser module I think we could use to do this. From what I can tell, we'd need to:

Playwright looks like it also has something similar we could do with Artillery, but I'm not familiar with it.
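For what it's worth, a first cut at a K6 browser scenario might look something like the sketch below. The import path assumes a recent k6 (older releases expose it as `k6/experimental/browser`), and the URL, hold time, and scenario settings are placeholders I haven't verified against Quickstart.

```js
import { browser } from 'k6/browser';
import { sleep } from 'k6';

export const options = {
    scenarios: {
        openmct_ui: {
            executor: 'constant-vus',
            vus: 10,
            duration: '10m',
            options: {
                browser: { type: 'chromium' },
            },
        },
    },
};

export default async function () {
    const page = await browser.newPage();
    try {
        // Placeholder URL for an Open MCT display with plots, LAD Tables, etc.
        await page.goto('http://localhost:8080/');
        // Hold the page open so it keeps streaming telemetry over its WebSocket.
        sleep(60);
    } finally {
        await page.close();
    }
}
```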
@unlikelyzero @akhenry Fiddling with rather modest parameters for K6:
```js
const maxClients = 40;
const workersPerClient = 5;
const digestionTimeInMs = 500;
```
on their own create a rather slow build-up in memory consumption. But if I stop the K6 process after 10 minutes and restart it, YAMCS never really gives up the fat WebSocket buffers from the previous run (or at least not quickly enough) and quickly runs out of memory.
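For context, parameters like these plug into a plain k6 WebSocket test roughly as in the sketch below. The YAMCS endpoint and the subscription message are placeholders rather than the exact Quickstart values, and `workersPerClient` is treated here as subscriptions per connection, which may not match the actual script.

```js
import ws from 'k6/ws';
import { sleep } from 'k6';

const maxClients = 40;
const workersPerClient = 5;
const digestionTimeInMs = 500;

export const options = { vus: maxClients, duration: '10m' };

export default function () {
    // Placeholder endpoint; the real Quickstart WebSocket URL may differ.
    const url = 'ws://localhost:8090/api/websocket';
    ws.connect(url, {}, function (socket) {
        socket.on('open', () => {
            for (let i = 0; i < workersPerClient; i++) {
                // Placeholder subscription request; the real YAMCS message format differs.
                socket.send(JSON.stringify({ type: 'subscribe', id: i }));
            }
        });
        socket.on('message', () => {
            // Simulate a slow client: block before servicing the next message.
            sleep(digestionTimeInMs / 1000);
        });
        // Close after a minute so VUs cycle through fresh connections.
        socket.setTimeout(() => socket.close(), 60000);
    });
}
```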
@akhenry @unlikelyzero I've added a browser test too. I'll let you know what I find testing it out on Open MCT Quickstart.
Summary
Using K6, or by writing a simple Node script, simulate 300 WebSocket clients subscribing to 10Hz telemetry. What is the impact on YAMCS? What happens if you make the clients slow to service the WebSocket messages, simulating a browser under heavy load? Does the CPU utilization scale up linearly, or is there a threshold at which it suddenly jumps up?
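If K6 ends up being awkward, a minimal Node script along these lines could do the same job. It uses the npm `ws` package; the endpoint and subscription message are placeholders.

```js
// node-ws-load.js -- open N WebSocket clients and service messages slowly.
// Requires: npm install ws
const WebSocket = require('ws');

const CLIENTS = 300;
const DIGESTION_DELAY_MS = 500; // how long a "slow browser" takes per message
const URL = 'ws://localhost:8090/api/websocket'; // placeholder endpoint

for (let i = 0; i < CLIENTS; i++) {
    const socket = new WebSocket(URL);
    socket.on('open', () => {
        // Placeholder subscription request; adjust to the real YAMCS message format.
        socket.send(JSON.stringify({ type: 'subscribe', client: i }));
    });
    socket.on('message', () => {
        // Busy-wait to simulate a client that is slow to service messages.
        // Note this blocks the whole process, so it throttles every client at once.
        const start = Date.now();
        while (Date.now() - start < DIGESTION_DELAY_MS) { /* intentionally block */ }
    });
    socket.on('error', (err) => console.error(`client ${i}:`, err.message));
}
```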