bhauer opened this issue 6 years ago
Ping @steveklabnik.
This seems great! I agree with your assessment personally; I'm less convinced by the disk stuff than by CPU/memory/network. Those are the big three in my mind.
What's the rationale for prioritizing CPU usage over memory usage? The existing throughput and latency results are already telling us a lot about the time dimension, while memory usage is a whole new dimension. Of the ones you list, it's the one I'm most interested in seeing reported, and the one that IMO most deserves to be the default sort column.
I've been out of touch with this effort for a while, but I remember some work counting lines of code at some point. If an effort is going to be made to visualize a bunch of new data, could code compactness be considered as well? Or is that different enough to deserve a separate GitHub issue?
What would be super cool is graphing of frameworks in a multidimensional space, say illustrating tradeoffs between time (e.g. throughput), space (e.g. 95% memory usage), and developer effort (e.g. lines of code).
@achlipala Agreed that lines of code and other dimensions are desirable. But yes, that's probably a conversation for another issue.
I can see your point about memory being very interesting. Perhaps a decent first step would be to plot some of these numbers from a sampling of frameworks to see how the data varies. My interest in the CPU data is that I suspect some frameworks are not fully using the CPU due to resource locking or similar. But seeing some actual data could disprove that.
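For what it's worth, here is a rough sketch of what that first plotting pass could look like in Python, assuming the per-framework numbers have already been summarized into a CSV (the `framework_summary.csv` name and its columns are made up here purely for illustration):

```python
# Scatter requests-per-second against average CPU (usr) for a sampling of frameworks.
# Assumes a hypothetical summary file with columns: framework,rps,avg_cpu_usr
import csv
import matplotlib.pyplot as plt

frameworks, rps, cpu = [], [], []
with open("framework_summary.csv", newline="") as f:
    for row in csv.DictReader(f):
        frameworks.append(row["framework"])
        rps.append(float(row["rps"]))
        cpu.append(float(row["avg_cpu_usr"]))

fig, ax = plt.subplots()
ax.scatter(cpu, rps)
for name, x, y in zip(frameworks, cpu, rps):
    ax.annotate(name, (x, y), fontsize=7)
ax.set_xlabel("average CPU usr (%)")
ax.set_ylabel("requests per second")
ax.set_title("Throughput vs. CPU utilization (sample of frameworks)")
plt.savefig("rps_vs_cpu.png", dpi=150)
```

Frameworks that plateau on throughput while reporting low CPU usage would stand out quickly in a plot like this, which is exactly the resource-locking pattern described above.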
I would like to +1 this as well. Starting small, simply seeing requests-per-second side-by-side with memory / latency / etc. would make this data that much better.
It's a shame to collect so much data with dstat and not visualize it, so to illustrate I put together a quick chart showing rps / memory side-by-side just to see what it would look like using the data from https://tfb-status.techempower.com/results (the continuous benchmarking data is great, by the way).
PS this issue doesn't look to be on the roadmap https://github.com/TechEmpower/FrameworkBenchmarks/projects/2
This is a great visualization, and I agree that it would be great to get these illustrated in some way.
There are a few things that I have prioritized ahead of this kind of thing, but it will ultimately make this sort of formal implementation easier and more robust (I hope 😅).
I am out of the office until the 7th, but will hopefully have more news shortly thereafter.
If anyone's curious, I revisited this and threw some data into an ag-grid you can view online. It highlights RPS, 90th percentile latency, average memory ("memory usage; used" from dstat) and CPU usage ("total cpu usage; usr" from dstat).
Data is from the ZIP download of the 2019-10-28 results. There's one Python script for parsing and a JavaScript file for the visualization in the repo. Cheers!
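For anyone who wants to reproduce the parsing step locally, here is a minimal sketch in Python, assuming dstat-style CSV output like the stats.txt files mentioned elsewhere in this thread; the exact preamble and header layout vary by dstat version and flags, so the column lookups may need adjusting:

```python
# Summarize one framework's dstat CSV: average "memory usage: used" and
# "total cpu usage: usr" over the run. The header layout is assumed, not guaranteed.
import csv
from statistics import mean

def summarize(path):
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

    # Locate the two header rows: a group row ("total cpu usage", "memory usage", ...)
    # followed by a column row ("usr", "sys", ..., "used", "free", ...).
    group_idx = next(i for i, r in enumerate(rows) if "memory usage" in r)
    groups, columns = rows[group_idx], rows[group_idx + 1]

    # Forward-fill the group row so every column knows which group it belongs to.
    filled, current = [], ""
    for g in groups:
        current = g or current
        filled.append(current)

    def col(group, name):
        return next(i for i, (g, c) in enumerate(zip(filled, columns))
                    if g == group and c == name)

    mem_i = col("memory usage", "used")
    cpu_i = col("total cpu usage", "usr")

    mem, cpu = [], []
    for r in rows[group_idx + 2:]:
        if len(r) <= max(mem_i, cpu_i):
            continue
        try:
            mem.append(float(r[mem_i]))
            cpu.append(float(r[cpu_i]))
        except ValueError:
            continue  # skip repeated header rows or blank cells

    return {"avg_mem_used": mean(mem), "avg_cpu_usr": mean(cpu)}

print(summarize("stats.txt"))
```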
Wow, that is super cool @ajdust! Your aggregation and summary of the `dstat` raw data into a form digestible by a web UI might be cool to work into the standard toolset. I envision the output could then be made available for consumption alongside the `results.json` file.
⚠️ The memory usage reported by dool (dstat) is not really useful for comparing frameworks. For example, Go frameworks should use less than 10MB on average, but the reported memory usage is around 2GB.
More details here https://github.com/TechEmpower/FrameworkBenchmarks/pull/9238
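For contrast, per-process memory is easy to sample on Linux, and it is much closer to what people expect a "framework memory" number to mean. A rough sketch (the PID lookup is left out, and resident set size is only one of several reasonable definitions of memory used):

```python
# Sample the resident set size (VmRSS) of a single server process from /proc.
# This measures one process, unlike dstat's system-wide figure, which covers
# everything running on the host, not just the framework under test.
def rss_kib(pid: int) -> int:
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # /proc reports this value in kB
    raise RuntimeError(f"VmRSS not found for pid {pid}")

# Example: print(rss_kib(12345) / 1024, "MiB")
```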
We use `dstat` to capture application server resource utilization statistics while we are measuring throughput and latency via the load generator. The raw data from `dstat` is stored at the `tfb-logs` server. Below is an example of the statistics captured while Grizzly's `json` implementation was being tested: http://tfb-logs.techempower.com/round-16/final/grizzly/json/stats.txt
I'd like opinions on which parts of this (rather large) set of input data we should select and process into a subset suitable for rendering in a tabular and/or chart view on the results web site.
Here are some thoughts on goals:
- A single table or chart may be ideal for consumption, but depending on what we discuss here, we may want to render more than will comfortably fit on a single chart. So we might end up going with multiple charts: memory stats, IO stats, CPU stats, etc.
- All of the numbers from `dstat` are captured periodically during a run. My thinking is that in every resource category we should capture a baseline and then only measure the deltas from that baseline: e.g., average, minimum, maximum, and 95th-percentile memory usage versus baseline over the span of the run. The baseline should be re-computed prior to starting each benchmark measurement in case it drifts or varies during an execution of the full suite. (A rough sketch of this appears just after this list.)
- We need to confirm whether all rows of the `dstat` output are captured while a framework's server processes are running. Any rows captured before or after the framework's processes are running (and being measured, I think) should be removed from our processing.
- It would be nice, but not required, to have one useful statistic that could be used for sorting and optionally rendered as a bar chart: for example, average CPU usage or 95th-percentile memory usage, or whatever. If we don't choose something that can be used for sorting, we'll render the chart(s) in alphabetical order by framework name. Alternatively, it could be a table view with sortable columns.
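To make the baseline-and-deltas idea above concrete, here is a rough sketch of the statistics I have in mind, assuming we already have a list of samples for the measurement window and a short baseline window captured just before it (the function names are illustrative, not part of the existing toolset):

```python
# Baseline-relative resource statistics for one benchmark measurement.
# `baseline_samples` come from the idle window before the measurement starts;
# `run_samples` come from the measurement window itself.
from statistics import mean

def percentile(values, pct):
    # Simple nearest-index percentile; adequate for reporting purposes.
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1))))
    return ordered[k]

def delta_stats(baseline_samples, run_samples):
    baseline = mean(baseline_samples)
    deltas = [s - baseline for s in run_samples]
    return {
        "baseline": baseline,
        "avg_delta": mean(deltas),
        "min_delta": min(deltas),
        "max_delta": max(deltas),
        "p95_delta": percentile(deltas, 95),
    }

# Example with memory-usage samples (MiB):
# delta_stats([180, 181, 182], [950, 1220, 1410, 1395, 1400])
```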
My personal prioritization of the available statistics is:
The rest is of less interest to me. What do others think?
Given that our test types make no use of the application server's disks and our guidelines suggest that disk logging be disabled, a well-functioning test implementation would be expected to report nearly 0 for disk reads and writes. So I only include them here so that I can see which frameworks are (unnecessarily) working the disks. Does that affect our thinking on the value of those stats?
Looking at `dstat`, it seems generating numbers for CPU utilization would involve manual aggregation of all the CPU cores. Am I missing something there?
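If per-core numbers are what we have to work with, the aggregation itself is simple; a sketch, assuming the samples have already been parsed into per-core busy percentages (the representation is illustrative):

```python
# Collapse per-core CPU samples into one overall utilization number per sample.
# Each sample is a list of per-core busy percentages (e.g. usr + sys for each core);
# overall utilization is the mean across cores.
from statistics import mean

def overall_cpu(per_core_samples):
    return [mean(core_values) for core_values in per_core_samples]

# Example: two samples on a 4-core machine.
# overall_cpu([[90, 85, 88, 91], [40, 38, 45, 42]])  ->  [88.5, 41.25]
```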