Closed: drujensen closed this issue 6 years ago
Thanks for this @drujensen :+1: I've also done some small runs and can second this issue, unfortunately I don't have any logs :cry:
Hi @drujensen! We are investigating some issues now with r15-p2 and are doing some reruns. I'll let you know as soon as I see some results and we will go from there.
@drujensen @sdogruyol Looks like we found the issue. We've started a new run and amber has completed but it will be some time before we get a full picture for R15-P3. Here are the results:
"plaintext": {
"amber": [
{
"latencyAvg": "23.70ms",
"latencyMax": "114.14ms",
"latencyStdev": "12.06ms",
"totalRequests": 1448302,
"startTime": 1504802515,
"endTime": 1504802530
},
{
"latencyAvg": "87.41ms",
"latencyMax": "404.84ms",
"latencyStdev": "50.07ms",
"totalRequests": 1629658,
"startTime": 1504802532,
"endTime": 1504802547
},
{
"latencyAvg": "336.33ms",
"latencyMax": "1.01s",
"latencyStdev": "180.85ms",
"totalRequests": 1621857,
"startTime": 1504802549,
"endTime": 1504802564
},
{
"latencyAvg": "1.30s",
"latencyMax": "4.72s",
"latencyStdev": "694.03ms",
"totalRequests": 1596471,
"startTime": 1504802566,
"endTime": 1504802581
}
],
"db": {
"amber": [
{
"latencyAvg": "582.47us",
"latencyMax": "5.86ms",
"latencyStdev": "589.15us",
"totalRequests": 252698,
"startTime": 1504801882,
"endTime": 1504801897
},
{
"latencyAvg": "603.34us",
"latencyMax": "7.06ms",
"latencyStdev": "664.24us",
"totalRequests": 506598,
"startTime": 1504801899,
"endTime": 1504801914
},
{
"latencyAvg": "748.65us",
"latencyMax": "23.61ms",
"latencyStdev": "1.12ms",
"totalRequests": 912180,
"startTime": 1504801916,
"endTime": 1504801931
},
{
"latencyAvg": "3.60ms",
"latencyMax": "87.26ms",
"latencyStdev": "7.44ms",
"totalRequests": 950311,
"startTime": 1504801933,
"endTime": 1504801949
},
{
"latencyAvg": "22.57ms",
"latencyMax": "198.01ms",
"latencyStdev": "38.84ms",
"totalRequests": 965665,
"startTime": 1504801951,
"endTime": 1504801966
},
{
"latencyAvg": "93.45ms",
"latencyMax": "656.78ms",
"latencyStdev": "141.04ms",
"totalRequests": 998851,
"startTime": 1504801968,
"endTime": 1504801983
}
],
"update": {
"amber": [
{
"latencyAvg": "18.10ms",
"latencyMax": "278.91ms",
"latencyStdev": "19.11ms",
"totalRequests": 269955,
"startTime": 1504802265,
"endTime": 1504802280
},
{
"latencyAvg": "65.79ms",
"latencyMax": "323.45ms",
"latencyStdev": "37.01ms",
"totalRequests": 59538,
"startTime": 1504802282,
"endTime": 1504802297
},
{
"latencyAvg": "125.15ms",
"latencyMax": "435.86ms",
"latencyStdev": "51.19ms",
"totalRequests": 30681,
"startTime": 1504802299,
"endTime": 1504802314
},
{
"latencyAvg": "184.44ms",
"latencyMax": "507.47ms",
"latencyStdev": "61.51ms",
"totalRequests": 20706,
"startTime": 1504802316,
"endTime": 1504802331
},
{
"latencyAvg": "245.74ms",
"latencyMax": "607.97ms",
"latencyStdev": "71.19ms",
"totalRequests": 15493,
"startTime": 1504802333,
"endTime": 1504802348
}
],
"json": {
"amber": [
{
"latencyAvg": "129.93us",
"latencyMax": "2.97ms",
"latencyStdev": "74.34us",
"totalRequests": 984697,
"startTime": 1504802016,
"endTime": 1504802031
},
{
"latencyAvg": "133.28us",
"latencyMax": "2.86ms",
"latencyStdev": "77.50us",
"totalRequests": 1916676,
"startTime": 1504802033,
"endTime": 1504802048
},
{
"latencyAvg": "186.28us",
"latencyMax": "8.57ms",
"latencyStdev": "118.22us",
"totalRequests": 2707365,
"startTime": 1504802050,
"endTime": 1504802065
},
{
"latencyAvg": "615.00us",
"latencyMax": "9.95ms",
"latencyStdev": "260.15us",
"totalRequests": 1582804,
"startTime": 1504802067,
"endTime": 1504802082
},
{
"latencyAvg": "1.28ms",
"latencyMax": "16.54ms",
"latencyStdev": "849.91us",
"totalRequests": 1610007,
"startTime": 1504802084,
"endTime": 1504802099
},
{
"latencyAvg": "2.33ms",
"latencyMax": "35.07ms",
"latencyStdev": "1.06ms",
"totalRequests": 1668241,
"startTime": 1504802101,
"endTime": 1504802116
}
],
"query": {
"amber": [
{
"latencyAvg": "95.62ms",
"latencyMax": "575.93ms",
"latencyStdev": "144.30ms",
"totalRequests": 1000349,
"startTime": 1504802149,
"endTime": 1504802164
},
{
"latencyAvg": "110.83ms",
"latencyMax": "620.93ms",
"latencyStdev": "160.90ms",
"totalRequests": 200564,
"startTime": 1504802166,
"endTime": 1504802181
},
{
"latencyAvg": "121.41ms",
"latencyMax": "691.91ms",
"latencyStdev": "164.79ms",
"totalRequests": 99223,
"startTime": 1504802183,
"endTime": 1504802198
},
{
"latencyAvg": "131.78ms",
"latencyMax": "674.72ms",
"latencyStdev": "169.81ms",
"totalRequests": 64963,
"startTime": 1504802200,
"endTime": 1504802215
},
{
"latencyAvg": "146.27ms",
"latencyMax": "781.06ms",
"latencyStdev": "174.94ms",
"totalRequests": 48586,
"startTime": 1504802217,
"endTime": 1504802233
}
],
Sorry to put the scare in everyone, but this is exactly why we do preview runs!
Hi Nate,
Thanks for looking into this. Unfortunately, this still looks the same to me. This is showing ~100k per second, but I'm expecting 1.8m per second given my own benchmarking. We are getting ~400k per second on an 8-core system.
I'm not sure where to go from here. Any insight would be helpful.
Thanks, Dru
Nate,
To reiterate what @drujensen said above: we're getting 400k r/s after installing the TechEmpower benchmarks and setting them up per the instructions. It seems strange that our 8-core VPS gets faster benchmark numbers than your 80-core server, especially since in our tests we're only about 20% slower than crystal-raw, which your benchmarks show at 2.5 million.
Thanks for your feedback,
Isaac Sloan
Sorry guys, I misunderstood. Could you paste the benchmark.cfg
you are using in the framework root? Are you using a 3-machine setup? Our app server, client, and DB are on separate machines. And have you also tested crystal in your environment?
Hey guys @drujensen @elorest, can you read this comment again?
He is showing us good results for plaintext, at least much better than before and closer to crystal-raw:
"plaintext": {
"amber": [
{
"latencyAvg": "23.70ms",
"latencyMax": "114.14ms",
"latencyStdev": "12.06ms",
"totalRequests": 1448302, # => 1.44M
"startTime": 1504802515,
"endTime": 1504802530
},
...
{
"latencyAvg": "1.30s",
"latencyMax": "4.72s",
"latencyStdev": "694.03ms",
"totalRequests": 1596471, # => 1.59M
"startTime": 1504802566,
"endTime": 1504802581
}
],
Also json is good:
"json": {
"amber": [
{
"latencyAvg": "129.93us",
"latencyMax": "2.97ms",
"latencyStdev": "74.34us",
"totalRequests": 984697, # => 984K
"startTime": 1504802016,
"endTime": 1504802031
},
...
{
"latencyAvg": "2.33ms",
"latencyMax": "35.07ms",
"latencyStdev": "1.06ms",
"totalRequests": 1668241, # => 1.66M
"startTime": 1504802101,
"endTime": 1504802116
}
],
...
Thank you @nbrady-techempower for your support! :heart:
@faustinoaq that's total requests
1.44M / 15 ~= 100K RPS
@sdogruyol My bad :sweat_smile:
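To make the conversion explicit, here is a minimal sketch of that arithmetic using the numbers from the first plaintext sample above:

```crystal
# totalRequests covers the whole sample window, so RPS has to be
# derived from the timestamps rather than read off directly.
total_requests = 1448302            # first plaintext sample
duration = 1504802530 - 1504802515  # endTime - startTime = 15 seconds
puts total_requests / duration.to_f # roughly 96.5K requests per second, not 1.44M
```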
@elorest @drujensen can you confirm whether you're on a 1 or 3 machine setup and specifically, what is your ULIMIT max? We've set a very high ULIMIT on these machines and that specifically helped the performance of a lot of frameworks. After looking closely at wrk results you zipped up and the ones being generated from our environment, it seems like you may have a single machine setup with reduced latency and a smaller ULIMIT size which stunted crystal-raw.
Nate,
We only used a single machine with the default ulimit.
We will rerun on two boxes, since we are not concerned with the 4 db tests, and raise the ulimit to the max.
I doubt that crystal-raw was somehow limited by ulimits, but we will confirm.
The question is why running on one box would be faster than two, and why we are seeing 4x faster results on 8 cores vs 80 cores, but we will confirm.
We ran on a single VPS with other machines forwarded to 127.0.0.1 in the hosts file. Ulimit said unlimited.
@drujensen Running wrk from a client against a server on a different machine simulates a real request over the internet. Hitting localhost with wrk on the same machine is going to give you much better results.
I don't think there's such a thing as ulimit=unlimited.
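For reference, here is a quick way to check which limits are actually in effect; the open-file limit (`-n`) is the one that throttles wrk's concurrent connections (a sketch, the 65535 value is just an example):

```sh
# show every per-process resource limit in effect for this shell
ulimit -a
# the open-file descriptor limit is the one that matters for wrk
ulimit -n
# raise it for the current shell before launching the benchmark
# (exceeding the hard limit requires root):
# ulimit -n 65535
```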
Also, as another note, the client running wrk has 8 HT cores. The server hosting your application has 80 HT cores.
From the client running wrk to a server on a different machine simulates a real request over the internet.
@nbrady-techempower I think :point_up_2: is reasonable, but the thing here is that the plaintext and json results with kemal and amber seem to be too far off from the crystal-raw results.
So, if crystal-raw is getting 1.5M RPS, we are expecting at least 50% of that (say 700K RPS), not the 10% that preview 2 is showing us (100K RPS). I think we can find the problem :smile:
BTW, thank you a lot for maintaining this awesome project! :heart:
@faustinoaq I understand what you're saying. I can give you guys other environment information as you need it. There must be something that crystal-raw is taking advantage of in that environment? I'm not sure without digging deep into each framework, which unfortunately I can't do at the moment. But let me know if there's anything else I can get you.
@nbrady-techempower One theory we have is that we have too many GC threads running. Crystal by default launches 16 threads per process.
Crystal has a configuration setting to reduce the number of GC threads, but unfortunately, we don't have a way of checking to see if this is the issue since we can't duplicate the issue.
Is there a way we can make the changes and then request a run of amber and kemal to determine if these changes fix the slowness before another round?
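For what it's worth, one knob that could be tried in the launch script is the Boehm GC's marker-thread count. This assumes the GC bundled with Crystal was built with parallel marking; GC_MARKERS is the Boehm environment variable that caps marker threads (the worker launch line below is hypothetical):

```sh
# Cap Boehm GC marker threads at 1 per worker process, so 80
# single-threaded processes don't each spin up a pack of GC threads:
export GC_MARKERS=1
echo "GC_MARKERS=$GC_MARKERS"
# exec ./server --port 8080   # hypothetical worker launch command
```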
@drujensen Absolutely! We actually have some downtime at the moment because Server Central is moving our environment to a different location, so feel free to open a PR with those changes as there will be a few more preview runs before official results are released.
@nbrady-techempower Thanks for your quick response! Can you run this PR against the servers and provide results?
I notice that for Kemal and Amber, it looks like you are converting the result to a JSON string and returning that, which would have to be later written to the IO for the response. The standard Crystal version writes directly to the output IO instead of constructing an intermediate string. Could that explain the discrepancy?
@foliot You might be on to something. I could especially see the queries one being affected by that.
amber:
def queries
  response.content_type = JSON
  queries = params["queries"]
  queries = queries.to_i? || 1
  queries = queries.clamp(1..500)
  results = (1..queries).map do
    if world = World.find rand(1..ID_MAXIMUM)
      {id: world.id, randomNumber: world.randomNumber}
    end
  end
  results.to_json
end
vs crystal:
when "/queries"
  response.status_code = 200
  response.headers["Content-Type"] = "application/json"
  JSON.build(response) do |json|
    json.array do
      sanitized_query_count(request).times do
        random_world.to_json(json)
      end
    end
  end
Although I'm pretty sure that the hello world routes are also being slow and breaking pipes.
@drujensen what do you think?
@elorest See https://github.com/TechEmpower/FrameworkBenchmarks/pull/2891
@sdogruyol tried it on Kemal before
Broken pipe is a misleading error; it happens when wrk shuts down and closes the TCP connection while crystal is still writing a response. It's not going to affect performance.
@faustinoaq Ah yea, I thought I remembered @sdogruyol trying that. It looks like the problem was that he wrote the JSON to IO in the handler, while also returning a value that the framework would write to IO. There was some sort of problem when he wrote to the IO twice.
Couldn't you return a wrapper object with a to_str(io) method that would do nothing but call results.to_json(io)? Then there would be no intermediate string created. Unless, of course, the framework also creates an intermediate string from the return value instead of writing it straight to IO...
You could standardize this by making various response classes, such as JSONResponse and PlainTextResponse, so you would just have to return JSONResponse.new results. You could also have a content_type method on the response object so you wouldn't have to set that manually in the handler.
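A minimal Crystal sketch of that wrapper idea (JSONResponse and the framework hook are hypothetical names, not existing Amber/Kemal API):

```crystal
require "json"

# Wraps a serializable value; to_s(io) streams JSON straight into the
# response IO, so no intermediate String is ever built.
record JSONResponse(T), value : T do
  def to_s(io : IO) : Nil
    value.to_json(io)
  end
end

# A handler would end with:
#   JSONResponse.new({id: 1, randomNumber: 42})
# and the framework's `io << return_value` would invoke to_s(io),
# serializing directly to the socket.
```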
Now that I look at it again, it doesn't look like the JSON benchmarks are where Kemal is struggling. Maybe the IO stuff isn't much of a performance impact after all.
EDIT: Never mind, that is one of the ones it's struggling on. I was misreading things.
There's a ticket where we're working on that. We do have an issue, though: we can't write to the response if we care about handler fall-through. Once you write the response, it can't be modified. If this were an issue, we could probably find a way around it for the benchmarks.
Unless, of course, the framework also creates an intermediate string from the return value instead of writing it straight to IO...
@foliot You can avoid the return value by using Nil as the return type:
def foo : Nil
"foo"
end
pp foo # => nil
Update: I misread the comment 😅
There is also the NoReturn type, but it is used only for low-level stuff.
Another interesting reference, @RX14's comments about the GC via IRC:
<RX14> if you have 1 process with 80 threads you have 1 GC which pauses every thread
<RX14> does its work
<RX14> and then finishes
<RX14> if you have 80 processes you have 80 separate GCs
<c-c> ok, I'm not entirely lost
<RX14> which means much higher overhead
<RX14> especially when the off-the-shelf GC we use uses concurrent garbage collection
<RX14> so suddenly your singlethreaded process is trying to use 16 threads to GC 1 thread
<RX14> which is... suboptimal
<RX14> it muddies the scheduler
Hey guys, wanted to let you know we have a run going. However, amber did fail. I haven't been able to dig into the logs, so here's the output if you want to take a look: http://sprunge.us/PXUM
Also here's a quick snapshot of Kemalyst vs last round:
I'll drop some of the raw json data here a little later today
@nbrady-techempower Thanks for the results. This didn't seem to help.
@faustinoaq @RX14 I think at this point we need to consider reducing the number of processes we launch. Currently we are launching a process per core. WDYT about launching 1/2 to 3/4 of that, so 1 process for every 2 cores?
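A sketch of what that change could look like in the launch script (the worker binary and flags are hypothetical):

```sh
# derive the worker count from the core count instead of launching 1:1
CORES=$(nproc)
WORKERS=$(( (CORES + 1) / 2 ))   # one process per two cores, rounded up
echo "starting $WORKERS workers on $CORES cores"
# for i in $(seq "$WORKERS"); do ./bin/server --reuse-port & done
```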
@nbrady-techempower The results of crystal-raw compared to previous round would be great.
We know that crystal itself performs well, perhaps you should focus on reducing garbage allocation throughput inside your frameworks first.
Here's a link to the results.json for the run still in progress: http://sprunge.us/HNHc
@nbrady-techempower Can you please post a link to the place on the website where you can enter results JSON and visualize it? I seem to have lost my link.
@nbrady-techempower it'd be great if there was a link to that somewhere on the site :)
@RX14 Fair point
@RX14 I added it to the readme here, which is the best I can do at the moment. I'll make a note of that for the site.
Looks like crystal-raw got a pretty good boost there, but all the frameworks still perform terribly. That's very weird; perhaps some more in-person debugging is required. A basic check would be to run only the kemal test and check htop to see if all the CPU cores are utilised. Apart from that, I can't think what could be the problem that makes it perform 20x slower...
@RX14 It's interesting because if I run the crystal, kemal, and amber tests locally they all come out about the same, with raw crystal being about 5% faster. It seems that benchmarking from another computer over the network is what changes that somehow. Not sure why crystal-raw would handle it fine but amber and kemal would blow up, though.
Two points:
@elorest @drujensen @sdogruyol What is holding us back from creating the same environment so we can get a feedback loop on fixing things? Is it because it is custom hardware rather than an AWS instance?
@nbrady-techempower - Are there tests planned for AWS (or other providers) - that would be easier to reproduce on our own?
TL;DR If this wasn't already done, using prepared statements for the database queries should really help performance on those tests.
@nbrady-techempower - Are we allowed to use any indexing/tuning strategies that we want on the database?
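The prepared-statement point can be sketched with crystal-db, which prepares parameterized queries by default so the plan is parsed once per connection and reused on every request (the connection string and table follow the benchmark conventions; treat them as assumptions):

```crystal
require "db"
require "pg" # assumes the crystal-pg driver used by the benchmark

DB.open("postgres://benchmarkdbuser:benchmarkdbpass@tfb-database/hello_world") do |db|
  id = rand(1..10_000)
  # Parameterized query: prepared and cached by crystal-db, so only the
  # bound id changes on subsequent calls.
  random_number = db.query_one(
    "SELECT randomNumber FROM World WHERE id = $1", id, as: Int32)
  puts random_number
end
```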
@RX14 @elorest @drujensen @faustinoaq @sdogruyol crystal-raw is way faster until a database is involved. It is in the 80%+ range, then drops into the 40% range for reads, then 13.5% for writes. I think this needs to be addressed.
A few questions about this: crystal-pg or crystal-db? Does anyone have time to set up a MySQL test?
Single Query:
crystal | 77,224 | 100.0% (40.8%)
kemal (postgresql) | 75,806 | 98.2% (40.0%)

Multiple Queries:
kemal (postgresql) | 3,781 | 100.0% (41.2%)
crystal | 3,745 | 99.0% (40.8%)

Fortunes:
crystal | 88,041 | 100.0% (49.4%)
kemal (postgresql) | 82,572 | 93.8% (46.4%)

Data Updates:
crystal | 1,155 | 100.0% (13.5%)
kemal (postgresql) | 1,141 | 98.8% (13.3%)

With no db involved, crystal-raw is 2x-20x faster, as @RX14 mentioned:

Plaintext:
crystal | 2,134,425 | 100.0% (83.2%)
kemal (postgresql) | 111,917 | 5.2% (4.4%)

JSON Serialization:
crystal | 528,217 | 100.0% (85.5%)
kemal (postgresql) | 192,474 | 36.4% (31.1%)
Are we allowed to use any indexing/tuning strategies that we want on the database?
No
Has the database been tuned in any way? Prepared statements, indexing, etc.?
Only primary keys on the index column for each table. Here is the setup for the MySQL database and tables, for reference.
Are there tests planned for AWS (or other providers) - that would be easier to reproduce on our own?
Microsoft Azure, though I am not sure of the specs off hand
@msmith-techempower - It looks like we can use prepared statements though:
Use of prepared statements for SQL database tests (e.g., for MySQL) is encouraged but not required.
Source: https://www.techempower.com/benchmarks/#section=code
Is that the only thing allowed? Is it allowed for postgres? What about postgresql functions?
@marksiemers As far as the specifications go, we can't predict everything someone will come up with to get an edge, but we've done our best to capture everything we don't allow. If you have questions about anything in particular, or if something seems unclear please let us know.
As far as Azure, like @msmith-techempower we don't have specs for those environments off-hand, and to be honest, we're not sure when exactly those will be live.
In the meantime I'm happy to get you any information about our SC environment that we haven't already captured.
@marksiemers There are two issues here: optimizing crystal-raw, and reproducing the problems the frameworks have keeping up with crystal-raw. Please keep these issues separate. Optimizations for crystal-raw would be appreciated (I think the bottleneck is much more around the unoptimized DB driver and connection pool, along with having a connection pool for each of 80 processes), but it's off-topic and distracting for this thread.
It looks like we need to devise some tests for techempower to run on their hardware, since they're the only ones who can reproduce. I offered my suggestion - a quick test to see if all CPU cores were fully utilized.
Just ran the benchmarks locally for crystal-raw, kemal, and amber.
results.json here: http://sprunge.us/iQXe
All passed successfully. (based on commit d7ab76a27e7f830923f8729673ed6ec7fa09f662)
Plaintext:
crystal-raw: 419,614 (100%)
amber: 278,086 (66.2%)
kemal: 245,115 (58.4%)
JSON
crystal-raw: 268,346 (100%)
amber: 206,898 (77.1%)
kemal: 182,925 (68.1%)
db
crystal-raw: 18,625 (100%)
amber: 17,783 (95.4%)
kemal: 16,646 (89.3%)
This is in line with our expectations of at least 50% of crystal-raw's performance.
I looked through the code and the setup.sh between crystal-raw and amber. Nothing stands out to me.
It looks like with Server Central, these are 40-core machines for the app server. Not that it should have an effect on this crystal-vs-framework discrepancy, but if we are thinking about not touching all cores, 40 should be the starting point.
@marksiemers That is correct and is why our expectation is ~1m requests per second for Amber/Kemal on plaintext and json. We have seen ~1m requests per second using Amber on smaller boxes.
@RX14 is right in that both frameworks are allocating more memory than crystal-raw, and we could do better at memory management in both frameworks. It may be that the cause is the GC blocking to perform cleanup more often, but I would have expected the same poor performance on smaller boxes, which is not the case.
@drujensen I think the mix of between-box latency and GC overhead could be the clue to the Crystal frameworks' slowdown.
Maybe filling all cores isn't a good idea; maybe the OS also requires some free resources to perform other operations.
@faustinoaq I think it's worth a try. Let's divide the number of processes by 2 and see if that makes an improvement. Do you want to attempt making the change?
Let's binary search the right number of cores!
It is still weird that it doesn't reproduce, but definitely worth a shot.
Hi Nate,
We are baffled by the results for the 3 crystal projects and are hoping you can help shed some light on the 100k RPS results. We were expecting the 1.8 to 2.0 million RPS range for these projects.
We have done multiple runs on different sized systems in AWS trying to replicate the slow response times for plaintext and json in preview 2. @sdogruyol has also run the benchmarks and can confirm similar findings.
I'm attaching the results from a c4.xlarge 8-core system, and all three projects are showing 3x the results of preview 2. We have also run the tests on a 64-core 1.2GHz system with 10x the requests per second (~1 million RPS) of preview 2.
We followed the instructions provided for setting up the Linux environments (Ubuntu 14.04) and cannot determine what difference would cause the slowness.
We think that preview 2 is not correctly reflecting the performance of these 3 projects. It's as if the multiple processes or the port reuse were not working properly. What we find interesting is that crystal-raw did not encounter the same issue and had 2.5m RPS, which is in line with our expectations. We are aware of socket errors under heavy load, but we are not seeing them slow down performance the way preview 2 reflects.
Any help in solving this mystery is appreciated. If possible, can you rerun the plaintext test for the 3 projects and let us know if you are still seeing 100k RPS results?
Thanks, Dru
results.zip