TechEmpower / FrameworkBenchmarks

Source for the TechEmpower Framework Benchmarks project
https://www.techempower.com/benchmarks/

Strange ASP.NET performance results? #362

Closed: MalcolmEvershed closed this issue 9 years ago

MalcolmEvershed commented 11 years ago

I just saw the Round 6 results and the results seem way lower than I expected. @bhauer, was there anything weird about the run or environment? Was everything actually updated? Just checking. Here's what looked strange to me:

  1. The JSON results show 'aspnet' and 'aspnet-mvc'. What does 'aspnet' represent? It can't represent 'aspnet-stripped' because that pull request has not been accepted. So how can it be in the Round 6 results?
  2. Almost all the results are slower between Round 5 and Round 6. That seems pretty odd.

Anyway, what a disappointment: we did all that work profiling and making improvements that caused real performance gains in our test setup and now those gains don't show up in the real run. Weird.

Thanks.

bhauer commented 11 years ago

@MalcolmEvershed Agreed. Let me speak with @pfalls1 tomorrow and see what we can figure out and we'll follow up. Thanks for pointing out the "aspnet" problem. The characterization of "aspnet" is definitely incorrect in the R6 data and I'll need to fix it up tomorrow.

MalcolmEvershed commented 11 years ago

Thanks!

bhauer commented 11 years ago

@MalcolmEvershed We don't yet have any important findings or ideas to share, but I did want to just report:

MalcolmEvershed commented 11 years ago

I think aspnet-jsonnet is really more like ASP.NET MVC using the JSON.Net JSON serializer instead of the default ASP.NET MVC JSON serializer. And aspnet-svcstk is more like ASP.NET MVC using the JSON serializer that comes with ServiceStack instead of the default ASP.NET MVC JSON serializer.

This spreadsheet comparing Round 5 and Round 6 shows that almost all the Windows tests took a hit in Round 6. Note that aspnet-mvc didn't take as big a hit, so I guess the performance improvements are reflected in there.

FransBouma commented 11 years ago

It's also weird that ASP.NET has to go through a full framework with EF (in alpha state) and MVC and is compared against CPP code which simply writes the output of a query to a string which is sent back as result. Apples - oranges. The 'stripped down' version of the ASP.NET tests come with a readme which says "This is stripped down so much it's hard to believe anyone would use this in production", yet the top performers have code which is precisely that: stripped down to the bare bones.

Has anyone profiled the ASP.NET code on mono / linux to see why it's so dog slow? Is it the MySQL provider, mono itself, their ASP.NET implementation?

bhauer commented 11 years ago

@FransBouma Due to the breadth of test implementations, it has been fairly common for readers to conceive of and then criticize comparisons that can be considered unfair when evaluated in isolation. It's important to bear in mind that in this project no single framework is being compared to any other single framework in isolation, but rather a broad field of frameworks.

It's not one apple against one orange. Consider it a comparison of the entire produce section. There are a bunch of apples, a bunch of oranges. You can compare the apples alongside the oranges, or you can ignore the oranges and look only at the apples.

I think most of us would agree that it's at-best a secondary interest knowing how ASP.NET MVC stacks up against a C++ web development platform (I assume you're referring to cpoll-cppsp). Personally, I would never write web applications in C++, but I'm not one to rule it out for others. To me, it's much more interesting to know how ASP.NET MVC compares to other full-stack frameworks. If it's easier to interpret the data with platforms and micro-frameworks hidden, I recommend filtering them out from the view using the filters panel. For example here is the single query test on open-source full-stack frameworks:

http://www.techempower.com/benchmarks/#section=data-r6&hw=i7&test=db&c=1&f=zik0zi-zik0zj-35r

As to the particulars of ASP.NET MVC, @MalcolmEvershed here has spent a good deal of time working on improving matters. Unfortunately, in the most recent round, the results did not improve as much as we hoped. Part of the problem is that it has been difficult for contributors such as Malcolm to smoke-test their changes prior to us running a round of tests because the process has been frustrated by the immaturity of our benchmark scripts and we hope to improve those. In some frameworks' cases, much depends on specific configuration details (version of library X, setting of configuration value Y, etc.). In time, we'd like to avoid the current model of submit a pull request and keep your fingers crossed that the results are good.

MalcolmEvershed commented 11 years ago

I'm the one who wrote "This is stripped down so much it's hard to believe anyone would use this in production". Maybe I should have made it clearer by saying that "I wouldn't use this in production" since there are a lot of opinions of what is ok to run in production (for example, I probably would run most *nix app servers behind nginx which would add more overhead). I'm open to suggestions.

I think a meta-problem/area-for-improvement with the benchmarks is that people that read the results might not know why a particular framework is fast or slow. I think one may be more inclined to use a 'slow' framework if one understood why it was 'slow' or you might be more inclined to avoid a 'fast' framework if you understood why it was 'fast'. For example, the Windows ASP.NET results are highly dependent on the database drivers and some of those are surprisingly wasteful (suggesting that there are very few users and that you wouldn't actually want to use them in production).

I haven't tried anything with Mono.

As to the recent Windows ASP.NET results, I think that the overall IIS<->ASP.NET per-request pipeline has too much overhead to put it in the same ballpark as, say, Go. I look forward to seeing the next run results to quantify how much that overhead is (by comparing it to the ASP.NET Stripped results and the HttpListener results). But I think I've kind of given up on trying to make ASP.NET (a full framework, as you mention) as fast as the bare bones frameworks.

bhauer commented 11 years ago

@MalcolmEvershed Thanks for the thoughts.

As a wishful-thinking / long-term goal, independently I had a goal of enhancing the results site to include a details page on each framework to include elements such as:

Looping back to your latest comment here, on such a page it may be interesting to add notes from the test implementer, such as yourself. As I understand the tone of your comment, you would like to be able to communicate some key points that those who review a test's results should understand. This would be a place to discuss, for example, the point about the relative immaturity of the .NET MySQL drivers.

MalcolmEvershed commented 11 years ago

@bhauer, yeah, those notes would be useful. For sure, I'd love to see other test implementors explain the profiler results of their frameworks (like Python for instance).

FransBouma commented 11 years ago

@bhauer I'm sorry, but I have to disagree. It is apples vs. oranges. On the benchmarks site, switch on 'full' only, and you'll see that ASP.NET has 1.1% of the performance of Gemini, which happens to be the framework of the host of the benchmarks. Are you really going to say that ASP.NET is that slow, that it can barely manage 1.1% of the performance of a Java stack? I truly hope not.

I write O/R mappers/modelers for a living and have been dealing with benchmarks for over a decade now, so I have a little bit of knowledge of what to look for. However, most people do not; they simply look at the graphs and see: "framework X is much faster than framework Y". And not faster as in 'it can do 2-10% more queries per second', but faster as in 'Y has 1.1% of the performance of X, so that's a no-brainer'. I only have to point to the regular threads about this benchmark on reddit or HN.

Benchmark results which are listed together in a graph or table are compared to each other, and you know that, as you grouped various tests together, but not all in 1 graph. However, this only makes sense if the part you want to benchmark is the one which differs in the frameworks benchmarked. Take the 1-query test, the one which is the default chosen to be viewed when I visit http://www.techempower.com/benchmarks/.

The pipeline of the request handling consists of the following parts:

request to DB: client -> webserver -> webframework -> ORM -> DB Client -> Low-level API -> DB

result to client: DB -> Low-level API -> DB Client -> ORM -> webframework -> JSON serializer -> webserver -> client

When I look at the tests for the various languages, only the client and the DB are equal (and sometimes even that's not the case!). All other blocks/parts are different, or omitted. This means that you're really comparing apples vs. oranges.

To prove that, let's do a little thought experiment: let's say that for all frameworks all parts perform equally, except the DB Client + Low-level API: on .NET it performs horribly, and the C++ part performs excellently (and the rest is in between). This alone could cause the variation in performance as shown today, and make framework X look like a much better choice than framework Y, while under the hood they all performed the same, as by the rules of this experiment the only part which fluctuates in performance is the DB Client + Low-level API to the DB.
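The thought experiment above can be sketched numerically. This is a minimal illustration only: every stage cost, the stack names, and the worker count below are hypothetical values picked for the example, not measurements from the benchmark.

```python
# Hypothetical per-request stage costs in microseconds. None of these
# numbers come from the benchmark; they only illustrate the argument.
COMMON_STAGES_US = {
    "webserver": 20.0,
    "webframework": 30.0,
    "orm": 25.0,
    "json_serializer": 10.0,
    "db": 100.0,
}

# By the rules of the thought experiment, only the DB client + low-level
# API cost differs between the (hypothetical) stacks.
DB_CLIENT_US = {
    "stack_fast_driver": 15.0,
    "stack_average_driver": 150.0,
    "stack_slow_driver": 1500.0,
}


def requests_per_second(db_client_us: float, workers: int = 8) -> float:
    """Throughput of `workers` independent workers, each handling requests serially."""
    per_request_us = sum(COMMON_STAGES_US.values()) + db_client_us
    return workers * 1_000_000 / per_request_us


def throughput_table() -> dict[str, float]:
    """Requests per second for each hypothetical stack."""
    return {name: requests_per_second(cost) for name, cost in DB_CLIENT_US.items()}


if __name__ == "__main__":
    table = throughput_table()
    best = max(table.values())
    for name, rps in table.items():
        print(f"{name:22s} {rps:10.0f} req/s  ({100 * rps / best:5.1f}% of best)")
```

Even with an identical web framework, ORM, and serializer cost in every stack, the chart produced by such a model would show a wide spread that is caused by the driver alone.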

Unless it's proven how much time is spent in the element being benchmarked, it's useless to list the frameworks in 1 table. Yet because they're placed together, they're compared together, even though one can filter out some of them.

If I look at the ASP.NET code, I can't really find an area where code is likely to be slow. Perhaps a few cycles can be gained here and there, but that's minor. It's also not the area where the slow performance is located. We don't have to look at ASP.NET code to see what I mean, just look at the cpoll-cppsp code on MySQL (144,711) vs. cpoll-cppsp on PostgreSQL (79,562). That's a 30% drop in performance by simply switching to another DB client, low-level API and DB. 30%! In other words: the time spent in the web framework is minor compared to the time spent in the DB Client, low-level API and DB, because otherwise a switch of DB client wouldn't make such a difference. (A 10% speed gain in a part which takes 30% of the overall execution time doesn't make any difference to the overall execution time.)
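The parenthetical about a 10% gain in a 30% slice is an instance of Amdahl's law, which can be written out as a short sketch. The helper names are my own, and the second function rests on an assumption the thread does not verify: that everything outside the swapped DB client, low-level API and DB costs the same in both cpoll-cppsp setups.

```python
def overall_speedup(fraction: float, part_speedup: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of the time becomes `part_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if part_speedup <= 0.0:
        raise ValueError("part_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / part_speedup)


def min_fraction_in_changed_part(rps_fast: float, rps_slow: float) -> float:
    """Lower bound on the share of per-request time spent in the swapped part (slow setup).

    Assumes the rest of the pipeline costs the same in both setups, so the
    whole throughput difference is attributed to the swapped part.
    """
    if rps_fast <= 0 or rps_slow <= 0 or rps_slow > rps_fast:
        raise ValueError("expected 0 < rps_slow <= rps_fast")
    return 1.0 - rps_slow / rps_fast


if __name__ == "__main__":
    # A 10% speed gain in a part that takes 30% of the execution time.
    print(f"overall speedup: {overall_speedup(0.30, 1.10):.3f}x")
    # The two cpoll-cppsp figures quoted above (MySQL vs. PostgreSQL).
    share = min_fraction_in_changed_part(144_711, 79_562)
    print(f"swapped part is at least {share:.1%} of request time in the slower setup")
```

Under these assumptions the 10% gain is worth roughly 2.8% overall, and the two quoted cpoll-cppsp figures imply that the swapped part accounts for at least about 45% of per-request time in the slower setup.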

I'm sure ASP.NET isn't the top performer, but I'm also pretty sure it's impossible to shine in this benchmark unless the DB client and low-level API code used by the framework are top-notch. One glance at the MySQL C# provider code and one knows this won't be a winner. Entity Framework isn't a top performer either (see my own benchmarks of various frameworks which only differ in ORM/data access code: http://pastebin.com/AdsKitr3) but it's also not so dog slow that some JavaScript code is more than 10 times faster.

IMHO, if you want to benchmark the web frameworks, you should use in-memory objects and return those as JSON. That way, you test the speed of handling calls from clients and how fast objects can be serialized to JSON, and at least you keep more parts of the complete chain equal to one another.

I've asked the ASP.NET Team to see whether they can help (if they care, that is ;)).

pdonald commented 11 years ago

> IMHO, if you want to benchmark the web frameworks, you should use in-memory objects and return those as JSON. That way, you test the speed of handling calls from clients and how fast objects can be serialized to JSON, and at least you keep more parts of the complete chain equal to one another.

We already do. There is a JSON serialization test and a Plaintext test. ASP.NET is still at least twice as slow as webgo.

FransBouma commented 11 years ago

> We already do. There is a JSON serialization test and a Plaintext test. ASP.NET is still at least twice as slow as webgo.

There too, ASP.NET has 1.0% of the performance of Gemini (if I filter on 'full' frameworks only). Gemini 215,536 responses per second, ASP.NET a measly 2,187.

On the plaintext tests, it's even worse: Gemini scores 454,617 responses, and ASP.NET scores 1,828. I'm sorry, but are you seriously saying this massive difference is because ASP.NET is slow? I can only conclude based on these numbers, something is seriously wrong with the ASP.NET setup. I do that because e.g. the full stackexchange sites run on a full ASP.NET stack, and if the ASP.NET numbers would reflect real-life performance, they'd never manage to run even a fraction of their sites on the small server farm they're using.

ASP.NET might be slower than Gemini, but not that slow that it can barely manage to get 0.4% of Gemini's performance in the plain-text tests.
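For reference, the percentages in this comment follow directly from the response counts it quotes. A small sketch of the arithmetic (the dictionary layout and function name are my own; the four figures are the ones quoted above):

```python
# Responses per second as quoted in the comment above.
QUOTED_RPS = {
    "json": {"gemini": 215_536, "aspnet": 2_187},
    "plaintext": {"gemini": 454_617, "aspnet": 1_828},
}


def percent_of(candidate_rps: float, reference_rps: float) -> float:
    """Throughput of the candidate as a percentage of the reference."""
    if reference_rps <= 0:
        raise ValueError("reference_rps must be positive")
    return 100.0 * candidate_rps / reference_rps


if __name__ == "__main__":
    for test_name, rps in QUOTED_RPS.items():
        share = percent_of(rps["aspnet"], rps["gemini"])
        print(f"{test_name:10s} ASP.NET at {share:.1f}% of Gemini")
```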

pdonald commented 11 years ago

What you are looking at is mono on Linux. To see Microsoft's ASP.NET results, click on Win under Hardware.

FransBouma commented 11 years ago

@pdonald I'm not a retard. I was referring to the main charts on the benchmark site, as that's what's linked to, that's what's discussed out there, and you know that. But if I compare EC2 values for JSON serialization: Gemini on Linux, 26,440, and ASP.NET on Windows, ~3K, this means ASP.NET still has only about 11% of the performance of Gemini.

Of course, they're not the same OS, but as many elements in the complete chain aren't equal, why would 1 extra element be a dealbreaker....

This thread IMHO shows the benchmark owners don't really show much concern about the ASP.NET numbers. You're not obligated to, of course, don't get me wrong, but IMHO if this benchmark is meant to have any real value, you should care and make sure it's fair so the obvious conclusions people will draw from it will indeed have value and they can base decisions on them (because people will do that).

pdonald commented 11 years ago

What makes you say that we don't care about ASP.NET numbers?

This is a quote from the OP:

> Anyway, what a disappointment: we did all that work profiling and making improvements that caused real performance gains in our test setup and now those gains don't show up in the real run.

> make sure it's fair

Can you please take a look at these files and tell us what's not fair about them?

Any suggestions on how to optimize the ASP.NET code and also IIS configuration are welcome.

FransBouma commented 11 years ago

(I'm not an ASP.NET expert)

> What makes you say that we don't care about ASP.NET numbers?

If the ASP.NET perf is < 1-2% of Gemini (all Linux/mono ASP.NET numbers), I don't really think one can take those seriously, yet they're in the primary list of numbers on the benchmark's website. I'm not saying you did anything on purpose to make it look bad, not at all, I just find it odd you left them there, while they clearly are beyond nonsense. I have no idea whether the other frameworks at the bottom of the pile are equally nonsense or not, as I have no experience with those frameworks. But as I said above in my long post: the benchmark itself has no real value: moving from mysql to pg gives 30% performance decrease alone, which already suggests something is off: the end results show the performance of another part than the one the average reader will think of, otherwise moving from one db client to another wouldn't make a 30% overall (!) perf drop.

Looking at the code again, I can't really see bottlenecks, as I already said above: perhaps here and there some cycles can be won but IMHO in the end this is marginal. The only thing which is perhaps a bit of a problem is the threadpool manipulation. The thing is that the more threads are added to the pool, the more context switching is performed and the less performance is gained overall, especially if the same processors have to deal with DB logic as well (unclear whether the DB is on another box or not).

The plain-text benchmark is interesting. It should come down to simply requesting plaintext.aspx (single page) and getting a string back, which is a static operation, considering ASP.NET generates code to handle these requests statically. Therefore I doubt many threads are required to keep the CPUs busy as the request to return a simple string should be handled very quickly.

But I'm not an ASP.NET perf expert, e.g. having requests queue up and use 1 thread per logical proc might run into a HTTP/500 error, I don't know.

I haven't heard back from the ASP.NET team, so I don't know whether they've time to help, but at least they're notified so they should be able to shed more light on this problem.

bhauer commented 11 years ago

@FransBouma Thanks for your thoughts and for looping in some more ASP.NET subject matter experts. We appreciate any input and tuning that can be provided. Despite what you may think right now (and I hope to slowly convince you otherwise), we want every framework and platform in the project to run optimally. The overall quality of the data is not as high as it could be when there are test implementations that do not represent the framework's true potential.

An important caveat is that, to a degree, we resist the force that exists in benchmark projects to tune implementations to the particular needs of the benchmark tests. We've attempted to inculcate a notion of a "realistic implementation" versus a "stripped implementation." And this may in part explain @MalcolmEvershed's opinion that his recent contribution was a stripped implementation.

Obviously no benchmark will ever be perfect, especially one comparing such a diverse spread of full-stack frameworks, micro-frameworks, and platforms. I feel we have been reasonably diligent in providing the necessary classification and filtering tools to allow readers to digest the data in a form compatible with their evaluation process. Furthermore, we have attempted to be transparent about the project, but also welcome your continued criticism.

I want to repeat something I said earlier as an important caveat before the meat of this reply. @MalcolmEvershed and @pdonald have contributed a great deal of effort to the .NET tests that exist today. Without them, those tests would not exist in this project. We (by which I mean TechEmpower) have a small number of .NET developers, but none have been available to work on this project, so we had to ask the developer community to assist in getting .NET coverage included. These contributors invested the time to learn our toolset, then built the test implementations, built the necessary scripts, and submitted the pull requests. Subsequently, they have engaged to review the implementations further.

I will also repeat that the latest published results for .NET (and others) have made it clear that we need to improve the project's Python scripts to make it easier for contributors to smoke-test their contributions. Weaknesses in the present toolset lead to what I described above as an unacceptable situation where in many cases contributors submit pull requests and cross their fingers that the next round's data comes out better than before. I want to improve our toolset so that contributors can be made confident before submitting the pull request.

> It is apples vs. oranges.

Yes, it is. It is many apples, many oranges, and also many pineapples, some pears, and a few bananas. I don't mean to make light of your point, but the produce-section analogy seems to work as well as any other. The coverage of this project is quite wide. As I've said before, that is in great part thanks to the contributions of many frameworks' developers or fans. We (again, by which I mean TechEmpower) contributed an original ~20 implementations and the ~50+ since then have been community-contributed.

The point I made earlier is that the "apples versus oranges" description is valid, not just as a cliche but also as a metaphorical device. If we can step back and say, "yes, these are both fruit" ("yes, these two things both respond to HTTP requests") then there is some merit in comparing them.

> ...you'll see that ASP.NET has 1.1% of the performance of Gemini...

We have included Gemini in the project based purely on our prerogative as the creators of the project. It's not open source (although some of us want to change that in large part out of a spirit of fairness to this project; see elsewhere). Yes, we are proud of how it performs, but this project is not about Gemini.

I encourage readers to hide it if it helps digest the data, as seen in the below URL:

http://www.techempower.com/benchmarks/#section=data-r6&hw=i7&test=db&c=1&f=zik0zi-zik0zj-35r

For the sake of this discussion, I'm going to assume Gemini is hidden from view (or wasn't even included in the project) because I worry that some of your points speak about Gemini but are actually about other more general reservations you have.

For example, you literally wrote:

> Are you really going to say that ASP.NET is that slow, that it can barely manage 1.1% of the performance of a Java stack?

With Gemini hidden, the top full-stack performer on the single-query test is Revel. Rewriting your statement:

"Are you really going to say that ASP.NET is that slow, that it can barely manage 1.7% of the performance of a Go stack?"

I don't do this to put words in your mouth but to demonstrate what I believe to be the case--this isn't about Gemini or even Revel--but rather about the performance results we have for ASP.NET.

Before we received .NET implementations and were able to measure their performance, my instinct was that the performance would be slightly less than Java on Windows. I would have guessed perhaps 75% the performance of a full-stack Servlet framework, let's take Wicket for example.

After capturing data, I was surprised at the considerably lower numbers. I personally am not sufficiently well-versed in tuning IIS, Mono, or .NET as a whole to identify problem areas or come up with an explanation. My conjecture to-date has been simply that:

And to reiterate, @MalcolmEvershed and @pdonald have investigated the performance of the test implementations and have shared many of their findings and thoughts (they are quite detailed!) elsewhere on this issue tracker.

> Take the 1-query test, the one which is the default...

As an aside, I would entertain arguments for changing the default view. When we launched the stand-alone results web-site, I selected the single-query test on Linux i7 as the default because (a) we had very wide framework coverage in this test type and (b) this seemed to be the most popular test in reader conversations.

I would still like the default to be a test with wide coverage. Perhaps the JSON serialization test would be preferred?

> When I look at the tests for the various languages, only the client and the DB are equal (and sometimes even that's not the case!). All other blocks/parts are different, or omitted. This means that you're really comparing apples vs. oranges.

Very much so. However, this test is about the performance of a web application from the perspective of the client. From the client's perspective, the client has issued HTTP GET requests and received responses from a huge variety of servers. The response always looks roughly the same (with a few minor variations such as the Server header). The big variable is how quickly the server produces the response.

We have intentionally left the implementation details fairly open-ended, though certainly not completely. You can read the specifications here:

http://www.techempower.com/benchmarks/#section=code

For example, many test implementations do not use an ORM at all, and are therefore identified as using "raw" database connectivity. To provide one more level of clarity, I introduced a notion of a "micro ORM" to differentiate lighter-weight ORMs such as the one in Gemini or those popular with Clojure developers versus a full ORM such as Hibernate, ActiveRecord, or Entity Framework. It's admittedly a continuum and possibly a fool's errand to attempt to classify ORMs, but I think most readers trying to be objective would roughly agree with how we have classified the ORMs. (And again, we're open to correction!)

> If I look at the ASP.NET code, I can't really find an area where code is likely to be slow. Perhaps a few cycles can be gained here and there, but that's minor.

That is, in a manner of speaking, the point. This project is an attempt to capture the realistic performance of a wide range of web application frameworks and platforms when executing the fundamental building blocks of web applications: request routing, header processing, database connection pooling, object-relational mapping, object instantiation, dynamic collections, string concatenation, sorting, XSS countermeasures, server-side templates, JSON serialization, etc.

If the ASP.NET code looks mostly correct, that's how we want it to look. We want it to be code that a rational ASP.NET developer would say in its review, "I could see myself writing that if I were asked to query a database table, add an item, sort the results, and render a simple HTML response."

Of course no one is ever going to build a web application that does precisely what we are testing. But what we are testing are stand-ins for commonplace operations in real-world applications.

> It's also not the area where the slow performance is located. We don't have to look at ASP.NET code to see what I mean, just look at the cpoll-cppsp code on MySQL (144,711) vs. cpoll-cppsp on PostgreSQL (79,562). That's a 30% drop in performance by simply switching to another DB client, low-level API and DB. 30%! In other words: the time spent in the web framework is minor compared to the time spent in the DB Client, low-level API and DB, because otherwise a switch of DB client wouldn't make such a difference. (A 10% speed gain in a part which takes 30% of the overall execution time doesn't make any difference to the overall execution time.)

Yes, for the database tests, a significant chunk of time will be spent in the database drivers, the connection pool manager, the object-relational mapper--the pieces of the stack that are responsible for working with an external database server.

There are two test types that do not involve a database server at all.

JSON serialization test on full-stack open source frameworks (Linux i7):

http://www.techempower.com/benchmarks/#section=data-r6&hw=i7&test=json&c=1&f=zik0zi-zik0zj-35r

Plaintext test on full-stack open source frameworks and platforms (Linux i7):

http://www.techempower.com/benchmarks/#section=data-r6&hw=i7&test=plaintext&f=zik0zi-zik0zj-35r

(We don't have as many implementations of the relatively new plaintext test, so I am including platforms and micro-frameworks in the above link.)

Let's focus a moment on the JSON serialization test. Note that ASP.NET is at 1.5% of Revel. This is a test that does not even use a database connection. Being interested in seeing every framework perform as well as possible, I am excited by thoughts about tuning the MySQL .NET driver or in the upcoming tests that use SQL Server. However, we can't overlook the disappointing ASP.NET results in the JSON test that won't be affected by any changes to the database connectivity. From my point of view, the JSON serialization and plaintext tests establish high-water marks. You can never expect the database connectivity test to exceed the plaintext or JSON serialization test.

So the JSON serialization test for ASP.NET on Linux i7 is 2,187. And the single query test for Revel on Linux i7 is 66,690. Something more than database connectivity is at play here.
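The high-water-mark argument can be illustrated with a deliberately crude model. It assumes (my assumption, not something stated in the thread) that aggregate throughput behaves like the inverse of an average per-request cost, and that a database test pays the full JSON-test cost plus some non-negative database time. The function name and the 1 ms database cost are hypothetical.

```python
def db_test_rps(json_rps: float, db_seconds_per_request: float) -> float:
    """Requests per second when each request pays the JSON-test cost plus database time."""
    if json_rps <= 0:
        raise ValueError("json_rps must be positive")
    if db_seconds_per_request < 0:
        raise ValueError("db_seconds_per_request must not be negative")
    return 1.0 / (1.0 / json_rps + db_seconds_per_request)


if __name__ == "__main__":
    aspnet_json_rps = 2_187      # ASP.NET JSON test on Linux i7, quoted above
    revel_single_query = 66_690  # Revel single-query test on Linux i7, quoted above

    # Even a free database (0 seconds) cannot lift the result above the JSON rate.
    print(f"free database:  {db_test_rps(aspnet_json_rps, 0.0):8.0f} req/s")
    # Hypothetical 1 ms of database work per request.
    print(f"1 ms database:  {db_test_rps(aspnet_json_rps, 0.001):8.0f} req/s")
    print(f"Revel, single query: {revel_single_query:8d} req/s")
```

In this model no database driver, however fast, closes the gap to the Revel figure, because the ceiling is set before the database is ever touched.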

> I'm sure ASP.NET isn't the top performer, but I'm also pretty sure it's impossible to shine in this benchmark unless the DB client and low-level API code used by the framework are top-notch. One glance at the MySQL C# provider code and one knows this won't be a winner.

I agree, and I think @MalcolmEvershed would too because he's looked at it and rendered a similar opinion. Also see Malcolm's work with SQL Server (issue #336). But bear in mind my point just above. Will ASP.NET with SQL Server execute the single-query test faster than it executes the JSON serialization test? Not in this universe. Something else has to give to allow the response rate to exceed that ~2,200 high-water mark.

> IMHO, if you want to benchmark the web frameworks, you should use in-memory objects and return those as JSON

In a small way, the JSON serialization test is that. However, the next test type will exercise the use of cached in-memory objects similar to the current multiple-query test. See issue #374.

> I can only conclude based on these numbers, something is seriously wrong with the ASP.NET setup.

See the two conjecture points I made at the start of this reply. I would be delighted if there is a configuration tweak that we can apply to improve the performance. But if that never arrives and we are to become comfortable with the results, my conjecture is that the request-processing overhead of IIS and Mono is in fact that high. I am optimistic that the appearance of the .NET CLR will be improved when we can add a computationally-heavy test type.

> I was referring to the main charts on the benchmark site, as that's what's linked to, that's what's discussed out there

This is a no-win. Originally I rendered every single chart across all platforms on a single page. It was unruly and no one was happy. I now select Linux i7 as the default view because it sees, by far, the widest coverage of frameworks and is the most interesting to a majority of readers.

> This thread IMHO shows the benchmark owners don't really show much concern about the ASP.NET numbers

I agree that we (again, TechEmpower, not the other contributors) have not invested much time in tuning the ASP.NET numbers. We have asked the community for its help there. I will also go further and say that I know that contributors have been frustrated by the immaturity of the Python scripts/toolset and we do want to improve those. But every contributor is facing that same frustration (to a lesser degree, of course, since the scripts were designed to run on Linux). (As an aside, not that it matters, I want to point out that I am a fairly avid Windows fan with a Surface Pro and Lumia 920; I mention this only to dismiss any nascent feelings that I am a Linux fanboy.)

That said, we do care about the ASP.NET numbers being correct.

My point of view is precisely the opposite of yours. This thread--an issue at GitHub--goes to show that despite our own inability to spend the time necessary to become .NET tuning experts, we want subject matter experts to contribute.

I want your input and the input of experts. If you can show us something that improves the performance of ASP.NET, I will be delighted by the pull request.

> I just find it odd you left them there, while they clearly are beyond nonsense

I don't find them nonsensical. As I said earlier in this reply, the numbers are not what I expected. But they are the actual data reported by the load simulation tool that tested every other framework in this project. I similarly was a little surprised by the low performance of several PHP frameworks and Rails. Our "Round 1" blog entry in fact makes a point of our surprise at the wide spread of results. Nevertheless:

> moving from mysql to pg gives 30% performance decrease alone

I'm not sure why you consider this an indicator of invalidity. To my eye, this just suggests that the MySQL stack (database driver, connection pool, and the database server itself) is approximately 30% quicker than Postgres at simple small-payload queries. This is expressly not a database benchmark, but we have included a small number of popular databases at reader request. The variability is to be expected.

Flipped around, do you assume Postgres would be equal performance or faster than MySQL? If so, why? Conventional wisdom, after all, is that Postgres is slightly slower but one selects Postgres for a project because it's extremely battle-hardened with a great deal of design emphasis on reliability.

I haven't heard back from the ASP.NET team, so I don't know whether they've time to help, but at least they're notified so they should be able to shed more light on this problem.

And again I thank you for engaging both for yourself and in seeking the assistance of other subject matter experts. We had a variety of reasons for initiating this project, but one of them, and a very important one, was that we wanted in a small way to help all web sites' performance improve. If this project lights a small fire under a few frameworks to put a bit more effort into performance tuning, we're happy!

MalcolmEvershed commented 11 years ago

First of all, I didn't work on the Mono tests, so I don’t know anything about that.

I empathize with the remarks of @FransBouma. I do agree that many readers will be quick to dismiss some frameworks because of a poor showing in the chart, even though the reasons may be due to an apples-to-oranges comparison or poor test quality. "Winning" the chart is basically an exercise in how much overhead you can remove from the pipeline, the quality of all the components in the pipeline, and the expertise of the test implementor. To some degree it is quite random, and it does make it impossible to do apples-to-apples comparisons.

Regarding the mysql->pg 30% perf drop: It is clear what this likely means: the cpoll-cppsp test has such low overhead that most of the time per request is spent in the database driver, so differences in the database driver can cause massive performance differences. The reason this sounds crazy is because we all envision a world where the database driver is a small factor, but that does not match reality if you profile a real database driver and see what it is doing. Profiling can be truly shocking. I dare you (in a friendly way) to profile any platform and to not be shocked. :-)
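To put rough numbers on that argument, here is a minimal sketch. It assumes a simplified model in which throughput is inversely proportional to the time spent per request, and that only the database-driver portion changes when switching databases. The function name and the driver shares are hypothetical illustrations, not measurements from the benchmark.

```python
def required_driver_slowdown(driver_share: float, throughput_drop: float) -> float:
    """Return how many times slower the driver must get to cause the drop.

    driver_share: fraction of per-request time spent in the database driver
                  before the switch (0 < driver_share <= 1). Hypothetical input.
    throughput_drop: fractional loss in requests/second (0.30 means 30%).
    """
    # Throughput is assumed inversely proportional to time per request, so a
    # 30% drop means each request now takes 1 / (1 - 0.30) times as long.
    total_time_ratio = 1.0 / (1.0 - throughput_drop)
    # Only the driver part changes: (1 - share) + share * k = total_time_ratio.
    return (total_time_ratio - (1.0 - driver_share)) / driver_share


# Illustrative driver shares only; none of these are profiled values.
for share in (0.9, 0.5, 0.1):
    k = required_driver_slowdown(share, 0.30)
    print(f"driver share {share:.0%}: driver must be {k:.2f}x slower")
```

Under this model, if the driver accounts for 90% of the request time, a driver that is roughly 1.5x slower is enough to explain a 30% drop. If it accounts for only 10%, the driver would have to be more than 5x slower. That is why the drop looks implausible only when one assumes the driver is a small factor.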

Regarding the threadpool manipulation: That was implemented for Round 6, but one can see that the Mono ASP.NET numbers aren't that different from Round 5 to Round 6, so I don’t think it is a big factor. I’m open to suggestions though (disable the threadpool manipulation on Mono?). Note that Windows ASP.NET seems to mitigate the context switch problem pretty well by only allowing a smaller number of threads to run simultaneously.

I do hope that Round 7 can come soon so that we can compare the SQL Server, aspnet-stripped, and HttpListener results to the existing ASP.NET full-stack results.

garvincasimir commented 11 years ago

If this project lights a small fire under a few frameworks to put a bit more effort into performance tuning, we're happy!

@bhauer I just want to say that the above as an ultimate goal makes this project very useful to people like me. It is hard to avoid chiming in with a passionate rebuttal when your framework of choice is way at the bottom. Emotions definitely come into play when you visit a site and see charts which seem to say your framework sucks.

As an asp.net guy myself I have been working on performance improvements for a client application and come to the same conclusions expressed by you and others. That is, aside from the code, there are many factors which contribute to less than expected results.

I am in the process of creating a plan to test for performance bottlenecks in my client application so the asp.net discussion is very interesting to me. To add to that discussion I would like to suggest asp.net startup time as a possible explanation for the disappointing results.

I would suggest, as an improvement to the benchmark tool, that you factor out the results which come back during startup time for all frameworks. As a statistic, averages can be very useful, but they don't always lead you to the right conclusion in every situation. This is probably true when hitting an asp.net server on startup at the microsecond level. If my assertion about startup time is correct, then when you plot the results of the asp.net requests over time you should get a graph which shows a sharp curve trending upwards. I believe the benchmark recording should begin at the point where this graph flatlines.

garvincasimir commented 11 years ago

So after doing some more reading I did find this in your questions list:

"Do you run any warmups before collecting results data?" Yes. Every test is preceded by a 30-second warmup and brief (several seconds) cooldown prior to gathering test data. This allows runtime-optimizing platforms such as those that use a JIT an opportunity to comprehend the benchmark implementation

Can you elaborate on what happens during the warmup? Do you send any requests?

As an alternative to the hard 30-second warmup time, how about altering the benchmark runner to start testing when the slope of my previously mentioned graph nears zero?
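The "slope nears zero" idea could be sketched roughly as follows. This is only an illustration of the suggestion, not something the benchmark runner does. The function name, the window size, the 2% tolerance, and the sample values are all assumptions.

```python
def warmup_end_index(throughput, window=3, tolerance=0.02):
    """Return the first sample index where throughput has levelled off.

    throughput: requests/second measured over consecutive equal intervals.
    window: number of consecutive intervals whose relative change must be small.
    tolerance: maximum relative change between neighbours (0.02 means 2%).
    Returns None if the series never settles.
    """
    for start in range(len(throughput) - window):
        segment = throughput[start:start + window + 1]
        # The slope is treated as "near zero" when every step in the
        # segment changes by no more than the tolerance.
        settled = all(
            abs(b - a) <= tolerance * max(a, 1e-9)
            for a, b in zip(segment, segment[1:])
        )
        if settled:
            return start
    return None


# Hypothetical samples: a steep ramp during startup, then a plateau.
samples = [1200, 5400, 9800, 11900, 12050, 12010, 12080, 12040]
print(warmup_end_index(samples))
```

A runner using this approach would discard everything before the returned index and would still need a hard upper bound on warmup time for frameworks whose throughput never settles.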

bhauer commented 11 years ago

@garvincasimir Thanks for the feedback. We really enjoy hearing that people have found value in the project, so your words are really appreciated!

As for the warm-ups, they are running the same test that is about to be captured at 256 client-side concurrency. So for example, the single-query test is captured at 8, 16, 32, 64, 128, and 256 concurrency. Before any of those are captured, we run the 256-concurrency test but throw away the results in order to warm up the framework.

I do need to revise that answer you cited, however, because we trimmed each run time to 15 seconds as the number of test permutations continues to grow.

You can see the behavior in the raw captured results from Wrk. For example, here is the Yesod framework's raw results:

https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/results/i7/20130619104939/db/yesod/raw

You'll see a 5-second primer ("is this framework working?") and a 15-second warm-up before concurrency level 8 is run.
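The sequence described above can be sketched as a simple run plan. This is an illustration under stated assumptions, not the code in framework_test.py. The function name and the dictionary layout are made up, and the concurrency used for the primer is an assumption because it is not stated here.

```python
def build_run_plan(levels=(8, 16, 32, 64, 128, 256),
                   primer_seconds=5, warmup_seconds=15, run_seconds=15):
    """Return the ordered phases for one test type as a list of dicts."""
    plan = [
        # Primer: a short "is this framework working?" check. The concurrency
        # used here is an assumption; the thread does not state it.
        {"phase": "primer", "concurrency": levels[0],
         "seconds": primer_seconds, "keep": False},
        # Warm-up: the same test at the highest concurrency, results discarded.
        {"phase": "warmup", "concurrency": max(levels),
         "seconds": warmup_seconds, "keep": False},
    ]
    # Captured runs: one per concurrency level, results kept.
    for level in levels:
        plan.append({"phase": "capture", "concurrency": level,
                     "seconds": run_seconds, "keep": True})
    return plan


for step in build_run_plan():
    print(step)
```

The point of the layout is that only the phases marked `keep` contribute to the published numbers, so startup and JIT costs incurred during the primer and warm-up are excluded.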

MalcolmEvershed commented 11 years ago

I profiled the ASP.NET test runs and I don't think there's any data to suggest that startup time or JIT is a big factor, especially considering the warm-up. The big issue is per-request overhead, and ASP.NET has a lot of it. I tried to remove a lot of it with the aspnet-stripped tests, but even with that, it's still a lot compared to the barebones platforms. Hopefully the TechEmpower guys can add the aspnet-stripped and HttpListener results soon to appease the community. :-)

Of course, I could be wrong, so I welcome corrections.

garvincasimir commented 11 years ago

@MalcolmEvershed What kind of tests did you run to come to the conclusion that startup time is not a factor? I am not doubting you, just curious.

MalcolmEvershed commented 11 years ago

@garvincasimir Sure, you ask a perfectly reasonable question. The reason I think that startup time is not a factor is because:

  1. One can see the code for the warmup run in framework_test.py here. In here you can also see how the warmup is done, then the real run is done.
  2. One can see the output log for a full test run here and one can see that the warmup is done and then the real runs are done and it doesn't seem like the AppPool was restarted in between.
  3. If the AppPool was restarted, that logic would have to be in the aspnet directory, but I don't see anything like that. There's just setup_iis.ps1 here, which is run at the beginning and end.

Also, when I was working on improving the tests, I would start/stop the AppPool manually, do initial requests, then do performance runs and the results approximately matched using the full test infrastructure. If the startup time was a big factor, the results wouldn't match so closely.

But I could be wrong, so I'd appreciate corrections. Thanks.

FransBouma commented 11 years ago

There's no directive in the web.config file that it should be a release build. Although code is compiled at the command line, there's also code generated on the fly by ASP.NET, which is compiled using the directive in the web.config file. There's no directive that it should be a release build, so IMHO it will use a debug build for that code, which will make it very slow.

Scott Hanselman on twitter also said one shouldn't mess with the # of threads in the threadpool. See conversation: https://twitter.com/shanselman/status/365937061527687168 https://twitter.com/shanselman/status/365937094532661249 https://twitter.com/shanselman/status/365935957473624064

about web.config compilation directive: https://twitter.com/CamBirch/status/365879641036177408 https://twitter.com/craigstuntz/status/365883039789752323

AppPool recycles can happen also due to e.g. reserved memory is full, or other thresholds which are set by default. You only know that they're not recycled if you specify with the appPool that they should be taken off line (so you'll get a 500 server error) if they recycle.

Perhaps this is the cause of the tremendous latency for ASP.NET that's recorded, which is massive.

pdonald commented 11 years ago

Scott Hanselman on twitter also said one shouldn't mess with the # of threads in the threadpool.

Before messing with the # of threads, it was just as slow.

Personally, I'd like to remove that piece of code. But since it increases performance by 13%, my argument that it doesn't look nice is not a good one.

There's no directive that it should be a release build, so IMHO it will use a debug build for that code, which will make it very slow.

debug Optional Boolean attribute. The default is False. http://msdn.microsoft.com/en-us/library/s10awwz0(v=vs.100).aspx

Originally I'd left debug mode enabled in web.config, and Round 5 was benchmarked before my pull request that fixed it was accepted, so you could say the Round 5 results were invalid (assuming Round 5 wasn't rerun; I don't remember if it was). Now, in Round 6, with the debug flag removed, the JSON test is actually slower than in Round 5 and the Fortunes test (the only test which should have been affected) is just marginally faster.
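For anyone who wants to check a given web.config for this, here is a minimal sketch. It relies only on the documented default quoted above (a missing debug attribute means False). The function name and the two sample configuration strings are hypothetical and are not the project's actual files.

```python
import xml.etree.ElementTree as ET


def is_debug_enabled(web_config_xml: str) -> bool:
    """Return True only if a <compilation> element sets debug="true"."""
    root = ET.fromstring(web_config_xml)
    for element in root.iter("compilation"):
        # A missing attribute is treated as the documented default, False.
        if element.get("debug", "false").strip().lower() == "true":
            return True
    return False


# Hypothetical minimal web.config contents, not the project's actual file.
with_debug = """<configuration><system.web>
  <compilation debug="true" targetFramework="4.5" />
</system.web></configuration>"""
without_debug = """<configuration><system.web>
  <compilation targetFramework="4.5" />
</system.web></configuration>"""

print(is_debug_enabled(with_debug), is_debug_enabled(without_debug))
```

This only inspects a single file. Settings inherited from machine-level configuration are outside what this sketch looks at.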

MalcolmEvershed commented 11 years ago

Scott Hanselman on twitter also said one shouldn't mess with the # of threads in the threadpool. See conversation: https://twitter.com/shanselman/status/365937061527687168 https://twitter.com/shanselman/status/365937094532661249 https://twitter.com/shanselman/status/365935957473624064

I have a lot of respect for Scott and I agree that it would be preferable not to mess with the thread settings, but the reason I did that was because it improved the results (because ASP.NET doesn't grow threads fast enough [by default] from idle given that wrk will throw a ton of requests at it). If we remove the thread settings, I think that the Windows ASP.NET performance numbers will go down. Is that what we want? If that is what more people desire, I can back out my change, but it should be clear what the result will be. And if I've made a measurement mistake, please correct me.
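The "doesn't grow threads fast enough" reasoning can be illustrated with a toy model. It assumes a pool that starts at some minimum size and adds threads at a fixed rate, and that each in-flight request holds one thread. The function name and every number below are hypothetical. They are not measured ASP.NET or .NET thread pool behaviour.

```python
def seconds_to_reach(target_threads, min_threads, threads_added_per_second):
    """Toy model: time for a pool to grow from min_threads to target_threads."""
    if target_threads <= min_threads:
        # The pool already has enough threads, so no ramp-up is needed.
        return 0.0
    return (target_threads - min_threads) / threads_added_per_second


# Hypothetical values: a small default minimum versus a raised minimum.
print(seconds_to_reach(target_threads=256, min_threads=8, threads_added_per_second=2))
print(seconds_to_reach(target_threads=256, min_threads=256, threads_added_per_second=2))
```

In this model, a pool that needs far longer than a 15-second run to ramp up spends the whole run under-provisioned, while a raised minimum removes the ramp entirely. Whether the real thread pool behaves this way under the benchmark load is exactly the measurement question raised here.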

AppPool recycles can happen also due to e.g. reserved memory is full, or other thresholds which are set by default. You only know that they're not recycled if you specify with the appPool that they should be taken off line (so you'll get a 500 server error) if they recycle.

Perhaps this is the cause of the tremendous latency for ASP.NET that's recorded, which is massive.

Ok, so you're saying that maybe a recycle is happening during the run? If you have a pull request to prevent this from possibly happening, that sounds good to me. Sounds reasonable.

Thanks.

bhauer commented 11 years ago

@MalcolmEvershed Without having reviewed the specifics, I'd prefer to leave the thread settings as-is until someone can furnish evidence that counters your observations that increasing the threads helps performance or that such tuning is strongly discouraged with ASP.NET/IIS.

For the time being, the remark about thread tuning seems fairly off-the-cuff.

Other frameworks have seen their various worker-thread pools similarly increased versus the defaults. It seems surprisingly common for "production-class" defaults to be set very conservatively.

garvincasimir commented 11 years ago

Thanks @MalcolmEvershed, it sounds like you have done your due diligence where the startup issue is concerned.

Has anyone looked at the asp.net performance counters during the benchmark run?

MalcolmEvershed commented 11 years ago

Has anyone looked at the asp.net performance counters during the benchmark run?

I haven't personally. Let us know if you find something cool. :-) Thanks.

garvincasimir commented 11 years ago

Yeah, I definitely need to get the environment set up. Maybe I'll contribute to the Azure effort since I am hosting there.

FransBouma commented 11 years ago

@pdonald If debug builds are marginally faster in some cases, something IS wrong: either the time spent elsewhere in the pipeline is so much higher than the time spent in the code (so it's irrelevant) or the current builds are actually debug builds. But as the default is a release build, I don't know what makes it perform as slowly as a debug build.

@MalcolmEvershed The AppPool recycles are related to how the apppool in question is configured in IIS. So e.g. if the virtual memory pool exceeds a given threshold, if the # of connections exceeds a given threshold, etc. I have no idea how it's set up, or whether there are settings defined to make it recycle in the first place. I also don't know whether you use a classic apppool instance (ISAPI extension) or an integrated one (integrated in the request pipeline), what limits are defined for CPU usage, etc. See advanced settings for AppPool in IIS.

So (guessing here!) it might be there's a restriction set for the apppool instance to recycle at 100MB memory usage, and it hits that e.g. within 10 seconds, it will recycle no matter what.

MalcolmEvershed commented 11 years ago

The AppPool recycles are related to how the apppool in question is configured in IIS. So e.g. if the virtual memory pool exceeds a given threshold, if the # of connections exceeds a given threshold, etc. I have no idea how it's set up, or whether there are settings defined to make it recycle in the first place. I also don't know whether you use a classic apppool instance (ISAPI extension) or an integrated one (integrated in the request pipeline), what limits are defined for CPU usage, etc. See advanced settings for AppPool in IIS.

So (guessing here!) it might be there's a restriction set for the apppool instance to recycle at 100MB memory usage, and it hits that e.g. within 10 seconds, it will recycle no matter what.

@pdonald set up the site in setup_iis.ps1 with New-WebSite -Name Benchmarks -Port 8080 -PhysicalPath $wwwroot. On my machine, appcmd.exe output shows that it's using Integrated mode with the DefaultAppPool:

C:\FrameworkBenchmarks>appcmd list site
SITE "Default Web Site" (id:1,bindings:http/*:80:,state:Started)
SITE "Benchmarks" (id:{some id number},bindings:http/*:8080:,state:Started)

C:\FrameworkBenchmarks>appcmd list app
APP "Default Web Site/" (applicationPool:DefaultAppPool)
APP "Benchmarks/" (applicationPool:DefaultAppPool)

C:\FrameworkBenchmarks>appcmd list vdir
VDIR "Default Web Site/" (physicalPath:%SystemDrive%\inetpub\wwwroot)
VDIR "Benchmarks/" (physicalPath:C:\FrameworkBenchmarks\aspnet\www)

C:\FrameworkBenchmarks>appcmd list apppool
APPPOOL "DefaultAppPool" (MgdVersion:v4.0,MgdMode:Integrated,state:Started)
APPPOOL ".NET v4.5 Classic" (MgdVersion:v4.0,MgdMode:Classic,state:Started)
APPPOOL ".NET v4.5" (MgdVersion:v4.0,MgdMode:Integrated,state:Started)

The IIS docs suggest that the defaults don't do any recycling other than the periodic recycle every 29 hours. I ran appcmd list config and looked for any explicit recycling settings and I didn't see anything.
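If someone wants to repeat the pool-mode check on another machine, the listing lines can be cross-referenced with a small script. This is a sketch that only parses text in the shape of the appcmd output pasted above. It does not run appcmd itself, the helper names are made up, and the comma-splitting would need more care for attribute values that contain commas.

```python
import re

# Matches lines such as: APPPOOL "DefaultAppPool" (MgdMode:Integrated,...)
_LINE = re.compile(r'^(?P<kind>[A-Z]+) "(?P<name>[^"]*)" \((?P<attrs>.*)\)$')


def parse_appcmd_line(line):
    """Parse one listing line into kind, name and an attribute dict."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    attrs = {}
    for pair in match.group("attrs").split(","):
        # Split on the first colon only, so values may contain colons.
        key, _, value = pair.partition(":")
        attrs[key] = value
    return {"kind": match.group("kind"), "name": match.group("name"), "attrs": attrs}


def managed_mode_for_app(app_name, lines):
    """Return the MgdMode of the pool that hosts app_name, or None."""
    records = [r for r in map(parse_appcmd_line, lines) if r is not None]
    pools = {r["name"]: r["attrs"] for r in records if r["kind"] == "APPPOOL"}
    for r in records:
        if r["kind"] == "APP" and r["name"] == app_name:
            pool = r["attrs"].get("applicationPool")
            return pools.get(pool, {}).get("MgdMode")
    return None


# Lines copied from the listing above.
output = [
    'APP "Benchmarks/" (applicationPool:DefaultAppPool)',
    'APPPOOL "DefaultAppPool" (MgdVersion:v4.0,MgdMode:Integrated,state:Started)',
    'APPPOOL ".NET v4.5 Classic" (MgdVersion:v4.0,MgdMode:Classic,state:Started)',
]
print(managed_mode_for_app("Benchmarks/", output))
```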

So far I don't see anything suggesting that any recycling is going on. Suggestions? Thanks.

FransBouma commented 11 years ago

If the event viewer doesn't show anything, then I'm 100% out of ideas where the slowdowns take place.

FransBouma commented 11 years ago

I must say that I'm also very disappointed in the ASP.NET team. I asked them whether they could help; apparently they have better things to do. Replying with short bits on twitter is easy, investing time to make things better isn't. I'm sorry @pdonald and @MalcolmEvershed that I couldn't help things further. If I have time in the coming weeks I'll try to set up the code locally so I can check what's wrong, but I might also just give up, as MS themselves don't really seem to bother.

bhauer commented 11 years ago

Thanks for following up @FransBouma. We should try to not slight the ASP.NET team; they are busy with their priorities and this is an imposition on them. Any help they can provide would be greatly appreciated, but I don't expect anyone to drop what they are doing and come participate in this project.

Plus, they may still join in when they can find the time. So I want to be as welcoming as we can be.

It's also worth reiterating the two contributors here have already put in a lot of effort.

I am prepared to be surprised by a revelation if one is uncovered. But on the other hand, I'm also comfortable that the data is simply a window on reality. I am optimistic that .NET and Mono will pull away from their current position when we build a high-computation test type because I am fairly convinced that these are not computationally slow platforms. However, I have become comfortable with the idea that IIS and Mono may have fairly complex code-paths for their HTTP request routing and response delivery. It may simply be the case that they were not designed for rapid-fire small responses.

If you do have the opportunity to set things up and dig in some more, I look forward to hearing your findings!

pbooth commented 11 years ago

I think that this specific discussion is a great example of both the challenges and the enormous value created by a project like these benchmarks. I moved from application development to performance architecture about seven years ago. I quickly learned that many of my long-held beliefs about performance were plain wrong. I think that as more people become acquainted with the results of these tests they will go through similar stages of surprise, disbelief, anger, denial and eventually acceptance. We should be prepared to repeatedly hear smart, threatened programmers saying that results X, Y or Z are just plain wrong.

Some popular technologies are dog slow. This can be pretty threatening. I'm a Rails fan-boy and I squirm at the appalling results that Rails shows (similar to ASP.NET). Then I'm proud that the equally awesome OpenResty is 50x faster. I've run the tests on my own hardware, so I know these results are real.

fernandoacorreia commented 11 years ago

It may simply be the case that they were not designed for rapid-fire small responses.

It seems to me that this is the case. That's what I gather, for instance, from reading Microsoft's rationale for project Katana, including:

There may be cases where the benefits provided by IIS are not required and the desire is for a smaller, more lightweight host.

And:

By breaking the traditional notion of a framework into a set of small, focused components which are added explicitly by the application developer, a resulting Katana application can consume fewer computing resources, and as a result, handle more load, than with other types of servers and frameworks.

ASP.NET was designed for different use cases. ASP.NET MVC and Web API were evolutionary steps. Now Microsoft envisions:

The Future: A Nimble Framework

So I'm very interested in comparing the full-stack performance of ASP.NET to other full-stack technologies, and also in comparing it with Project Katana's performance as it evolves.

All this, IMO, just highlights the truth in what @pbooth said. Anyway, I'm curious to see how ASP.NET will perform on Windows Azure, where the full stack, including OS, drivers and hypervisor, is properly configured.

bhauer commented 11 years ago

@pbooth I find it fascinating and rewarding to see this kind of discussion in response to the project. I love that really thoughtful people such as everyone in this thread are engaging and helping improve the tests. Given your background, I am proud to have you as a participant.

@fernandoacorreia Agreed. Maybe we'll see someone contribute a test implementation on Katana for a future round.

I'm especially excited to see the Azure tests in Round 7. Thanks so much for all of your contributions!

LadyMozzarella commented 9 years ago

Closing this issue due to inactivity. ASP.NET has not been tested recently (Windows is not supported) and we're hoping that it will be tested in Round 11. Once Round 11 is underway and we have some preliminary ASP.NET results, we can revisit ASP.NET's performance.