TechEmpower / FrameworkBenchmarks

Source for the TechEmpower Framework Benchmarks project
https://www.techempower.com/benchmarks/

Lost history #4147

Open ghost opened 5 years ago

ghost commented 5 years ago

I'm looking for Lwan results in Round 10, which according to the blog was the top performer in many areas for that round.

I cannot find lwan anywhere in any of the charts of round 10 or any other rounds. You seem to have lost recorded history.

Also, Lwan is missing from the new rounds. It was removed because the code lived outside the repo, right? Will it come back?

Currently Actix and Fasthttp top the plaintext charts, which is kind of misleading given that in my own tests they score way below Lwan and other C servers.

NateBrady23 commented 5 years ago

Hey @alexhultman -

Thanks for opening this issue. lwan isn't in the current rounds because there isn't an implementation in our repo. As for it being removed from previous rounds, I can only speak to the fact that we weren't able to review those implementations.

As for the current plaintext leaders being misleading compared to your own testing, please feel free to create implementations for the tests you have in mind; we can only test the implementations we have. I'd love to see the C servers that score way above them! Also, we're totally open to lwan coming back with a test. I think we've pinged @lpereira a few times about that, and the last I heard he was working on the first major release. Would love to see it come back!

Also, fully transparent live results are available here: https://tfb-status.techempower.com/, so if you get some tests in, you'll be able to see how they stack up right away.

ghost commented 5 years ago

What I mean by the lost history is that your own blog post, https://www.techempower.com/blog/2015/04/21/framework-benchmarks-round-10/, says:

[...]

For Round 10, Lwan has taken the crown.

[...]

Yet it's not in the Round 10 charts. Why did the old charts change?

lpereira commented 5 years ago

Yes, you're remembering it right, @nbrady-techempower. It's entirely my fault that the implementation never made its way to the TWFB repository.

As far as reviewing the implementation: it's always been possible to review not only the Lwan source code, but the benchmark harness too. The scripts (in this repo) that pulled in Lwan fetched a specific Git revision; due to the "tamper-resistant" nature of Git, the implementation was always reviewable and reproducible... maybe not as convenient as having it in the TWFB repository, but considering that the API wasn't set in stone and no releases had been made, it was the best I could do at the time. However, as they say, "when in Rome...", so, yeah, maybe someday I'll publish the benchmark harness in this repo. I got interested in Lwan again, at least. :)

(And, yes, I've been puzzled by the same thing, @alexhultman. Learned that old results were removed when showing the page to a prospective employer... kind of unsettling, as I was very proud of that result. No hard feelings, though.)

msmith-techempower commented 5 years ago

The entry for lwan was likely lost when it was removed. There is some complicated architecture that tries to keep all the metadata consistent from round to round, even when reconciling things like name changes (casing, etc.).

I can confirm that lwan does indeed exist in the Round 10 data being processed on the page, but it very likely needs a fix to the metadata (or, equally likely, to the benchmark renderer tooling) to be displayed again.

Ping @bhauer

michaelhixson commented 5 years ago

Here is what's happening:

There are multiple possible solutions to this. The last time we discussed this internally, we concluded the problem wasn't bad enough to be worth fixing. I guess I still feel the same way, though I don't want to discourage anyone else from fixing it. It probably has to be someone at TechEmpower to fix it, since all the solutions I can imagine involve changes to the results website.

NateBrady23 commented 5 years ago

@msmith-techempower Just to clarify, we're talking about lwan, not ulib.

And yes, I totally forgot about the single metadata file.

So, to correct my previous statement: we've removed implementations for violating rules in the past, but it doesn't look like that was the case with lwan. Usually when we do that, we ping the framework maintainer or the person who implemented the test and let them know. We don't have any reason not to be fully transparent there.

Two possible remedies for this particular situation are:

1) Seeing if @bhauer can manually add lwan back to the metadata. It might take some time, if he can, because we're super busy.

2) Adding an lwan test back to the repo, which would put it back in the metadata file; since lwan keys still exist in the Round 10 results.json, that would re-enable the visualization there (see the sketch after this list).
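
To make the second remedy concrete, here is a minimal sketch of the mechanism being described. The file names and JSON shapes below are assumptions for illustration only, not the actual internals of the results site; the point is simply that a framework whose keys exist in a round's results.json but which is absent from the current metadata never gets rendered.

```bash
#!/usr/bin/env bash
# Illustration only. Assumed inputs (not confirmed by this thread):
#   round-10/results.json  - assumed to carry a top-level "frameworks" array
#   test_metadata.json     - assumed to be an array of objects with a "framework" field
set -euo pipefail

RESULTS=round-10/results.json
METADATA=test_metadata.json

# Frameworks recorded in the round's results.
jq -r '.frameworks[]' "$RESULTS" | sort -u > /tmp/recorded.txt

# Frameworks the renderer currently knows about.
jq -r '.[].framework' "$METADATA" | sort -u > /tmp/known.txt

# Anything recorded but missing from the metadata (e.g. lwan) is never
# displayed, even though its data is still present in results.json.
comm -23 /tmp/recorded.txt /tmp/known.txt
```

Under that assumption, restoring an lwan entry to the metadata, whether by hand or by adding a test back to the repo, is what would make the existing Round 10 keys visible again.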

Sorry for the confusion, guys; certainly no harm intended! @lpereira, glad to hear you're back on lwan, though!

msmith-techempower commented 5 years ago

I've updated my first comment to be lwan, not ULib, and it is still correct.

ghost commented 5 years ago

@msmith-techempower So... are you going to fix the issue or no? Currently your round history is not consistent with your blog or... reality.

msmith-techempower commented 5 years ago

@alexhultman We have the fixes @nbrady-techempower mentioned above in mind, and will (given bandwidth) attempt to resolve the issue. I have no timeline at present.

boazsegev commented 5 years ago

@lpereira ,

A faster solution might be to re-add the lwan application to the benchmark and either:

1) Add a short bash script that downloads a specific commit of lwan when using lwan with the benchmark (similar to what I did with facil.io, except I did it with git tags; see the sketch after this list); or

2) Include the whole lwan framework source code in the TechEmpower benchmark app folder (which I would be against, since it would just slow down the forking and development process).
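
Purely as an illustration of option 1, here is a minimal sketch of a setup script that pins lwan to a specific commit. The repository URL is lwan's public GitHub repo, the commit hash is a placeholder rather than any revision that was actually benchmarked, and the build steps assume lwan's usual out-of-tree CMake build, not whatever the original TFB setup did.

```bash
#!/usr/bin/env bash
# Sketch of a setup script that pins lwan to a specific commit.
# LWAN_COMMIT is a placeholder SHA, not a real benchmarked revision.
set -euo pipefail

LWAN_REPO=https://github.com/lpereira/lwan.git
LWAN_COMMIT=0000000000000000000000000000000000000000  # placeholder

git clone "$LWAN_REPO" lwan
cd lwan
git checkout "$LWAN_COMMIT"  # pinning the SHA fixes the exact sources being built

# Assumes lwan's upstream CMake build; adjust flags as needed.
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
```

Because a Git SHA identifies the exact tree, reviewers can inspect the same sources that were benchmarked, which is the reproducibility point @lpereira makes above.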

This will also have the benefit of lwan showing up in the latest benchmarks, which I would love to see :-)

Kindly, B.

@nbrady-techempower, @msmith-techempower, I love that you want to solve this. On the other hand...

I thought this was a feature rather than a bug.

I think it makes sense to curate the list by exposing only frameworks that survived and showing their performance as it changes over time.

Unless tests are removed or changed, obsolete code should run as is, so (IMHO) the only reason a framework should be removed is if it's no longer valid and shouldn't be displayed anyway.

Food for thought. Bo.

joanhey commented 5 years ago

With PHP, after some needed changes, we removed php7-raw and it is now just php. We immediately lost the history in all the rounds. :-1:

In a few days we will remove php5, and the same will happen: no history ;(

I think that all the final rounds must be immutable.

Please try to solve this issue. Thank you.

robjtede commented 1 year ago

I think that all the final rounds must be immutable.

I would agree with this sentiment, since we've just found out that the r20 results for Actix Web have been wiped. We link to this page from our README, which now says that results for most tests are missing.

(Note that we're holding off on updating the link to r21 until some issues are worked out with the composite methodology.)