I think it is an interesting idea to try at least building the benchmarks in Stackage. Of course it will add to the maintenance burden and build times. I don't know whether it could be started off as a separate effort before integrating it into Stackage itself - it would be good at least to do some initial testing first to get a better idea of how good/bad things currently are.
I tried to build a subset of the benchmarks around Christmas; many depend on an old version of criterion. So even notifying maintainers about restrictive bounds would be nice.
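For context, the restrictive bounds in question usually look something like this in a package's benchmark stanza (a hypothetical example, not taken from any particular package):

```
-- Hypothetical benchmark stanza. The upper bound on criterion below blocks
-- newer criterion releases, which is the kind of restrictive bound meant here.
benchmark bench
  type:             exitcode-stdio-1.0
  hs-source-dirs:   bench
  main-is:          Main.hs
  build-depends:    base      >= 4   && < 5
                  , criterion >= 0.8 && < 1.0
  default-language: Haskell2010
```

Relaxing the upper bound after testing against a current criterion release is usually all that is needed.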
@juhp We will gather data on how many of them already build, and report that here. I hope that just building benchmarks does not add too much to the build time because Stackage already builds benchmark dependencies.
There may be a substantial one-time transitioning cost -- getting people to fix currently-broken packages. But I hope that, long term, building benchmarks won't add too much friction, because it doesn't increase the "surface area" of maintainers and packages. Rather, it's just one more correctness check packages must pass to be included.
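For anyone who wants to check their own package ahead of time, something along these lines builds the benchmark suites without running them (a sketch; the exact invocation Stackage's tooling uses may differ):

```sh
# With stack: compile the benchmark suites but don't execute them.
stack build --bench --no-run-benchmarks

# Or with plain cabal:
cabal configure --enable-benchmarks
cabal build
```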
> So even notifying maintainers about restrictive bounds would be nice.
As I understand it, we do notify maintainers about restrictive bounds on benchmarks, just the same as with their regular dependencies. But once they get put on the "skipped benchmarks" list, we stop notifying them.
I have no problem with turning on benchmark building. We'll just need to populate a field for expected benchmark failures pretty quickly. I'll make the tweak. @juhp, I can wait to activate this until I'm back on curator duty if you'd like.
@DanBurton oh, I didn't know! That's very nice.
@snoyberg awesome! Well we can try it now I guess and see how it goes. I see benchmarks are building now actually - I can try to report the initial results here when that finishes.
Yeah, sorry, I moved ahead with it right away to test out my code. If it causes you trouble, let me know and I'll roll it back.
The following benchmarks appear to be failing:
Not pinging maintainers on this (though I'll look into a few of those myself).
Thanks, that's great!
Can we close this now?
Yes, I think so
Just FYI we are now successfully grabbing and running >66 benchmark suites and recording their results. We've got various build errors that don't 100% match up with the list above, and we can gradually help poke maintainers or send PRs.
Many of them are simple fixes, like a missing `other-modules` field.
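For illustration, the typical fix is just listing the benchmark's own helper modules in the stanza (a hypothetical example; the module names are made up):

```
benchmark bench
  type:             exitcode-stdio-1.0
  hs-source-dirs:   bench
  main-is:          Main.hs
  -- Helper modules under bench/ must be listed here, otherwise they are
  -- omitted from the sdist tarball and the benchmark fails to compile.
  other-modules:    Bench.Common
                    Bench.Cases
  build-depends:    base, criterion
  default-language: Haskell2010
```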
@rrnewton Thank you. If you do want to create an issue to track progress you can do that; otherwise I guess the current Stackage status can be checked in the "Expected benchmark failures" section in build-constraints.yaml, at least.
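For reference, that section of build-constraints.yaml looks roughly like this (an illustrative sketch; the exact key name and entries may differ):

```yaml
# Packages whose benchmark suites are known not to build are listed here
# so they don't block the snapshot; the entries below are hypothetical.
expected-benchmark-failures:
    - some-package      # depends on an old criterion
    - another-package   # missing other-modules in its benchmark stanza
```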
I'm currently hunting down Stackage benchmark failures separately in this issue. As these packages get fixed and uploaded to Hackage, I'll submit PRs to this repo to add their benchmarks back into the fold.
There has been discussion on ghc-devs about revitalizing the Haskell benchmarking suites. Stackage benchmark suites would appear to be one useful source of data. Indeed, there are almost 200 benchmark suites among the ~1,700 current LTS packages.
However, running Stackage benchmarks is currently hard to achieve, for several reasons:

- The tarballs generated by `cabal sdist` are often wrong (see `pipes` in lts-5.13, which is missing a file); a quick way to reproduce this is sketched below.
- Some packages, such as `lens-4.13`, will error when you try to build their benchmarks.

While I understand that benchmarks were left out of the original Stackage focus, is it perhaps time to start tightening the screws and weeding out these invalid tarballs?
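As a rough way to reproduce the sdist problem mentioned above (a sketch using plain cabal commands; the package version is just an example):

```sh
# Unpack the released tarball exactly as Stackage consumes it, then try to
# build its benchmarks. If a file was left out of the sdist, this fails even
# though building from a git checkout of the repository succeeds.
cabal get pipes-4.1.8     # example version; any affected release will do
cd pipes-4.1.8
cabal configure --enable-benchmarks
cabal build
```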
Perhaps a first step would be a warning mechanism. The maintainers' agreement has some hard deadlines, but perhaps failing benchmark suite builds could be only a soft warning for some period of time, and then later become a hard requirement along with the rest.
(CC @RyanGlScott @vollmerm @osa1)