Enable cached test results

benbrittain commented 1 year ago

We have a few unit tests that take 60s+ to run, so I was fairly surprised when executing buck2 test //... repeatedly invokes those tests even though the inputs haven't been modified.

Is this working as intended? I know that FB has Tpx internally and I'm not sure if that's the infrastructure which is supposed to handle avoiding reruns or not.

stepancheg commented 1 year ago

This is how how buck2 works: it does not cache test results. Maybe it should.

TPX does different things, it can for example skip disabled tests. But it does not cache test execution results either.

benbrittain commented 1 year ago

That's too bad, cached test results are a pretty killer feature. Is this logic that belongs in the test runner or as part of Buck2 core? Basically, would it be possible for me to write a test runner that caches results?

stepancheg commented 1 year ago

Test runner does not have this information.

I think we should pass it to test runner regardless of who should filter unchanged tests, buck2 or test runner.

But after that, I think it should be buck2 native option like buck2 test --skip-unchanged.

alexlian commented 1 year ago

That's too bad, cached test results are a pretty killer feature.

This was a pretty contentious discussion internally between a number of teams. Ultimately, given Remote Execution's ability to cache, it was deemed something to at least consider. So, IIRC, we do internally have the option to opt-into caching results.

I do forget where this is implemented, and guess as you suspect that it'd be in the test runner. At best, it's buck2 core passing along the information to RE on behalf of the test runner? Perhaps @krallin might know or point towards someone that does.

rbtcollins commented 1 year ago

FWIW, a few folk were considering doing a spike to experiment with Buck2 - we like the focus on developer experience, LSP integrations and the like - but our existing Bazel setup depends quite heavily on not running tests with unchanged inputs in order to deliver our sub-minute builds...

How much work do you think one would be looking at to get a minimum viable patch for this?

ndmitchell commented 1 year ago

Note that you can write your own test rule which runs a command, writes out an artifact (e.g. the logs of the test) and then have your tests run at build time. Then you get everything that works on builds (caching, remote execution etc) for tests. The only thing you lose is the buck2 build/buck2 test distinction. Perhaps that would be the minimal viable patch, and should be entirely doable in custom Starlark rules.

vaskomitanov commented 1 year ago

This was a pretty contentious discussion internally between a number of teams.

@alexlian I'm curious what are the arguments against caching test results? Is it just because it can be done on RE side or there is something more fundamental?

alexlian commented 1 year ago

I'm curious what are the arguments against caching test results? Is it just because it can be done on RE side or there is something more fundamental?

IIRC, test signal fundamentals: what if tests results are flaky? I think internally we have an opt-in and usually only caching passing tests only, so failures will be always re-run. Also the fact that with RE action cache, it's sharing results across CI / developers. So, an incorrect signal has cost beyond the individual.

benbrittain commented 1 year ago

Note that you can write your own test rule which runs a command, writes out an artifact (e.g. the logs of the test) and then have your tests run at build time.

Even though we're pretty excited about buck2, I don't think we're willing to give up the build/test distinction and cached tests are pretty important for us.

Is there a somewhat reasonable path to getting something like this supported? If someone was to put together a patch that enables it's support in conjunction with RE would it be accepted even though it's a "contentious" feature? Would a feature like this need to be driven internally?

Peter-McLean-Altera commented 10 months ago

I'd like to voice my support for this issue. I come from hardware development land, where unit 'tests' take between 10s of minutes to an hour+. This is a major concern about adopting Buck2, as these long tests should only run if the inputs change. Before a release, every test could be force run, but in typical day-to-day development the time savings are necessary to decrease cycle time.

vors commented 7 months ago

I'm very surprised that it's not supported, it's part of RE API and part of what makes systems like buck2 and bazel appealing. Please consider implementing, this is a huge adoption blocker.

cjhopman commented 7 months ago

Yeah, I've always been on the side that the right thing to do is to cache test results, I'd love to see us internally move to that model. Maybe I'm biased from having test result caching like 10 years ago with bazel.

I don't think it'd be a particularly extensive change to support this, the model of test integration was done to support tests just being actions like any other (and so should be able to support both RE and caching like any other).

facebook / buck2

Enable cached test results #183