crate-ci / azure-pipelines

Easy continuous integration for Rust projects with Azure Pipelines

How should we handle testing #9

Open · epage opened this issue 5 years ago

epage commented 5 years ago

Considerations

Possibilities

epage commented 5 years ago

@johnterickson on gitter

1) I think "cargo test" maps decently to JUnit (which seems to be the most broadly supported), but I'm less sure about "cargo bench".

2) I'm "on the fence" on whether libtest should emit its own JSON or whether JUnit should be baked in. A confounding factor is that the libtest formatters are used for both "cargo test" and "cargo bench".

3) I'd like to avoid wrapping cargo with another executable, though I think it would make sense for a "cargo test" build task in Pipelines to also publish the test results to Azure DevOps

4) I'm relatively new and not running Rust in production, so I'm open to hearing other thoughts and also do take my opinions with a grain of salt :)

(That doesn't matter; your thoughts are welcome!)

epage commented 5 years ago

I think "cargo test" maps decently to JUnit (which seems to be the most broadly supported), but I'm less sure about "cargo bench".

From a CI perspective, a lot of people give flak for running cargo bench within CI because of its variability. Sadly, my google-fu is failing me and I can't find the blog post where someone analyzed this. (Personally, I still do it.)

From a completeness perspective, it is probably a problem to have an output format that doesn't support all of the events.

Hmm, thinking about this. We could treat each bench as a test case and store the bench information in case properties. That way we can at least say we support it, even if people can't act on it.
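
To make that concrete, here is a rough sketch of how a single bench result could be encoded as a JUnit testcase with its numbers stashed in properties. This is not an agreed-on schema; the property names and the `BenchResult` type are made up for illustration:

```rust
// Sketch only: encode one `cargo bench` result as a JUnit <testcase>,
// stashing the numbers in <properties> so reporting UIs that ignore the
// properties still show the bench as a passing "test".
// The field and property names here are illustrative, not a settled schema.

struct BenchResult {
    name: String,
    median_ns: u64,
    deviation_ns: u64,
}

fn bench_to_junit_testcase(b: &BenchResult) -> String {
    // Real code would XML-escape the name.
    format!(
        r#"<testcase name="{}" classname="bench" time="0">
  <properties>
    <property name="median_ns" value="{}"/>
    <property name="deviation_ns" value="{}"/>
  </properties>
</testcase>"#,
        b.name, b.median_ns, b.deviation_ns
    )
}

fn main() {
    let b = BenchResult {
        name: "bench_parse".into(),
        median_ns: 1_234,
        deviation_ns: 56,
    };
    println!("{}", bench_to_junit_testcase(&b));
}
```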

(I still feel like a benchmark reporting service is something lacking in the wider programming community)

I'd like to avoid wrapping cargo with another executable, though I think it would make sense for a "cargo test" build task in Pipelines to also publish the test results to Azure DevOps

For cargo test to be programmatically useful, we effectively need to replace it, from what I've seen. If JUnit support were implemented via emitters, it would end up mixed with other text in the output, making it messy to extract. If we use escargot, we can cleanly build and test, provide a nice UX for the viewer, and support outputting the relevant test data as JUnit.
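
As a sketch of what a wrapper could do once it has clean per-test data, the snippet below reads libtest JSON event lines from stdin and emits a minimal JUnit testsuite. The event shape (`{"type":"test","event":"ok","name":...}`) is an assumption about libtest's unstable, nightly-only `--format json` output and may differ across toolchains; it also assumes the serde_json crate:

```rust
// Sketch only: turn libtest JSON event lines (one JSON object per line) into
// a minimal JUnit <testsuite>. Requires the serde_json crate. The event
// shape is assumed from libtest's unstable JSON formatter.

use std::io::{self, BufRead};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut cases = Vec::new();
    let (mut passed, mut failed) = (0u32, 0u32);

    for line in stdin.lock().lines() {
        let line = line?;
        let ev: serde_json::Value = match serde_json::from_str(&line) {
            Ok(v) => v,
            Err(_) => continue, // skip any non-JSON noise mixed into the stream
        };
        if ev["type"] != "test" {
            continue;
        }
        let name = ev["name"].as_str().unwrap_or("unknown");
        match ev["event"].as_str() {
            Some("ok") => {
                passed += 1;
                cases.push(format!("  <testcase name=\"{}\"/>", name));
            }
            Some("failed") => {
                failed += 1;
                cases.push(format!(
                    "  <testcase name=\"{}\"><failure/></testcase>",
                    name
                ));
            }
            _ => {} // "started", "ignored", ... not reported here
        }
    }

    println!(
        "<testsuite tests=\"{}\" failures=\"{}\">",
        passed + failed,
        failed
    );
    for c in cases {
        println!("{}", c);
    }
    println!("</testsuite>");
    Ok(())
}
```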

cargo-suity effectively replaces cargo test. Especially after it migrates to escargot, it will be calling cargo build on the tests and then invoking the test executables directly. I suggested renaming it. My vote is actually cargo-ci-test (cargo ci-test) and to have its scope be a CI-focused version of cargo test, helping with needs like coverage, etc.

ghost commented 5 years ago

If JUnit support were implemented via emitters, it would end up mixed with other text in the output, making it messy to extract.

This is a great clarifying point that reminds me of https://github.com/rust-lang/rust/issues/57147. Zooming out for a second, I wonder if it's a hard requirement of libtest that the test output go to stdout. I think you make a good point that there will be too much mixing in stdout. It seems not very "rustic" to have to grep through a messy stdout.

Perhaps libtest should have:

1. Formatters for showing cargo test/bench results on the console (e.g. with console coloring). These would be streaming as tests complete (pretty, terse).
2. Formatters for writing results to a test log file. These need not be streaming (e.g. JUnit has a count at the top, IIRC).
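
Very roughly, and with made-up names that are not libtest's actual API, the split could look like this:

```rust
// Hypothetical sketch of the two-formatter split described above; none of
// these names exist in libtest, they only illustrate the shape of the idea.

struct TestOutcome {
    name: String,
    passed: bool,
}

// 1) Streaming formatter: prints each result to the console as it completes.
trait ConsoleFormatter {
    fn on_result(&mut self, outcome: &TestOutcome);
}

// 2) Report formatter: sees the whole run at once, so it can write formats
//    that need totals up front (e.g. a JUnit-style count in the header).
trait ReportFormatter {
    fn write_report(&mut self, outcomes: &[TestOutcome]) -> String;
}

struct Pretty;
impl ConsoleFormatter for Pretty {
    fn on_result(&mut self, o: &TestOutcome) {
        println!("test {} ... {}", o.name, if o.passed { "ok" } else { "FAILED" });
    }
}

struct Junitish;
impl ReportFormatter for Junitish {
    fn write_report(&mut self, outcomes: &[TestOutcome]) -> String {
        let failures = outcomes.iter().filter(|o| !o.passed).count();
        let mut xml = format!(
            "<testsuite tests=\"{}\" failures=\"{}\">\n",
            outcomes.len(),
            failures
        );
        for o in outcomes {
            xml.push_str(&format!("  <testcase name=\"{}\"/>\n", o.name));
        }
        xml.push_str("</testsuite>\n");
        xml
    }
}

fn main() {
    let results = vec![
        TestOutcome { name: "tests::parses".into(), passed: true },
        TestOutcome { name: "tests::overflows".into(), passed: false },
    ];

    let mut console = Pretty;
    for r in &results {
        console.on_result(r); // streamed while the run is in progress
    }

    let mut report = Junitish;
    std::fs::write("test-results.xml", report.write_report(&results)).unwrap();
}
```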

This would also answer the question of what to do about cargo's current "JSON-per-line" output: specifically, we could deprecate it.

My vote is actually cargo-ci-test (cargo ci-test) and to have its scope be a CI-focused version of cargo test ...

My thought against a wrapper (not having tried suity or escargot (whose name I love, btw)) is more about having a way to generate programmatic test output out of the box. That is, running tests in CI shouldn't require me to "cargo install" something first. I'll have to try out suity and escargot!

epage commented 5 years ago

My thought against a wrapper (not having tried suity or escargot (whose name I love, btw)) is more about having a way to generate programmatic test output out of the box. That is, running tests in CI shouldn't require me to "cargo install" something first.

Understandable. Maybe I'm just being pessimistic, but I expect we'll have quite a few things to cargo install as we evolve the ecosystem. Like leveraging crates instead of having a "batteries included" stdlib, I favor iterating on a lot of these tools outside of cargo. We get them sooner and can more easily evolve them as we better understand the problem.

Speaking of which, external tools might also be an important part of getting features integrated into cargo. If we can show the use cases we're solving, along with a mature implementation, I expect that will go through a lot more smoothly.

Also, to be clear: to avoid the performance overhead of literally doing cargo install, I've been encouraging people to upload binaries to GitHub releases in a way compatible with trust's install.sh. GhInstall is the other task that I think would be useful. See #3.

nickbabcock commented 5 years ago

From a CI perspective, a lot of people give flak for running cargo bench within CI because of its variability.

Criterion has a section about that. It doesn't give numbers, but it basically says that if you run cargo bench in a CI pipeline, it shouldn't fail the build.

Speaking of Criterion: if we do support benchmarking in CI (which I don't do myself, but I won't inhibit others), Criterion should really be the default/recommended option. The fact that it outputs CSV benchmark data (in addition to its other features) lends itself even more to a CI use case, as one could track benchmarks across all builds.


For tests, from a UX perspective, I think I'd naturally gravitate towards seeing two steps "1. Run tests. 2. Upload test results / (3.?) code coverage" to differentiate potential failures, but I'm flexible. As long as any potential wrapper / tool is as solid as cargo test then it doesn't matter to me what's underneath. I wouldn't want to see a step in the UI called cargo suity as it seems best to name a step after the semantics (hmm maybe not the right word) and not the implementation.

epage commented 5 years ago

Speaking of Criterion: if we do support benchmarking in CI (which I don't do myself, but I won't inhibit others), Criterion should really be the default/recommended option. The fact that it outputs CSV benchmark data (in addition to its other features) lends itself even more to a CI use case, as one could track benchmarks across all builds.

I'm on the fence about Criterion atm. Without the new testing RFC implemented and stable, it is error-prone to maintain the list of benchmarks. I talked to them about adopting macros for registering tests, but they want to wait until the testing RFC is in.

See https://github.com/bheisler/criterion.rs/issues/262

Otherwise, I think we're agnostic about the use of Criterion.
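
For reference, a minimal Criterion bench, sketched against roughly the 0.3-era API (check Criterion's docs for the current shape). The by-hand criterion_group! registration is the maintenance burden mentioned above:

```rust
// Minimal Criterion bench (a sketch; verify against Criterion's docs).
// Each bench function and the criterion_group! list are maintained by hand.
// Cargo.toml also needs:
//   [[bench]]
//   name = "parse"
//   harness = false
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn parse_number(s: &str) -> i64 {
    s.trim().parse().unwrap_or(0)
}

fn bench_parse(c: &mut Criterion) {
    c.bench_function("parse_number", |b| {
        b.iter(|| parse_number(black_box("  12345  ")))
    });
}

// Adding a new bench means remembering to list it here as well.
criterion_group!(benches, bench_parse);
criterion_main!(benches);
```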

For tests, from a UX perspective, I think I'd naturally gravitate towards seeing two steps "1. Run tests. 2. Upload test results / (3.?) code coverage" to differentiate potential failures,

Let me see if I understand. Is (3) the uploading of coverage or re-running the tests to collect coverage data?

If the latter, the downside is we're doubling our testing time.

but I'm flexible. As long as any potential wrapper / tool is as solid as cargo test then it doesn't matter to me what's underneath. I wouldn't want to see a step in the UI called cargo suity as it seems best to name a step after the semantics (hmm maybe not the right word) and not the implementation.

I agree.

nickbabcock commented 5 years ago

Let me see if I understand. Is (3) the uploading of coverage or re-running the tests to collect coverage data?

If the latter, the downside is we're doubling our testing time.

How does tarpaulin work? Does it wrap cargo test? Ideally this would all be some sort of middleware, but I'm not sure of the best way to avoid re-running the test suite for code coverage while still getting both test results and code coverage to upload.

epage commented 5 years ago

cargo-kcov wraps cargo test, parsing the textual output

https://github.com/kennytm/cargo-kcov/blob/master/src/main.rs#L177

cargo-tarpaulin wraps cargo test, calling directly into the cargo crate

https://github.com/xd009642/tarpaulin/blob/master/src/lib.rs#L99
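
For a sense of what parsing the textual output involves (a rough sketch, not cargo-kcov's actual code), a wrapper scrapes the `test <name> ... <status>` lines out of libtest's default human-readable stream:

```rust
// Rough sketch of scraping `cargo test`'s human-readable output, in the
// spirit of what cargo-kcov does (this is not its actual code). The
// "test <name> ... <status>" line shape is libtest's default pretty format;
// it is not a stable interface, which is the fragility being discussed.

use std::io::{self, BufRead};
use std::process::{Command, Stdio};

fn main() -> io::Result<()> {
    let mut child = Command::new("cargo")
        .args(["test", "--no-fail-fast"])
        .stdout(Stdio::piped())
        .spawn()?;

    let stdout = child.stdout.take().expect("piped stdout");
    let mut results: Vec<(String, String)> = Vec::new();

    for line in io::BufReader::new(stdout).lines() {
        let line = line?;
        // Lines look like: "test tests::it_works ... ok" / "... FAILED" / "... ignored"
        if let Some(rest) = line.strip_prefix("test ") {
            if let Some((name, status)) = rest.split_once(" ... ") {
                results.push((name.to_string(), status.to_string()));
            }
        }
        println!("{line}"); // still show the normal output to the user
    }

    child.wait()?;
    eprintln!("captured {} test results", results.len());
    Ok(())
}
```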

jonhoo commented 4 years ago

I think this should be closed given the recent direction of the project? Maybe with a reference in #78, though I'm not sure. #51 will remain open for how we handle testing of the templates themselves.