Need infrastructure to run tests in varying hardware intrinsic modes

danmoseley commented 5 years ago

Increasingly we are taking changes that use hardware intrinsics to accelerate parts of CoreFX. Without special care, our testing will only ever cover the AVX2 (or AVX) path, not the software path and not any SSE path.

We need a way to easily cause selected tests to re-run in each mode so every codepath is exercised.

In the ML.NET repo they repeat relevant tests under each of regular, no AVX, and no AVX/SSE using RemoteExec: https://github.com/dotnet/machinelearning/blob/d65af0f755b9cd5739693004f1a8a8964ce23d8a/test/Microsoft.ML.CpuMath.UnitTests.netcoreapp/UnitTests.cs#L140

We can do something similar, but centralize the magic environment variables and offer a flag on RemoteExec that tests can set to enumerate over them.

cc @tannergooding @jkotas @Anipik

ViktorHofer commented 5 years ago

Moving to future as this would be nice to have but not necessarily important for 3.0. That said I will prioritize it correctly and try to work on it as soon as other high prio items are done. I'll try to do at least a manual run asap to get some data.

gfoidl commented 4 years ago

It would be nice to write tests as

[Fact]
[HwIntrinsic(Intrinsic.Avx2 | Intrinsic.Ssse3)]
public void MySuperTest()
{
    // ...
}

and then the test-runtime / CI executes the tests with environment variables set (i.e. that the noted intrinsics are disabled) so that the code-path can be verified.

The approach with RemoteExec is possible, but it forces us to write some code that could belong to the infrastructure.

danmoseley commented 4 years ago

Both approaches above assume it's fairly well defined which tests we need to run this way. As more and more of our code uses intrinsics and particularly if we want some indirect test coverage we might instead want to execute the entire test suite periodically with the various environment variables.

Perhaps it could be added pairwise into our test matrix, eg., on Centos and Windows Server always go through software mode, etc - if we're confident it's an orthogonal parameter.

danmoseley commented 2 years ago

linking to discussion of what the knobs are/should be: https://github.com/dotnet/runtime/issues/11701

danmoseley commented 2 years ago

@stephentoub @jkotas now that we're being more thoughtful about our matrix, and more of it is post-commit, would it be reasonable to pairwise one or more of these flags into the mix? Eg., if we test Windows 10 x64 and Windows 11 x64 we could enable COMPlus_EnableHWIntrinsic=0 permanently on one of those. It seems to me vanishingly unlikely that that would mask any OS-specific bugs.

It's not entirely clear which switch/es to set though. COMPlus_EnableHWIntrinsic=0 will force the software path, which in some cases is (still?) exercised for free on Arm64 tests. It may be better to force the SSE2 path, eg.

cc @safern @tannergooding @BruceForstall

danmoseley commented 2 years ago

I just noticed your offer @tannergooding to drive thinking around this. Sounds good to me.

safern commented 2 years ago

I also can work with @tannergooding on getting this set up (whatever pieces we need to move).

BruceForstall commented 2 years ago

Note that we do exhaustive hardware intrinsic switch testing (to force various environments, in combination with JIT stress modes) of the CoreCLR Pri-1 tests for x86 (https://dev.azure.com/dnceng/public/_build/results?buildId=1399897&view=results) and arm64 (https://dev.azure.com/dnceng/public/_build/results?buildId=1399881&view=results).

Adding a few modes (for x86, x64, and arm64) for libraries tests makes sense.

tannergooding commented 2 years ago

we could enable COMPlus_EnableHWIntrinsic=0 permanently on one of those

@danmoseley, I don't think that one in particular is interesting and its somewhat covered already by x86 Linux and ARM32 (where hardware intrinsics aren't supported).

Namely, the most interesting setups are probably AVX=0 and SSE3=0 given that we require SSE2 support on x86/x64 and AdvSimd support on ARM64. However, this is also a general case that there's no real benefit to running these tests if the code hasn't changed and so running them as part of a weekly or monthly outerloop would likely catch any issues that might crop up.

Users adding new or touching existing intrinsic code can (and should) explicitly trigger the outerloop jobs against their PR.

jkotas commented 2 years ago

now that we're being more thoughtful about our matrix, and more of it is post-commit, would it be reasonable to pairwise one or more of these flags into the mix?

What is the total size of the matrix that would give us full coverage for these?

I am wondering whether we should set these randomly. It means that the failures won't be deterministic, but it is not that different from what we are doing to stress the product in general (stress failures are never deterministic).

jkotas commented 2 years ago

We can also use the same randomized approach for other variables with similar characteristics, like tiered compilation on/off.

tannergooding commented 2 years ago

What is the total size of the matrix that would give us full coverage for these?

The JIT currently tests with these modes for the outerloop: https://github.com/dotnet/runtime/blob/4957f6238cd0c42a33b6892974a79bbb2555ad8b/src/tests/Common/testenvironment.proj#L93-L113 There are additional variants that combine with the other JitStress modes as well, but those probably aren't as relevant for the library side tests.

In particular today, the library code has code paths based on:

AVX
AVX2
BMI1
BMI2
FMA
LZCNT
POPCNT
SSE
SSE2
SSSE3
SSE41

And for ARM:

AdvSIMD

I don't think we need to do anything like randomly set the switches. Simply having the same matrix as the JIT as outerloop jobs should be sufficient. PRs that are adding new intrinsic code paths can then explicitly trigger these jobs, just as we do when adding new JIT support via azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm.

The outerloop jobs should be plenty sufficient for catching any oddities that come about otherwise, particularly given that the code is rarely touched and any real issues should likely get caught by the JIT side tests first.

danmoseley commented 2 years ago

Linking this interesting example demonstrating that library authors need these flags for their own testing, and of course that means they expect us to run clean. https://github.com/dotnet/runtime/issues/60035#issuecomment-964492880

dotnet / runtime

Need infrastructure to run tests in varying hardware intrinsic modes #950