Difficult for binary instrumentation experiments

I'm trying to test an optimization tool that repeatedly rewrites a function in a native binary and then tests each rewrite for correctness and performance. The structure of the nnpack binaries makes this process extremely complicated and error-prone:

There is no easy way to determine which kernel will be invoked by a particular set of command-line arguments. For example, if I want to test nnp_fft8x8_with_offset_and_stream__avx2, I have to step through the code and figure out which kind of inputs will be directed to that kernel.
There are some preparatory loops and other setup phases that are not isolated from the timed code, so my tool requires special filters to ignore certain kernel invocations.
The validation option is not available in the xxxx-benchmark binaries. So if, for example, I get a great speedup in one of my variations of a kernel, I then face a mountain of work to validate that my tool did not accidentally break the function. My best solution has been to splice the rewritten function into one of the other binaries that has a validator, but this is very time-consuming because it requires patching rip-relative addresses (and there are more complications if my rewrite is larger than the original function).
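To make the rip-relative patching cost concrete, here is a minimal sketch of just the arithmetic involved when an instruction with a 32-bit rip-relative displacement is moved to a new address. The function name `relocate_rip_disp32` and its parameters are hypothetical, not part of any existing tool; locating the displacement bytes inside each instruction still needs a disassembler, which is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * On x86-64, a rip-relative operand is resolved as:
 *     target = address_of_next_instruction + disp32
 * When an instruction is copied elsewhere, the target must stay the same,
 * so the displacement has to be recomputed for the new location.
 *
 * old_next: address of the byte following the instruction in the original binary
 * old_disp: displacement encoded in the original instruction
 * new_next: address of the byte following the instruction after the move
 * new_disp: receives the recomputed displacement
 *
 * Returns false when the target is no longer reachable with a signed
 * 32-bit displacement (the splice then needs a different strategy).
 */
bool relocate_rip_disp32(uint64_t old_next, int32_t old_disp,
                         uint64_t new_next, int32_t *new_disp)
{
    int64_t target = (int64_t)old_next + (int64_t)old_disp;
    int64_t delta = target - (int64_t)new_next;

    if (delta < INT32_MIN || delta > INT32_MAX) {
        return false;
    }
    *new_disp = (int32_t)delta;
    return true;
}
```

This has to be repeated for every rip-relative operand in the rewritten function, and again whenever the rewrite changes instruction lengths, which is why the splice approach does not scale.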

At this point it has simply become too difficult to experiment with my tool on nnpack. To continue, what I will need is a single binary that can do the following:

Run any kernel in a meaningful context. It doesn't matter to me what data or what kind of use case it chooses to run (even if someone explained it to me, I wouldn't understand). I just want it to do whatever it does: black box, no manual.
Provide a single-knob command-line argument to adjust the duration of the test. For example, if the default configuration runs for 10 seconds, then passing an argument of 2 would make it run for 20 seconds, and 0.5 would make it run for 5 seconds.
Automatically evaluate every invocation for correctness. It doesn't matter how, and if there are errors, it doesn't matter what. The domain of nnpack is totally opaque to me, so I will simply step through the validator and find what it did not approve in binary executable terms.
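To illustrate the shape of what I am asking for, here is a rough sketch of such a driver in C. Everything in it is an assumption for illustration only: the uniform `kernel_fn` signature, the `demo_double` entry, and the names `run_kernel` and `driver_main` are hypothetical, and real nnpack kernels take different arguments. The scale knob multiplies an iteration count as a stand-in for duration, only the call under test is timed, and every invocation is compared against a reference implementation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical uniform signature; real kernels would need per-kernel adapters. */
typedef void (*kernel_fn)(const float *in, float *out, size_t n);

struct kernel_entry {
    const char *name;       /* name given on the command line */
    kernel_fn optimized;    /* function under test (the one a tool rewrites) */
    kernel_fn reference;    /* trusted implementation used for validation */
    size_t n;               /* canned input size */
    unsigned base_iters;    /* iterations at scale 1.0 */
};

/* Placeholder kernels standing in for a real optimized/reference pair. */
static void demo_fast(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = in[i] + in[i];
}
static void demo_ref(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = 2.0f * in[i];
}

static const struct kernel_entry kernels[] = {
    { "demo_double", demo_fast, demo_ref, 64, 1000 },
};

/* Returns the number of failed invocations, or -1 on a bad name, scale, or allocation. */
long run_kernel(const char *name, double scale) {
    if (!(scale > 0.0)) return -1;
    for (size_t k = 0; k < sizeof kernels / sizeof kernels[0]; k++) {
        const struct kernel_entry *e = &kernels[k];
        if (strcmp(e->name, name) != 0) continue;

        float *in = malloc(e->n * sizeof *in);
        float *got = malloc(e->n * sizeof *got);
        float *want = malloc(e->n * sizeof *want);
        if (!in || !got || !want) { free(in); free(got); free(want); return -1; }

        unsigned long iters = (unsigned long)(e->base_iters * scale);
        if (iters == 0) iters = 1;
        long failures = 0;
        clock_t timed = 0;

        for (unsigned long it = 0; it < iters; it++) {
            /* Setup stays outside the timed region. */
            for (size_t i = 0; i < e->n; i++) in[i] = (float)((it + i) % 17) * 0.25f;

            clock_t t0 = clock();
            e->optimized(in, got, e->n);   /* the only timed call */
            timed += clock() - t0;

            /* Validate every invocation against the reference. */
            e->reference(in, want, e->n);
            for (size_t i = 0; i < e->n; i++) {
                float d = got[i] - want[i];
                if (d < 0) d = -d;
                if (d > 1e-5f) { failures++; break; }
            }
        }
        printf("%s: %lu invocations, %ld failed, %.6f s in kernel\n",
               name, iters, failures, (double)timed / CLOCKS_PER_SEC);
        free(in); free(got); free(want);
        return failures;
    }
    return -1;
}

/* Usage: <driver> <kernel-name> [scale]; exit status 0 only if all invocations validated. */
int driver_main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <kernel> [scale]\n", argv[0]); return 2; }
    double scale = argc > 2 ? strtod(argv[2], NULL) : 1.0;
    long failures = run_kernel(argv[1], scale);
    if (failures < 0) { fprintf(stderr, "unknown kernel or bad scale\n"); return 2; }
    return failures == 0 ? 0 : 1;
}
```

A real binary's `main` would just forward to `driver_main(argc, argv)`. Selecting the kernel by name removes the need to reverse-engineer which inputs reach which kernel, and a nonzero exit status gives my tool a single place to break on when a rewrite fails validation.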

I would be glad to implement this "test driver" and submit a pull request, but it's a bit over my head at this point. If someone can walk me through the steps in extremely concrete, low-level terms, I can probably figure out the details.
