Mason Bench and benchmarking framework design

ankushbhardwxj commented 4 years ago

Summary (updated)

Chapel requires a benchmarking tool inheriting features from start_test, which would be used with a mason bench command. Following the design of mason test and the Unit Test module, Mason packages would have a bench/ directory which would hold all the benchmark programs. The benchmark programs can also be added to the manifest file (Mason.toml) as follows :

[bricks]
name="NewPackage"
version="0.1.0"
chplVersion="1.21.0"
benchmarks=["benchA.chpl", "benchB.chpl"]

Performance metadata and multilocale performance metadata will be expressed in a manifest (TOML) file, i.e.,

for within Mason packages, in a Mason.toml file
for outside of Mason packages, in a .toml file with the name same as that of the program. i.e Foo.chpl will have a manifest file named Foo.toml. This feature would be available after a PR on #15695

In case of having multiple performance tests in a single file, the performance test functions would be identified using the function signature :

private proc benchSignature(test : borrowed Bench) throws {}

The output would be logged in a .csv file which would be used by a latter performance run to compare and declare an improvement/regression.

A mason bench command when run inside a Mason package directory would look for performance tests in Mason.toml or the bench/ directory. If used outside of Mason package, the test should be given as argument, i.e, mason bench perfTest.chpl, otherwise, mason bench would run only those programs which have a .toml file with the same name. We plan to support multilocale performance testing using a -ml flag.

We have planned to keep mason bench similar to start_test so that an experienced Chapel user can easily transition to mason bench. However, we intend to support graph generation in a later version of mason bench.

Additionally, please check comments below for more details.

This issue is a part of GSoC proposal "Improvements to Mason Package Manager", and it aims to extend the UnitTest module by adding a benchmarking framework and to implement a mason bench command that would be used to run benchmarks.

Initial discussions that have taken place in Slack with @ben-albrecht @Spartee and @krishnadey30 can be summarised as follows :

Ideas for the benchmarking framework is being taken from Go's benchmarking framework. However, it should reflect features available in start_test --performance so that transition from start_test to mason bench is easy for an user and loss of features would be minimal.
Since Mason is heavily inspired by Cargo, the mason bench command would resemble to cargo bench command. Also, mason bench should be usable both inside and outside mason packages, similar to mason test.
Options for the mason bench would consists of essential options available in cargo bench such as --no-fail-fast, --offline, --verbose etc.
If mason bench is called inside a mason package, then it should look for files inside a bench directory, else if it's called outside a mason package, it should look for files containing benchmarking functions by crawling through the directories.
There can be 2 modes to run benchmarks :-
- running a compiled program N times and report the average time to execute that program
- running a function within a program N times and report the time of only that function.

ankushbhardwxj commented 4 years ago

@ben-albrecht need label: gsoc: mason

ankushbhardwxj commented 4 years ago

A good thing to discuss would be the output of mason bench. start_test --performance outputs time (average), memory and status. Do we require any other metric apart from these?

ben-albrecht commented 4 years ago

start_test --performance tests have the ability for the user to time certain sections of their program and compute the metric that matters to them, such as FLOPS. It would be nice to be able to support this mode of testing.

@ronawho has been working with performance testing of some user codes (https://github.com/mhmerrill/arkouda), and may have some design input / requirements here.

ankushbhardwxj commented 4 years ago

Implementation proposal:

Following is a proposed implementation of the benchmarking framework.

module BenchMarking {
   class Bench {
       // number of repititions - maximum 1e9
       private int N;
       // initialisation
       var netDuration;
       var netAllocs;
       var netBytes;
       // to reset timer and allocated memory heaps
       proc resetTimer() {}
       // to start a timer and initialise memory allocation
       proc startTimer() {}
       // stops timer and gathers net allocated memory and total time
       proc stopTimer() {
         // startAllocs, startBytes = initial state of totalAllocs and malloc
         // netAllocs, netBytes = final state after running function
         duration = timer.finish() - timer.start();
         netAllocs += currentMallocs - startAllocs;
         netByte += totalAllocs - startBytes;
       }
       // returns total time
       proc totalTime() {}
       // returns average time
       proc avgTime() {}
       // Detection of subbenchmarks ?
       // if subbenchmark function is NOT present
       proc run() {
         Bench.resetTimer();
         Bench.startTimer();
         //benchfunction here
         Bench.stopTimer();
         var totalTime = Bench.totalTime();
         var avgTime = totalTime / Bench.N;
       }
      // if subbenchmark is present
      // run Benchamrk function once
      proc runOnce() {}
      // run subbenchmark N times
      proc runNTimes() {}

      proc main() {}
   }

    // various metrics to be displayed in result
   class BenchMarkMetrics {
     // metric - ns/op
     proc NsPerOp() { return Bench.netDuration.nanoseconds() / Bench.N; }
     // metric - mb/s
     proc mbPerSec() { 
       var bytes = float64(Bench.netBytes);
       var N = float64(Bench.N);
       return (bytes * N / 1e6) / Bench.duration;
     }
     // metric - allocs/op
     proc allocsPerOp() {
       return Bench.netAllocs/Bench.N;
     }
     // metric - allocedBytes/op
     proc allocedBytesPerOp() {
       return Bench.netBytes/Bench.N;
     }
   }
}

I hope I have covered the basics for an initial version of the benchmarking framework. Please add new functions or modify existing by editing this comment. For more requirements, design discussions or improvements feel free to comment below.

ben-albrecht commented 4 years ago

@ankingcodes - Your outline will be helpful as we approach implementation. However, I think we should start at an even higher level: What will this interface look like to a user?

Let's start with a benchmark in Chapel's repo and imagine how it would be expressed with the mason bench interface:

stencil.chpl

There's a lot of metadata to support here:

Correctness metadata:

stencil.good: The expected output of the program
- This may not be necessary since we can use UnitTest assertions for validation
stencil.compopts: The different compiler options to use for correctness testing
- This is currently supported through Mason.toml, but not yet supported outside of mason packages
stencil.numlocales: Number of locales to use for correctness testing
- UnitTest has a way to express this

Performance metadata:

stencil.perfcompopts: The different compiler options to use when performance testing
stencil.perfkeys: A few strings to match against for parsing the performance metric, also a validation string for correctness, since stencil.good is not used for performance testing

Multilocale performance metadata:

stencil.ml-numlocales: Number of locales to use for multilocale performance testing
stencil.ml-compopts: Compiler flags to use when compiling for multilocale performance
stencil.ml-keys: performance keys for multilocale performance

Here is a reference for how the performance testing works for this test system.

For each piece of metadata, we need to decide how it will be expressed. The two main ways to express metadata in mason test are:

In the test itself, using functions like UnitTest.numLocales(16)
In the manifest file for a mason package (Mason.toml)

ankushbhardwxj commented 4 years ago

stencil.compopts: The different compiler options to use for correctness testing

This is currently supported through Mason.toml, but not yet supported outside of mason packages

@ben-albrecht I believe the reason of using Mason.toml is because of lack of support of -- syntax. Therefore, stencil.ml-compopts and stencil.perf-compopts should also be defined in Mason.toml, and might be supported outside of Mason packages, if progress is made by #15695 . For, .ml-keys, .ml-numlocales and .perfkeys, I think we should be able to express them in the tests using functions. mason bench can have a flag -ml which would specify for multilocale tests.

ankushbhardwxj commented 4 years ago

I believe the role of .perfkeys is identification of a performance test functions and correction testing similar to .good. Therefore, we may use the function type suggested by Krishna at Slack to identify performance functions. eg :

private proc benchSignature(test : borrowed Bench) throws {}

I'm not sure about correction testing though. Could we reuse functions of UnitTest for that ?

ankushbhardwxj commented 4 years ago

Another feature that comes to mind is comparing with a previous performance test to get a declaration of improvement/regression. Could we log outputs of a performance test to a .dat file or a .log file which could be used to compare with current output of a specific test when called by a specific option to mason bench ?

ben-albrecht commented 4 years ago

I believe the reason of using Mason.toml is because of lack of support of -- syntax.

I am not sure what you mean by this.

Therefore, stencil.ml-compopts and stencil.perf-compopts should also be defined in Mason.toml, and might be supported outside of Mason packages, if progress is made by #15695 .

Agreed.

For, .ml-keys, .ml-numlocales and .perfkeys, I think we should be able to express them in the tests using functions.

I agree, but am not yet sure how perfkeys should be captured. How should we get a value from within the test? Options include: recording it via a function call, returning it from the test, printing it and parsing it through stdout. I'm leaning toward recording via a function call.

mason bench can have a flag -ml which would specify for multilocale tests.

Something like that could work.

I'm not sure about correction testing though. Could we reuse functions of UnitTest for that ?

I think so. We could just re-use assertions from UnitTest.Test. Maybe Bench can inherit from Test?

Could we log outputs of a performance test to a .dat file or a .log file which could be used to compare with current output of a specific test when called by a specific option to mason bench ?

👍 I think that'd be a great feature. csv would make a good initial format.

ankushbhardwxj commented 4 years ago

I am not sure what you mean by this.

We cannot specify execopts or compopts using mason test -- --set order=100

Maybe Bench can inherit from Test?

Sounds good !

csv would make a good initial format.

Is there any specific reason for suggesting csv ? start_test uses .dat in the following format :

# Date     Time:   Memory:
03/26/18   194.3   24
04/02/18   194.3   24

ankushbhardwxj commented 4 years ago

I'll try to clear out #15695 before working on Bench. It will help me get more familiar with the working of test.

ben-albrecht commented 4 years ago

Is there any specific reason for suggesting csv ?

CSV is ubiquitous for storing 2-dimensional data.

Could you please elaborate on this ?

Maybe the metric to be logged can be saved via a function (method) call like:

proc benchTest(test : borrowed Bench) throws {

  var rate = 0.0;
  var avgTime = 0.0;

  // Run the benchmark, assign rate and avgTime

  test.record(rate, 'Rate (MFlops/s)');
  test.record(avgTime, 'Avg time (s)');

}

mppf commented 4 years ago

I'm looking at this issue for the first time and for people in my situation it would really help to have a summary of the current direction for user-facing API in the issue description up top. I can see for example proc benchTest(test : borrowed Bench) throws in the above comment and test.record() being used but I can't quickly find information about what other functions exist and I don't want to have to read all of the discussion to date to offer an opinion on the current API design. (also note that I doubt the compiler will even allow you to create a method named record since that is a keyword).

ben-albrecht commented 4 years ago

@ankingcodes - Let's try to "port" what some existing performance tests would look like under mason bench framework - notably how the test metadata will be stored and how the test will be run.

I was originally going to point you to tests from Arkouda as an example, but it turns out they are quite complex due to launching a Chapel process and Python process which connects to it. That may be out of the scope of mason bench (or at least initially), so let's focus on some existing performance tests in the Chapel repository. Here are a few examples I plucked out sorted by ascending metadata complexity:

arrayAdd - A simple performance test
- metadata includes: .good, .perfcompopts, .perfkeys
- This is a minimal test beyond the compopts
parOpEquals - Another simple perf tests
- metadata includes: .good, .perfcompopts, .perfkeys, .execopts
- This test adds execopts to the mix
transpose-perf
- metadata includes: .good, .perfcompopts, .perfkeys, .perfexecopts (ignore the .notest)
- This test depends on the third-party software BLAS, something mason should be made aware of
stencil
- metadata includes: .good, .perfcompopts, .perfkeys, .numlocales , .ml-numlocales, .ml-keys, .ml-compopts
- This test can be run as a single or multi-locale performance test

I don't think we need full examples for all of these, but we should provide at least one full example and have snippets of code showing how we will represent the metadata for each one. This should give a clearer picture of what we are proposing and help raise any design questions that have not yet been considered.

ankushbhardwxj commented 4 years ago

@ben-albrecht @Spartee @krishnadey30 @ronawho @mppf - Some more work on the design has been done. We have a repository consisting of the above mentioned start_test --performance example performance tests converted to the new design of mason bench.

Link to repository : https://github.com/ankingcodes/mason-bench-design More details are available at README. Also, take a look on an issue by @ben-albrecht on that repository, for questions we are still discussing on. Please let us know about your suggestions and ideas on the new design.

chapel-lang / chapel