ankushbhardwxj opened this issue 4 years ago
@ben-albrecht need label: gsoc: mason
A good thing to discuss would be the output of `mason bench`. `start_test --performance` outputs time (average), memory, and status. Do we require any other metric apart from these?
`start_test --performance` tests have the ability for the user to time certain sections of their program and compute the metric that matters to them, such as FLOPS. It would be nice to be able to support this mode of testing.
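To make that concrete, here is a rough sketch of the kind of user-timed, user-computed metric such a test might report today; the variable names and the `MFLOPS:` output key are made up for illustration, and the printed string is the sort of thing a `perfkeys` entry would match against:

```chapel
use Time;

config const n = 1000000;

var t: Timer;                    // Timer from the Time module
var a, b, c: [1..n] real;

t.start();
c = a * b + c;                   // the section being measured: 2*n floating-point ops
t.stop();

// a perfkeys file could match against the "MFLOPS:" prefix
writeln("MFLOPS: ", 2.0 * n / t.elapsed() / 1e6);
```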
@ronawho has been working with performance testing of some user codes (https://github.com/mhmerrill/arkouda), and may have some design input / requirements here.
Following is a proposed implementation of the benchmarking framework.
```chapel
module BenchMarking {
  use Time;

  class Bench {
    // number of repetitions - maximum 1e9
    var N: int;
    // accumulated measurements
    var netDuration: real;
    var netAllocs: int;
    var netBytes: int;
    // timer and memory state captured at startTimer()
    var timer: Timer;
    var startAllocs, startBytes: int;

    // reset the timer and the allocated-memory counters
    proc resetTimer() { }
    // start the timer and record the initial memory-allocation state
    proc startTimer() { }
    // stop the timer and accumulate net allocated memory and elapsed time
    proc stopTimer() {
      // startAllocs, startBytes = allocation state recorded in startTimer();
      // currentMallocs, currentBytes stand in for whatever memory-diagnostic
      // queries end up being used
      timer.stop();
      netDuration += timer.elapsed();
      netAllocs += currentMallocs - startAllocs;
      netBytes += currentBytes - startBytes;
    }
    // returns total time
    proc totalTime(): real { return netDuration; }
    // returns average time
    proc avgTime(): real { return netDuration / N; }

    // Detection of sub-benchmarks?
    // if a sub-benchmark function is NOT present
    proc run() {
      resetTimer();
      startTimer();
      // benchmark function here
      stopTimer();
      var total = totalTime();
      var avg = total / N;
    }
    // if a sub-benchmark is present:
    // run the benchmark function once
    proc runOnce() { }
    // run the sub-benchmark N times
    proc runNTimes() { }
    proc main() { }
  }

  // various metrics to be displayed in the results
  class BenchMarkMetrics {
    var bench: borrowed Bench;

    // metric - ns/op
    proc nsPerOp() { return (bench.netDuration * 1e9) / bench.N; }
    // metric - MB/s
    proc mbPerSec() {
      return (bench.netBytes: real / 1e6) / bench.netDuration;
    }
    // metric - allocs/op
    proc allocsPerOp() { return bench.netAllocs / bench.N; }
    // metric - allocated bytes/op
    proc allocedBytesPerOp() { return bench.netBytes / bench.N; }
  }
}
```
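For illustration, a user-written benchmark driving this class might look roughly like the following; the `benchFoo` name and the loop structure are assumptions about usage, not part of the proposal above:

```chapel
// Hypothetical usage of the proposed Bench class
proc benchFoo(b: borrowed Bench) {
  b.resetTimer();
  b.startTimer();
  for 1..b.N {
    // code under measurement goes here
  }
  b.stopTimer();
  writeln("avg time per op (s): ", b.avgTime());
}
```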
I hope I have covered the basics for an initial version of the benchmarking framework. Please add new functions or modify existing ones by editing this comment. For further requirements, design discussions, or improvements, feel free to comment below.
@ankingcodes - Your outline will be helpful as we approach implementation. However, I think we should start at an even higher level: What will this interface look like to a user?
Let's start with a benchmark in Chapel's repo and imagine how it would be expressed with the `mason bench` interface:
There's a lot of metadata to support here:

Correctness metadata:

- `stencil.good`: The expected output of the program
- `stencil.compopts`: The different compiler options to use for correctness testing. This is currently supported through `Mason.toml`, but not yet supported outside of mason packages
- `stencil.numlocales`: Number of locales to use for correctness testing

Performance metadata:

- `stencil.perfcompopts`: The different compiler options to use when performance testing
- `stencil.perfkeys`: A few strings to match against for parsing the performance metric, also a validation string for correctness, since `stencil.good` is not used for performance testing

Multilocale performance metadata:

- `stencil.ml-numlocales`: Number of locales to use for multilocale performance testing
- `stencil.ml-compopts`: Compiler flags to use when compiling for multilocale performance
- `stencil.ml-keys`: Performance keys for multilocale performance

Here is a reference for how the performance testing works for this test system.
For each piece of metadata, we need to decide how it will be expressed. The two main ways to express metadata in `mason test` are:

- in the test source itself, via a function call such as `UnitTest.numLocales(16)`
- in the manifest file (`Mason.toml`)
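As a point of reference, the in-code route already looks like this in UnitTest today; the `stencilTest` name is made up, while `numLocales` is the existing UnitTest method mentioned above:

```chapel
use UnitTest;

// Express the locale-count metadata inside the test itself
proc stencilTest(test: borrowed Test) throws {
  test.numLocales(16);   // run this test with 16 locales
  // ... test body ...
}

UnitTest.main();
```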
> `stencil.compopts`: The different compiler options to use for correctness testing. This is currently supported through `Mason.toml`, but not yet supported outside of mason packages
@ben-albrecht I believe the reason for using `Mason.toml` is the lack of support for the `--` syntax. Therefore, `stencil.ml-compopts` and `stencil.perf-compopts` should also be defined in `Mason.toml`, and might be supported outside of Mason packages if progress is made on #15695.
For `.ml-keys`, `.ml-numlocales`, and `.perfkeys`, I think we should be able to express them in the tests using functions. `mason bench` can have a flag `-ml` to specify multilocale tests.

I believe the role of `.perfkeys` is to identify performance test functions and to do correctness checking similar to `.good`. Therefore, we may use the function signature suggested by Krishna on Slack to identify performance functions, e.g.:
```chapel
private proc benchSignature(test : borrowed Bench) throws {}
```
I'm not sure about correctness testing though. Could we reuse functions of UnitTest for that?
Another feature that comes to mind is comparing with a previous performance run to get a declaration of improvement/regression. Could we log the outputs of a performance test to a `.dat` or `.log` file, which could then be compared with the current output of a specific test when `mason bench` is called with a specific option?
> I believe the reason for using `Mason.toml` is the lack of support for the `--` syntax.
I am not sure what you mean by this.
> Therefore, `stencil.ml-compopts` and `stencil.perf-compopts` should also be defined in `Mason.toml`, and might be supported outside of Mason packages if progress is made on #15695.
Agreed.
> For `.ml-keys`, `.ml-numlocales`, and `.perfkeys`, I think we should be able to express them in the tests using functions.
I agree, but am not yet sure how `perfkeys` should be captured. How should we get a value from within the test? Options include: recording it via a function call, returning it from the test, or printing it and parsing it through stdout. I'm leaning toward recording via a function call.
> `mason bench` can have a flag `-ml` to specify multilocale tests.
Something like that could work.
> I'm not sure about correctness testing though. Could we reuse functions of UnitTest for that?

I think so. We could just re-use assertions from `UnitTest.Test`. Maybe `Bench` can inherit from `Test`?
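A minimal sketch of that idea, purely for illustration (whether `UnitTest.Test` can actually serve as a parent class here is an assumption, not something settled in this issue):

```chapel
use UnitTest;

// Hypothetical: Bench reuses UnitTest's assertions by inheriting from Test
class Bench : Test {
  var netDuration: real;   // timing/metric machinery would live here

  proc checkResult(expected: real, actual: real) throws {
    assertEqual(expected, actual);   // assertion inherited from Test
  }
}
```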
> Could we log the outputs of a performance test to a `.dat` or `.log` file, which could then be compared with the current output of a specific test when `mason bench` is called with a specific option?
👍 I think that'd be a great feature. CSV would make a good initial format.
> I am not sure what you mean by this.
We cannot specify `execopts` or `compopts` using `mason test -- --set order=100`.
> Maybe `Bench` can inherit from `Test`?

Sounds good!
> CSV would make a good initial format.

Is there any specific reason for suggesting CSV? `start_test` uses `.dat` in the following format:
```
# Date      Time:   Memory:
03/26/18    194.3   24
04/02/18    194.3   24
```
I'll try to clear out #15695 before working on Bench. It will help me get more familiar with the workings of the test system.
> Is there any specific reason for suggesting CSV?

CSV is ubiquitous for storing 2-dimensional data.
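For example, the `.dat` sample above could be expressed as CSV like this (the column names are taken from that sample; the actual schema is still open):

```
date,time,memory
03/26/18,194.3,24
04/02/18,194.3,24
```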
> Could you please elaborate on this?
Maybe the metric to be logged can be saved via a function (method) call like:
```chapel
proc benchTest(test : borrowed Bench) throws {
  var rate = 0.0;
  var avgTime = 0.0;
  // Run the benchmark, assign rate and avgTime
  test.record(rate, 'Rate (MFlops/s)');
  test.record(avgTime, 'Avg time (s)');
}
```
I'm looking at this issue for the first time, and for people in my situation it would really help to have a summary of the current direction for the user-facing API in the issue description up top. I can see, for example, `proc benchTest(test : borrowed Bench) throws` in the above comment and `test.record()` being used, but I can't quickly find information about what other functions exist, and I don't want to have to read all of the discussion to date to offer an opinion on the current API design. (Also note that I doubt the compiler will even allow you to create a method named `record`, since that is a keyword.)
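Since `record` is indeed a reserved word in Chapel, any such method would presumably need another name. A hypothetical variant of the earlier sketch, where `addMetric` is an invented placeholder rather than a decided API:

```chapel
proc benchTest(test : borrowed Bench) throws {
  var rate = 0.0;
  var avgTime = 0.0;
  // Run the benchmark, assign rate and avgTime
  test.addMetric(rate, 'Rate (MFlops/s)');   // hypothetical replacement for record()
  test.addMetric(avgTime, 'Avg time (s)');
}
```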
@ankingcodes - Let's try to "port" what some existing performance tests would look like under the `mason bench` framework - notably, how the test metadata would be stored and how the tests would be run.

I was originally going to point you to tests from Arkouda as an example, but it turns out they are quite complex due to launching a Chapel process and a Python process which connects to it. That may be out of the scope of `mason bench` (or at least initially), so let's focus on some existing performance tests in the Chapel repository. Here are a few examples I plucked out, sorted by ascending metadata complexity:
- `.good`, `.perfcompopts`, `.perfkeys`
- `.good`, `.perfcompopts`, `.perfkeys`, `.execopts`
- `.good`, `.perfcompopts`, `.perfkeys`, `.perfexecopts` (ignore the `.notest`)
- `.good`, `.perfcompopts`, `.perfkeys`, `.numlocales`, `.ml-numlocales`, `.ml-keys`, `.ml-compopts`
I don't think we need full examples for all of these, but we should provide at least one full example and have snippets of code showing how we will represent the metadata for each one. This should give a clearer picture of what we are proposing and help raise any design questions that have not yet been considered.
@ben-albrecht @Spartee @krishnadey30 @ronawho @mppf - Some more work on the design has been done. We have a repository containing the above-mentioned `start_test --performance` example performance tests converted to the new design of `mason bench`.

Link to repository: https://github.com/ankingcodes/mason-bench-design. More details are available in the README. Also, take a look at an issue opened by @ben-albrecht on that repository for questions we are still discussing. Please let us know your suggestions and ideas on the new design.
**Summary (updated)**

Chapel requires a benchmarking tool inheriting features from `start_test`, which would be used with a `mason bench` command.

- Following the design of `mason test` and the UnitTest module, Mason packages would have a `bench/` directory which would hold all the benchmark programs. The benchmark programs can also be added to the manifest file (`Mason.toml`); a hypothetical sketch of such an entry appears after this summary.
- Performance metadata and multilocale performance metadata will be expressed in a manifest (TOML) file, i.e., the `Mason.toml` file, or in a `.toml` file with the same name as the program, i.e., `Foo.chpl` will have a manifest file named `Foo.toml`. This feature would be available after a PR on #15695.
- In case of multiple performance tests in a single file, the performance test functions would be identified using the function signature `private proc benchSignature(test : borrowed Bench) throws {}`.
- The output would be logged in a `.csv` file, which would be used by a later performance run to compare against and declare an improvement/regression.
- A `mason bench` command, when run inside a Mason package directory, would look for performance tests in `Mason.toml` or the `bench/` directory. If used outside of a Mason package, the test should be given as an argument, i.e., `mason bench perfTest.chpl`; otherwise, `mason bench` would run only those programs which have a `.toml` file with the same name. We plan to support multilocale performance testing using a `-ml` flag.
- We plan to keep `mason bench` similar to `start_test` so that an experienced Chapel user can easily transition to `mason bench`. However, we intend to support graph generation in a later version of `mason bench`.

Additionally, please check the comments below for more details.
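For illustration, a hypothetical manifest entry for the scheme described above might look something like this; the `benchmarks` field and the file names are assumptions for discussion, not an agreed-upon schema:

```toml
# Mason.toml (hypothetical sketch)
[brick]
name = "MyPackage"
version = "0.1.0"
chplVersion = "1.23.0"

# benchmark programs that `mason bench` would pick up (field name is hypothetical)
benchmarks = ["bench/stencilPerf.chpl"]
```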
This issue is part of the GSoC proposal "Improvements to Mason Package Manager", and it aims to extend the UnitTest module by adding a benchmarking framework and to implement a `mason bench` command that would be used to run benchmarks.

Initial discussions that have taken place in Slack with @ben-albrecht, @Spartee, and @krishnadey30 can be summarised as follows:
- `mason bench` should retain the features of `start_test --performance` so that the transition from `start_test` to `mason bench` is easy for a user and the loss of features is minimal.
- The `mason bench` command would resemble the `cargo bench` command. Also, `mason bench` should be usable both inside and outside Mason packages, similar to `mason test`.
- `mason bench` would consist of the essential options available in `cargo bench`, such as `--no-fail-fast`, `--offline`, `--verbose`, etc.
- If `mason bench` is called inside a Mason package, it should look for files inside a `bench` directory; if it's called outside a Mason package, it should look for files containing benchmarking functions by crawling through the directories.