This tool allows users to...
...for the individual users' data and access patterns.
Additionally, allows ion-java developers to...
...by generating results from a variety of real-world data and access patterns.
The tool uses the JMH microbenchmarking library and produces results in formats generated by JMH.
The following stats will be included in benchmark results:
Building the tool generates a self-contained executable jar
.
To build the tool with the latest version of the ion-java library, simply run
mvn clean install
from the directory containing pom.xml
. When the build completes, a jar
named
ion-java-benchmark-cli-<version>-jar-with-dependencies.jar
will be present in the target/
subdirectory.
This jar
may be copied to any location and may be executed using java -jar <path-to-jar>
.
Users may find it convenient to create an alias to java -jar <path-to-jar>
named, e.g.,
ion-java-benchmark
to simplify invocation of the tool.
Note: the Maven build will look for the latest ion-java version available in the local Maven
repository. Users may use the tool with a custom or in-development ion-java version by installing
the desired version into the local Maven repository. To determine which ion-java version the tool
is using, use the --version
command. After switching ion-java versions, the executable jar
must be rebuilt.
(Note: these examples assume the invocation of the tool has been aliased to ion-java-benchmark
.)
Benchmark a full-traversal read of example.10n
from file using the IonReader API, with 10 warmups,
10 iterations, and 1 fork, printing the results to stdout in JMH’s standard text format.
ion-java-benchmark read example.10n
Benchmark a fully-buffered write of binary Ion data equivalent to example.10n
to file using the
IonWriter API, with 10 warmups, 10 iterations, and 1 fork, printing the results to stdout in JMH’s
standard text format.
ion-java-benchmark write example.10n
Benchmark a write of binary Ion data equivalent to the first 1,000 top-level values in example.10n
to in-memory bytes using the IonWriter API, flushing after every 100 top-level values. Produce
results for both 0-byte length preallocation and 2-byte length preallocation to facilitate
comparison of both settings.
ion-java-benchmark write --io-type buffer \
--limit 1000 \
--ion-flush-period 100 \
--ion-length-preallocation 0 \
--ion-length-preallocation 2 \
example.10n
Profile a sparse read of example.10n
from file, materializing only the values that match the paths
specified in paths.ion, using ion-java-path-extraction.
This process will repetitively execute until manually terminated, allowing the user to attach a tool
for gathering performance profiles.
ion-java-benchmark read --profile --paths paths.ion example.10n
Benchmark a fully-buffered write of binary Ion data equivalent to example.10n both with and without using shared symbol tables. The file tables.ion contains a sequence of Ion symbol tables.
ion-java-benchmark write --ion-imports-for-benchmark tables.ion \
--ion-imports-for-benchmark none \
example.10n
Benchmark a full-traversal read of data equivalent to exampleWithImports.10n, which declares the shared symbol table imports provided by inputTables.ion, re-encoded (if necessary) using the shared symbol tables provided by benchmarkTables.ion, inputTables.ion, and no shared symbol tables. Produce results from using both the DOM and IonReader APIs.
ion-java-benchmark read --ion-imports-for-input inputTables.ion \
--ion-imports-for-benchmark benchmarkTables.ion \
--ion-imports-for-benchmark auto \
--ion-imports-for-benchmark none \
--ion-api dom \
--ion-api streaming \
exampleWithImports.10n
As the JMH output warns: "Do not assume the numbers tell you what you want them to tell." Benchmarking on the JVM is hard. There is non-deterministic behavior that can lead to high variance between iterations. Be suspicious of benchmark results with a reported Error that is a high percentage of the Score. Aim for an Error percentage of less than 10%.
To reduce Error, try increasing the number of warmup iterations, timed iterations, and forks. To ensure the JVM is properly warmed up, benchmarks should include enough warmup iterations to allow for the scores to stabilize. This often takes at least 20 seconds. Benchmarks should be run on idle systems. Background processes competing for resources can lead to higher variance, especially for benchmarks with a short execution time per invocation.
The default benchmark mode is SingleShotTime, meaning that the reported score is the result of a single invocation of the benchmark method. This works well for medium and large input data that takes on the order of seconds per invocation, but leads to higher variance for input data that takes only milliseconds or microseconds per invocation. Such data should be used with one of the other modes, each of which generates an iteration score by averaging the score of multiple invocations of the benchmark method. For very small data, it may also be necessary to change the reported time unit to provide enough granularity to observe differences between trials.
Both the read
and write
benchmark commands involve a setup phase that occurs before the benchmark
begins. However, due to a quirk in the JMH implementation, this phase occurs after JMH prints
Warmup Iteration 1:
to the output. This can make it seem like the first warmup iteration takes an
excessive amount of time or is deadlocked, but give it a chance to complete. Once the scores stabilize,
if the iterations take longer than you're willing to wait, consider using the --limit
option to
limit the amount of data processed by the benchmark. For write benchmarks, using --limit
may be
necessary depending on the size of the input data and the memory constraints of the system, as
the setup phase involves generating write instruction lambdas and storing them in memory.
Adding an option involves the following steps:
Main
, add the option to the USAGE
and OPTIONS
strings, mimicking the existing format.read
and write
commands, add parameterization logic to the
OptionsMatrixBase
constructor using the existing helper methods. If the option applies only to
the read
or the write
command, add this logic to the ReadOptionsMatrix
or WriteOptionsMatrix
constructor, respectively. These classes are responsible for generating the complete set of
combinations for the chosen set of option values, which corresponds to the complete set of benchmark
trials to be run.read
and write
commands, add parsing logic to the
OptionsCombinationBase
constructor using the existing helper methods. If the option applies only
to the read
or the write
command, add this logic to the ReadOptionsCombination
or
WriteOptionsCombination
constructor, respectively. These classes are responsible for representing
a single combination of option values, which corresponds to a single benchmark trial.OptionsCombinationBase
, ReadOptionsCombination
, or WriteOptionsCombination
and invoke this
method in any MeasurableTask
implementations to which the option applies. As an example, see
OptionsCombinationBase.newInputStream
. If the option applies to all formats but only the write
or read
command, then using the option within MeasurableReadTask
or MeasurableWriteTask
may
make sense. As an example, see MeasurableWriteTask.getTask
. If the option applies to multiple
formats, but not all formats, it may make sense to add a utility function or use inheritance. As
an example, see IonUtilities
, which is used by IonMeasurableReadTask
and IonMeasurableWriteTask
.
If the option is limited to use with a particular format and command, then its logic may belong
in the concrete MeasurableTask
implementation for that format/command combination. As an example,
the --ion-reader
option only applies to the read
command when used with either Ion text or binary,
so the logic that uses the option is contained within IonMeasurableReadTask
.OptionsTest
, add tests that exercise all values for this option (if enumerated) or a variety
of values, for all commands to which it applies.Commit d37b6fd demonstrates the steps required to add a new option.
Adding support for an additional serialization format involves the following steps.
pom.xml
.Format
enum to represent the new format.Main
's OPTIONS
string, edit the entry for the --format
command to allow for the new format.IonMeasurableReadTask
and IonMeasurableWriteTask
as examples, create concrete
implementations of MeasurableReadTask
and MeasurableWriteTask
for the new format.Format
enum value. This involves adding
logic to convert between formats and to instantiate the MeasurableTask
classes created in the
previous step. The convert
implementations for the existing Format
values will need to be updated
as well to support conversions from the new format. Add logic to Format.classify
to determine whether
a file contains data in the new format.Adding an option
section above to add any format- or library-specific
options to the CLI.OptionsTest
, add tests that thoroughly exercise the new format. Add data in the new format
into the test directory for use in tests.mvn clean install
. Using various samples of data in the new format, run
the tool by hand to make sure everything looks correct and the benchmark results look reasonable.EXAMPLES
string in Main
. Copy this
example into the Examples
section of this README.See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.