jaeger_stats is a library project focused on handling and analyzing Jaeger traces. Jaeger traces provide very detailed information, which is useful for in-depth issue analysis. However, they can also be a valuable source of information on how processes run in a complex microservices landscape, and on how the landscape and the pressure on the individual services evolve over time.
jaeger_stats also contains a few tools (executables) built on top of the library. They showcase how the library can be used, and they can also be used as tools in their own right.
You can run the tool on a single Jaeger-trace via the command:
trace_analysis <data_folder>
Here data_folder can be an absolute or a relative path, however the expansion of '~' to a home-folder is not supported. The path-encoding needs to match the conventions of your system (Windows or Linux/Unix/Mac).
The tool reads all JSON files in the folder (assuming these are valid Jaeger-trace files), processes them, and computes statistics. Each JSON file can contain one or more traces. Output is generated in the following folders:
Traces are deduplicated before analysis based on the 'trace_id', so if the folder contains files with overlapping traces, the overlap is removed.
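The deduplication step can be pictured with a minimal Python sketch. This is not the library's own implementation; it assumes each trace is a dict that carries its identifier under the key `traceID` (as in Jaeger's JSON export), and the function name `dedup_traces` is hypothetical.

```python
from typing import Iterable


def dedup_traces(traces: Iterable[dict]) -> list[dict]:
    """Keep the first occurrence of every trace id, preserving input order."""
    seen: set[str] = set()
    unique = []
    for trace in traces:
        trace_id = trace["traceID"]  # assumed key name
        if trace_id in seen:
            continue  # this trace was already read from another file
        seen.add(trace_id)
        unique.append(trace)
    return unique


# Two input files that overlap in trace "b".
file_1 = [{"traceID": "a"}, {"traceID": "b"}]
file_2 = [{"traceID": "b"}, {"traceID": "c"}]
unique = dedup_traces(file_1 + file_2)
print([t["traceID"] for t in unique])
```

Keeping the first occurrence is a design choice of this sketch; since duplicates share the same trace id, which copy is kept should not matter for the statistics.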
When you run trace_analysis with the flag --help you see:
$ trace_analysis --help
Parsing and analyzing Jaeger traces
Usage: trace_analysis [OPTIONS] <INPUT>
Arguments:
<INPUT>
Options:
--caching-process <CACHING_PROCESS>
-c, --call-chain-folder <CALL_CHAIN_FOLDER>
The default source for call-chain information is a sub-folder'CallChain' located in the current folder [default: CallChain/]
-z, --timezone-minutes <TIMEZONE_MINUTES>
[default: 120]
-f, --comma-float
-t, --trace-output
-o, --output-ext <OUTPUT_EXT>
The output-extension determines the output-types are 'json' and 'bincode' (which is also used as the file-extension) [default: json]
-h, --help
Print help
-V, --version
Print version
The options are:
The statistics files, such as 'Stats/cummulative_trace_stats.csv', use ';' as the column separator. This file consists of four sections:
Jaeger tracing spans are sent over UDP, a protocol that does not give strong delivery guarantees. Occasionally a span is lost, which results in an incomplete trace and thus broken call-chains in the trace. This is where the '-c' option comes in, as seen in the previous example: trace_analysis <data_folder> -c <data_folder>/CallChain. Here the CallChain folder produced by the first run of the tool (which only shows complete chains) is used in subsequent runs to correct call-chains that are incomplete due to missing spans. However, the preferred option is to set up a separate folder to contain the call-chains and point '--call-chain-folder' (or '-c') to this folder.
The call-chain corrections are only applied:
Path parameters might wreak havoc on our analysis, as they make each URL unique while we are looking for averages over a number of invocations. Therefore the system corrects URLs by extracting the parameters (for example an order number) and replacing each with a symbolic value such as '{ORDER}'. However, these replacements are currently hardcoded, and some work is needed to make them configurable.
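A minimal Python sketch of this kind of URL correction is shown here. The rule used (a path segment consisting only of digits is an order number) and the name `normalize_path` are assumptions for illustration; the actual hardcoded replacements in the tool may differ.

```python
import re

# Hypothetical rule: a path segment made of digits only is an order number.
ORDER_SEGMENT = re.compile(r"(?<=/)\d+(?=/|$)")


def normalize_path(path: str) -> str:
    """Replace numeric path segments by the symbolic value '{ORDER}'."""
    return ORDER_SEGMENT.sub("{ORDER}", path)


print(normalize_path("/orders/12345/items"))
print(normalize_path("/orders/67890/items"))
```

Both calls map to the same path, so their timings can be averaged as invocations of one endpoint. Segments that merely contain digits, such as 'v2', are left untouched by this rule.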
If data is provided in large batches it is possible to compute the rate from the data. However, we do not want to assume that all files with traces fall in the same time period. Therefore we compute frequencies by computing the times between subsequent calls and dropping the num_files largest intervals, as these might correspond to gaps between files. Based on this time the rate is computed as a frequency by the formula f=1/T, where T is the duration in seconds between subsequent calls.
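The rate computation can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: it takes T to be the mean of the intervals that remain after dropping the num_files largest ones, and the function name `estimate_rate` is hypothetical.

```python
def estimate_rate(start_times: list[float], num_files: int) -> float:
    """Return calls per second, ignoring the num_files largest gaps."""
    times = sorted(start_times)
    # Durations between subsequent calls, smallest first.
    intervals = sorted(b - a for a, b in zip(times, times[1:]))
    # Drop the largest intervals, which may be gaps between files.
    kept = intervals[: max(0, len(intervals) - num_files)]
    if not kept:
        raise ValueError("not enough calls to estimate a rate")
    mean_interval = sum(kept) / len(kept)  # T in seconds (assumed: the mean)
    if mean_interval == 0:
        raise ValueError("all remaining calls share the same timestamp")
    return 1.0 / mean_interval  # f = 1/T


# Two files recorded roughly 100 seconds apart, each with a call every 0.5 s.
calls = [0.0, 0.5, 1.0, 100.0, 100.5, 101.0]
rate = estimate_rate(calls, num_files=2)
print(rate)
```

Without dropping the largest intervals, the 99-second gap between the two files would dominate the average and the rate would be badly underestimated.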
In the Jaeger web-based front end it is possible to make a selection of traces. After these traces have been returned you have two methods to extract the JSON files:
Method 2.2 allows you to select 1000 traces or more. However, the output is a single line of raw JSON (not pretty-printed) and the file is encoded in UTF-16-LE with a BOM. 'trace_analysis' can handle these files and does an in-memory conversion to UTF-8 before processing. Beware that this is a non-streaming conversion, so the full file is held in memory twice.
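The conversion can be illustrated with a short Python sketch. It is not the tool's implementation; it assumes that a file is UTF-16-LE when it starts with the UTF-16-LE byte-order mark and UTF-8 otherwise, and the name `decode_trace_bytes` is hypothetical.

```python
import codecs
import json


def decode_trace_bytes(raw: bytes) -> str:
    """Decode a Jaeger export that is either UTF-16-LE with BOM or UTF-8."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        # Strip the BOM, then decode the rest as little-endian UTF-16.
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    return raw.decode("utf-8")


# A tiny stand-in for an exported file: BOM followed by UTF-16-LE JSON.
sample = codecs.BOM_UTF16_LE + '{"data": []}'.encode("utf-16-le")
text = decode_trace_bytes(sample)
parsed = json.loads(text)
print(parsed)
```

Note that `sample` and `text` exist side by side here, which mirrors the remark that a non-streaming conversion keeps the file contents in memory twice.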
The stitch tool is used to take a series of trace_analysis outputs and stitch them together to a single time-series analysis. The inputs are defined in a file 'input.stitch'.
The collected (time-series) output is written to a file 'stitched.csv' (default), which can easily be read into Microsoft Excel. The output contains (fine-grained) metrics data as a time series for all:
Next to the detailed output, a file is generated that shows the anomalies (outliers) that have been detected.
When you run 'stitch' with the flag --help you see:
$ stitch -h
Stitching results of different runs of trace_analysis into a single CSV for visualization in Excel
Usage: stitch [OPTIONS]
Options:
-s, --stitch-list <STITCH_LIST> [default: input.stitch]
-o, --output <OUTPUT> [default: stitched.csv]
-a, --anomalies <ANOMALIES> [default: anomalies.csv]
-c, --comma-float
-d, --drop-count <DROP_COUNT> [default: 0]
--scaled-slope-bound <SCALED_SLOPE_BOUND> [default: 0.05]
--st-num-points <ST_NUM_POINTS> [default: 5]
--scaled-st-slope-bound <SCALED_ST_SLOPE_BOUND> [default: 0.05]
--l1-dev-bound <L1_DEV_BOUND> [default: 2]
-h, --help Print help
-V, --version Print version
The options are:
An example of an input-file ('input.stitch') is:
# comment line: this line is fully ignored
/home/ceesvk/jaeger/batch/Stats/cummulative_trace_stats.json # an absolute path
../../jaeger/get_order/Stats/cummulative_trace_stats.json # a relative path
% ../../jaeger/post_order/Stats/cummulative_trace_stats.json # This line is showing up as an empty column due to the % in front
# yet another comment (empty line above is ignored)
Beware that ALL files in the 'input.stitch' should exist and should be valid input files, otherwise the 'stitch' program will terminate with no output.
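The line rules visible in the example file can be summarized in a small Python sketch. It is an illustration only: the function name `parse_stitch_list` is hypothetical, `None` is used here to mark an empty column, and the sketch assumes that paths never contain a '#' character.

```python
from typing import Optional


def parse_stitch_list(text: str) -> list[Optional[str]]:
    """Return one entry per column: a path, or None for an empty column."""
    columns: list[Optional[str]] = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop a trailing comment
        if not entry:
            continue  # blank line or comment-only line: ignored
        if entry.startswith("%"):
            columns.append(None)  # '%' in front: shows up as an empty column
        else:
            columns.append(entry)
    return columns


example = """\
# comment line
/data/batch/Stats/cummulative_trace_stats.json # an absolute path
../get_order/Stats/cummulative_trace_stats.json

% ../post_order/Stats/cummulative_trace_stats.json # empty column
"""
columns = parse_stitch_list(example)
print(columns)
```

The paths in `example` are made up for the illustration. Because every listed file must exist, checking the parsed paths before starting a run is a cheap way to avoid a run that terminates without output.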
When extracting datasets via Curl or other tools, the Jaeger system returns up to 1000 traces in a single file. This file is in UTF-16-LE encoding instead of UTF-8 and is a JSON file in a compact (minimized) format. This makes it difficult to read these files or to extract data from them. For this purpose we provide the show_traces tool. It reads all Jaeger traces in a folder and writes each trace to its own file in the folder 'Jaeger'. If you are only interested in a few specific traces, you can provide their trace-ids as a comma-separated list.
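What the tool does can be pictured with the following Python sketch. It is not the show_traces implementation: the output file naming (`<traceID>.json`), the `traceID` key, and the function name `write_traces` are assumptions made for the illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_traces(traces: list[dict], out_dir: Path,
                 trace_ids: Optional[str] = None) -> list[Path]:
    """Write each selected trace as pretty-printed UTF-8 JSON, one file per trace."""
    # A comma-separated list of ids selects traces; no list means all traces.
    wanted = set(trace_ids.split(",")) if trace_ids else None
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for trace in traces:
        trace_id = trace["traceID"]  # assumed key name
        if wanted is not None and trace_id not in wanted:
            continue
        target = out_dir / f"{trace_id}.json"  # hypothetical file naming
        target.write_text(json.dumps(trace, indent=2), encoding="utf-8")
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    traces = [{"traceID": "a1", "spans": []}, {"traceID": "b2", "spans": []}]
    written = write_traces(traces, Path(tmp) / "Jaeger", trace_ids="b2")
    names = [p.name for p in written]
    content = json.loads(written[0].read_text(encoding="utf-8"))
    print(names)
```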
When you run show_traces with the flag --help you see:
Show the Jaeger-traces, or a selection of jaeger-traces, as Pretty-printed JSON in UTF-8 format
Usage: show_traces [OPTIONS] <INPUT>
Arguments:
<INPUT>
Options:
-t, --trace-ids <TRACE_IDS> The default sources is the current folder [default: ]
-z, --timezone-minutes <TIMEZONE_MINUTES> [default: 120]
-h, --help Print help
-V, --version Print version
The jaeger_stats tooling is deployed to pypi.org as a Python project via an automated GitHub CI/CD pipeline. Thus the tools can be installed easily on Windows, Mac and Linux via the following command:
pip install jaeger_stats
If you need pre-releases of the tool you need to use:
pip install --pre --force-reinstall jaeger_stats
The tool is included in the examples folder and can be built via the command:
cargo build --example trace_analysis
The 'trace_analysis' executable can be found in 'target/debug/examples/trace_analysis'.
In case you need to process a large volume of traces you might aim for the more performant 'release' build (which also drops some run-time checks). To build a release version use:
cargo build --release --example trace_analysis
The 'trace_analysis' executable can be found in 'target/release/examples/trace_analysis'.
You can also install the tool via
cargo install --path . --example trace_analysis
On Linux this will deploy a release version of 'trace_analysis' in the folder '$HOME/.cargo/bin/', which is assumed to be included in your PATH.
This project is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), same as the Rust language.