Open-CAS / standalone-linux-io-tracer

Standalone Linux IO Tracer (iotrace) is a tool for block device and filesystem I/O tracing
BSD 3-Clause "New" or "Revised" License

Standalone Linux IO Tracer

NEWS

Standalone Linux IO Tracer has switched to eBPF for capturing traces. Previously, the tracer used a custom loadable kernel module for this. The eBPF tracing method is more secure for the user. The version that uses the kernel module will be obsoleted. If you want to run the old version, switch to the master-kernel branch.

Description

Standalone Linux IO Tracer (iotrace) is a tool for block device I/O tracing and management of created traces.

For each I/O to the target device(s), basic metadata is captured (I/O operation type, address, size) and supplemented with extended classification. The extended classification contains information about the I/O type (direct / filesystem metadata / file) and target file attributes (e.g. file size).

iotrace is based on the Open CAS Telemetry Framework (OCTF). Collected traces are stored in the OCTF trace location. Traces can later be converted to JSON or CSV format.

The iotrace executable (the iotrace command line application) includes an eBPF program which is loaded into the Linux kernel during tracing. The eBPF program captures trace information and shares it with the userspace iotrace application, which serializes it to the OCTF IO trace.

Supported OS

Right now the compilation of Standalone Linux IO Tracer is tested on the following OSes:

OS      Version  Kernel Version
Fedora  36       5.18.10
Ubuntu  22.04    5.15.0

Source Code

Source code is available in the official Standalone Linux IO Tracer GitHub repository:

git clone https://github.com/open-cas/standalone-linux-io-tracer
cd standalone-linux-io-tracer

Deployment

Prerequisites

Build

To build the iotrace executable invoke:

make

You can also build an rpm/deb installation package:

make package

For example, on Fedora the generated installation package is located at build/release/iotrace-XX.YY.ZZ-1.x86_64.rpm.

Installation

To install iotrace, run:

sudo make install

You can also install iotrace from the rpm/deb package, for example:

rpm -Uvh iotrace-XX.YY.ZZ-1.x86_64.rpm

Theory of operation

Standalone Linux IO Tracer captures request data by attaching to multiple tracepoints exposed by the Linux kernel (e.g. BIO queueing, BIO splitting, BIO completion). This allows IO metadata to be gathered at the request level and passed between kernel and userspace.

A perf buffer is allocated and shared between the eBPF program and the userspace application. The example below shows the layout of a recorded trace event.

struct iotrace_event_hdr {
    /** Event sequence ID */
    log_sid_t sid;

    /** Time stamp */
    uint64_t timestamp;

    /** Trace event type, iotrace_event_type enumerator */
    uint32_t type;

    /** Size of this event, including header */
    uint32_t size;
} __attribute__((packed, aligned(8)));

...

struct iotrace_event {
    /** Trace event header */
    struct iotrace_event_hdr hdr;
    /**
     * @brief IO ID
     *
     * This ID can be used by the tracing environment to assign an ID to the IO.
     *
     * @note Zero means not set.
     */
    uint64_t id;

    /** Address of IO in sectors */
    uint64_t lba;

    /** Size of IO in sectors */
    uint32_t len;

    /** IO class of IO */
    uint32_t io_class;

    /** Device ID */
    uint32_t dev_id;

    /** Operation flags: flush, fua, ... .
     * Values according to iotrace_event_flag_t enum
     * are summed (OR-ed) together. */
    uint32_t flags;

    /** Operation type: read, write, discard
     * (iotrace_event_operation_t enumerator) **/
    uint8_t operation;

    /** Write hint associated with IO */
    uint8_t write_hint;
} __attribute__((packed, aligned(8)));

The events declaration file can be found here.
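
To make the layout above concrete, the following Python sketch decodes one such event from raw bytes. It rests on several assumptions that the excerpt does not confirm: a little-endian host, log_sid_t being a 64-bit unsigned integer, and the compiler rounding the 58 bytes of packed fields up to 64 because of aligned(8). The names EVENT_FMT, FIELDS and unpack_event are illustrative and not part of iotrace.

```python
import struct

# Assumptions (not confirmed by the excerpt): little-endian byte order and
# log_sid_t == uint64_t. "<" disables implicit padding, matching "packed".
HDR_FMT = "<QQII"                       # sid, timestamp, type, size
# id, lba, len, io_class, dev_id, flags, operation, write_hint,
# then 6 pad bytes so the total is a multiple of 8 (aligned(8)).
EVENT_FMT = HDR_FMT + "QQIIIIBB6x"

HDR_SIZE = struct.calcsize(HDR_FMT)
EVENT_SIZE = struct.calcsize(EVENT_FMT)

FIELDS = (
    "sid", "timestamp", "type", "size",
    "id", "lba", "len", "io_class", "dev_id", "flags",
    "operation", "write_hint",
)


def unpack_event(buf: bytes) -> dict:
    """Decode one iotrace_event-shaped record into a field dictionary."""
    if len(buf) < EVENT_SIZE:
        raise ValueError("buffer shorter than one event")
    return dict(zip(FIELDS, struct.unpack_from(EVENT_FMT, buf)))
```

Under these assumptions the header occupies 24 bytes and the whole event 64 bytes, so the size field in the header can be used to step from one record to the next.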

The userspace part of the Standalone Linux IO Tracer reads the entries from the perf buffer and translates them into Google Protocol Buffer format (see example below) for easier portability. The data is then serialized to trace files on a per-CPU basis (e.g. octf.trace.0).

message EventHeader {
    /** Event sequence ID */
    uint64 sid = 1;

    /** Time stamp */
    uint64 timestamp = 2;
}

...

enum IoType {
    UnknownIoType = 0;
    Read = 1;
    Write = 2;
    Discard = 3;
}

...

message EventIo {
    /** Address of IO in sectors */
    uint64 lba = 1;

    /** Size of IO in sectors */
    uint32 len = 2;

    /** IO class of IO */
    uint32 ioClass = 3;

    /** Device ID */
    uint64 deviceId = 4;

    /** Operation type: read, write, trim */
    IoType operation = 5;

    /** Flush flag */
    bool flush = 6;

    /** FUA flag */
    bool fua = 7;

    /** Write (lifetime) hint */
    uint32 writeHint = 8;

    /**
     * This ID can be used by the tracing environment to assign an ID to the IO.
     * Zero means not set.
     */
    uint64 id = 9;
}

...

message Event {
    /** Trace event header */
    EventHeader header = 1;

    oneof EventType {
        EventIo io = 2;
        EventDeviceDescription deviceDescription = 3;
        EventIoFilesystemMeta filesystemMeta = 4;
        EventIoCompletion ioCompletion = 5;
        EventIoFilesystemFileName filesystemFileName = 6;
        EventIoFilesystemFileEvent filesystemFileEvent = 7;
    }
}

The protobuf events declaration file can be found here.
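
Because the payload is a oneof, each Event carries at most one of the six payload fields. When events are rendered as JSON, a consumer can therefore classify an event by checking which payload key is present. The sketch below shows one way to do that in Python; event_kind and PAYLOAD_KEYS are hypothetical helper names, and the short JSON string in the usage example is made up for illustration.

```python
import json

# Payload field names copied from the `oneof EventType` declaration.
PAYLOAD_KEYS = (
    "io",
    "deviceDescription",
    "filesystemMeta",
    "ioCompletion",
    "filesystemFileName",
    "filesystemFileEvent",
)


def event_kind(event: dict):
    """Return the name of the oneof payload set in a decoded event, or None."""
    present = [key for key in PAYLOAD_KEYS if key in event]
    if len(present) > 1:
        # A oneof can hold only one member, so this indicates malformed input.
        raise ValueError(f"multiple payloads in one event: {present}")
    return present[0] if present else None


# Usage with an illustrative (made-up) event:
example = json.loads('{"header": {"sid": "1", "timestamp": "2"}, "io": {"len": 8}}')
kind = event_kind(example)
```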

You can see the result of the translation into the above protobuf format by executing the following command:

iotrace --trace-parser --io --path "kernel/2020-07-02_08:52:51" --raw

Output:

...
{"header":{"sid":"1","timestamp":"14193058940837"},"deviceDescription":{"id":"271581186","name":"nvme0n1","size":"732585168","model":"INTEL SSDPED1K375GA"}}
{"header":{"sid":"73","timestamp":"14196894550578"},"io":{"lba":"1652296","len":256,"ioClass":19,"deviceId":"271581186","operation":"Write","flush":false,"fua":false,"writeHint":0,"id":"110842991263647"}}
{"header":{"sid":"74","timestamp":"14196894550696"},"filesystemMeta":{"refSid":"110842991263647","fileId":{"partitionId":"271581186","id":"76","creationDate":"2020-07-02T06:52:55.712990641Z"},"fileOffset":"0","fileSize":"241960"}}
...
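
The output is one JSON object per line, with 64-bit integers rendered as strings (e.g. "sid", "lba") and 32-bit integers as numbers (e.g. "len"). In the sample, the "refSid" of the filesystemMeta event matches the "id" of the preceding io event, so the two can be joined. The Python sketch below does this for the two sample lines; the 512-byte sector size and the helper name summarize are assumptions, and the join rule is inferred from this sample only.

```python
import json

SECTOR_BYTES = 512  # assumption: the trace does not state the sector size

# Two lines copied from the sample output.
SAMPLE = '''
{"header":{"sid":"73","timestamp":"14196894550578"},"io":{"lba":"1652296","len":256,"ioClass":19,"deviceId":"271581186","operation":"Write","flush":false,"fua":false,"writeHint":0,"id":"110842991263647"}}
{"header":{"sid":"74","timestamp":"14196894550696"},"filesystemMeta":{"refSid":"110842991263647","fileId":{"partitionId":"271581186","id":"76","creationDate":"2020-07-02T06:52:55.712990641Z"},"fileOffset":"0","fileSize":"241960"}}
'''


def summarize(lines):
    """Index io events by id and attach file size from matching filesystemMeta."""
    ios = {}
    for line in lines:
        line = line.strip()
        if not line or line == "...":
            continue  # skip blanks and elision markers
        event = json.loads(line)
        if "io" in event:
            io = event["io"]
            ios[io["id"]] = {
                "operation": io["operation"],
                "offset_bytes": int(io["lba"]) * SECTOR_BYTES,  # string -> int
                "size_bytes": io["len"] * SECTOR_BYTES,
                "file_size": None,
            }
        elif "filesystemMeta" in event:
            meta = event["filesystemMeta"]
            target = ios.get(meta["refSid"])
            if target is not None:
                target["file_size"] = int(meta["fileSize"])
    return ios


summary = summarize(SAMPLE.splitlines())
```

With 512-byte sectors, the sample write of 256 sectors corresponds to 128 KiB.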

After tracing is complete, these individual trace events may be parsed, combined and translated into different Google Protocol Buffer messages (or other formats, such as CSV) by the Standalone Linux IO Tracer trace parser commands.

For example, the --trace-parser --io command analyzes multiple submission, split and completion events to give a more complete view of a given IO request, such as its latency, queue depth, and file size and path (if applicable).
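
Since events are stored per CPU, any combined view first needs a single global order. The Python sketch below shows one possible way to get it: a k-way merge on the header sequence ID. This is not how the iotrace parser is known to work. It assumes that events have already been decoded into dictionaries shaped like the JSON output, and that each per-CPU stream is already sorted by sid. merge_by_sid is a hypothetical helper name.

```python
import heapq


def merge_by_sid(*streams):
    """Lazily merge per-CPU event streams into one stream ordered by sid.

    Each stream must already be sorted by sid (an assumption here).
    """
    return heapq.merge(*streams, key=lambda event: int(event["header"]["sid"]))


# Usage with two tiny, made-up per-CPU streams:
cpu0 = [{"header": {"sid": "1"}}, {"header": {"sid": "74"}}]
cpu1 = [{"header": {"sid": "73"}}]
ordered_sids = [int(e["header"]["sid"]) for e in merge_by_sid(cpu0, cpu1)]
```

heapq.merge keeps only one pending event per stream in memory, which matters when trace files are large.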

Examples

Tests

See our tests README.

Contributing

Please refer to the OCTF contributing guide.

Related Projects

Please explore related projects:

Notice

NOTICE contains more information.