Dissect

Dissect splits a string into its parts. A dissect implementation compares a string against a pattern and then splits the string based on the pattern rules. This specification defines the expected behavior for dissect implementations.

Terms:

pattern (or tokenizer) - the pattern used to define how to split a string. For example: "%{timestamp} %{+timestamp} %{+timestamp} %{logsource} %{program}[%{pid}]: %{message}".
key - the part of the string to match, identified by %{key}.
delimiter - the literal text between keys; it must be found in the string but is not captured as a value.
modifier - a special instruction found inside a dissect key that changes the parser's behavior.
parser - the software that implements this specification to split the string.

Basic example:

pattern: %{a} %{b},%{c}
string: foo bar,baz
result: a=foo, b=bar, c=baz

This pattern has three keys (a, b, and c) and two delimiters: ` ` (space) and `,` (comma). When the parser is run against the string with the given pattern, the result is a set of key/value pairs. The parser searches the string for each delimiter of the pattern, in order, and splits the string where the delimiter in the pattern matches the delimiter in the string.

In this example, the parser first searches the string for ` ` (space), the first delimiter. It finds a space, so it assigns the a key everything up to, but not including, the space: a=foo. The next delimiter is `,` (comma); the parser searches for a comma, finds it, and assigns b=bar. There are no more delimiters, so c is assigned the remainder of the string (baz).
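The walk-through above can be sketched as a small Python function. This is a minimal, hypothetical illustration (the names `dissect_basic` and `KEY_RE` are not part of this specification): it handles only plain keys, with no modifiers or skip keys, and it assumes that a last key followed by a trailing delimiter must see that delimiter at the very end of the string.

```python
from __future__ import annotations

import re

# A key looks like %{name}; everything between keys is a literal delimiter.
KEY_RE = re.compile(r"%\{(.*?)\}")


def dissect_basic(pattern: str, text: str) -> dict[str, str] | None:
    """Split `text` with a modifier-free dissect pattern, left to right.

    Returns None if any delimiter is missing (partial matches are rejected).
    """
    # With one capture group, re.split alternates delimiter, key, delimiter, ...
    parts = KEY_RE.split(pattern)
    delimiters, keys = parts[0::2], parts[1::2]
    if not keys:
        raise ValueError("a dissect pattern must contain at least one key")
    if not text.startswith(delimiters[0]):  # leading delimiter must match
        return None
    pos = len(delimiters[0])
    result: dict[str, str] = {}
    for i, key in enumerate(keys):
        delim = delimiters[i + 1]
        if i == len(keys) - 1:
            # Last key takes the remainder, minus a trailing delimiter if any.
            end = len(text) - len(delim)
            if not text.endswith(delim) or end < pos:
                return None
        else:
            end = text.find(delim, pos)  # first occurrence after the previous match
            if end == -1:
                return None
        result[key] = text[pos:end]
        pos = end + len(delim)
    return result
```

With the basic example, `dissect_basic("%{a} %{b},%{c}", "foo bar,baz")` yields a=foo, b=bar, c=baz, while "foo bar baz" yields None because the comma is missing.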

Specification

Pattern Specification
1. A dissect pattern must contain at least one key
2. A dissect pattern may have leading and trailing delimiters
3. A dissect pattern may have multiple delimiters made of different characters and of different lengths.
4. A dissect pattern must contain unique key names unless a modifier allows or requires duplicated key names.
5. A dissect pattern may not use % as a delimiter
Key specification
1. A dissect key must start with %{ and end with }
2. A dissect key may have a name, e.g. %{key_name}, and the name must be encodable as UTF-8.
3. A dissect key may have an empty name e.g. %{}, this is called a skip key and must not be included in the final results.
4. A dissect key name may not contain any of the modifier characters.
5. A dissect key may have modifiers to the left or the right, or left and right of the key name.
Modifier specification
1. A dissect modifier must be defined inside the dissect key, to the left or right of the key name.
2. Multiple dissect modifiers per key may be allowed.
3. ->: Right padding ignore - instructs the parser to ignore consecutive repeating delimiters to the right of the key. The -> modifier must be placed to the right of the key name, is allowed to co-exist with any other modifiers, and must always be the right-most modifier. See example below.
4. + Append - instructs the parser to append this key's value to the value of the prior key (left to right) with the same name. A user-defined append separator must be supported. The user-defined separator is a character, or set of characters, that will be placed between the appended values. The + modifier must be placed to the left of the key name. See example below.
5. + and /n Append with order - instructs the parser to append this key's value to the value of the prior key with the same name, based on order. The + modifier must be placed to the left of the key name and the /n modifier to the right of the key name, where n = order. The order must start at 1. See example below.
6. ? Named skip key - instructs the parser to not include this result in the final result set. It behaves identically to an empty skip key %{} but may be used to help with human readability. The ? modifier must be placed to the left of the key name. See example below.
7. * and & reference modifiers - these modifiers require two keys with the same name present in the dissect pattern, one key with the * and another with the &. This instructs the parser that the value discovered by the * key is to be used as the key name for the value discovered by the corresponding & key. These modifiers must be placed to the left of the key name. See example below.
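One way to read the pattern, key, and modifier rules above is as a tokenizer that splits a pattern into delimiters and keys and then parses each key's modifiers. The sketch below is a hypothetical Python illustration (`Key`, `parse_key`, and `parse_pattern` are names invented here, not a published API). It checks only the rules named in its comments and assumes a key carries at most one left-hand modifier.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

KEY_RE = re.compile(r"%\{(.*?)\}")
LEFT_MODIFIERS = ("+", "?", "*", "&")


@dataclass
class Key:
    name: str = ""
    skip: bool = False               # %{} or %{?name}
    append: bool = False             # %{+name}
    order: Optional[int] = None      # %{+name/n}
    reference: Optional[str] = None  # "*" or "&"
    right_pad: bool = False          # %{name->}


def parse_key(inner: str) -> Key:
    """Parse the text between %{ and } into a name and its modifiers."""
    key = Key()
    if inner.endswith("->"):          # -> must be the right-most modifier
        key.right_pad, inner = True, inner[:-2]
    m = re.search(r"/(\d+)$", inner)  # /n sits to the right of the name
    if m:
        key.order, inner = int(m.group(1)), inner[: m.start()]
    if inner[:1] in LEFT_MODIFIERS:   # +, ?, * and & sit to the left
        prefix, inner = inner[0], inner[1:]
        key.append = prefix == "+"
        key.skip = prefix == "?"
        key.reference = prefix if prefix in ("*", "&") else None
    if key.order is not None and (not key.append or key.order < 1):
        raise ValueError("/n needs the + modifier and n must start at 1")
    if any(c in inner for c in "+?*&/%{}") or "->" in inner:
        raise ValueError(f"modifier character in key name: {inner!r}")
    key.name = inner
    key.skip = key.skip or inner == ""  # %{} is an unnamed skip key
    return key


def parse_pattern(pattern: str) -> tuple[list[str], list[Key]]:
    """Split a pattern into its delimiters and parsed keys, validating both."""
    parts = KEY_RE.split(pattern)
    delimiters, keys = parts[0::2], [parse_key(k) for k in parts[1::2]]
    if not keys:
        raise ValueError("a dissect pattern must contain at least one key")
    if any("%" in d for d in delimiters):
        raise ValueError("% may not be used as a delimiter")
    # Names must be unique unless a modifier (+, *, &, skip) allows repeats.
    plain = [k.name for k in keys if not (k.skip or k.append or k.reference)]
    if len(plain) != len(set(plain)):
        raise ValueError("duplicate key name without a modifier")
    return delimiters, keys
```

Stripping the right-hand modifiers (-> first, then /n) before the left-hand one mirrors the placement rules: -> is always right-most, and /n sits directly after the name.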
Parser specification
1. A dissect parser must not allow partial matches. All delimiters must be present in the string, and all keys must have a corresponding value.
2. A dissect parser must support an empty key %{} (skip key) as a valid match, but not include the result in the result set.
3. A dissect parser must be able to parse any string that can be encoded as UTF-8.
4. A dissect parser must match the leading and trailing delimiters if present in the dissect pattern.
5. A dissect parser must allow the last key of a pattern to match the remainder of the string without additional modifiers. see example below
6. A dissect parser must treat consecutive repeating delimiters as valid empty matches unless instructed otherwise by modifiers. see example below
7. A dissect parser must allow a user specified string to use as the value between append operations. see example below
8. A dissect parser must support multiple character delimiters.
9. A dissect parser result set must be string/string key value pairs.
10. A dissect parser must support all modifiers defined by this specification.

Examples:

Right padding modifier `->`

pattern: %{a->} %{b} %{c}
string: foo     bar baz
result: a=foo, b=bar, c=baz

In the above example, the delimiter is ` ` (space), and the -> instructs the parser to skip all of the consecutive repeating spaces to the right of a.

pattern: %{a->},%{b},%{c}
string: foo,,,,bar,baz
result: a=foo, b=bar, c=baz

In the above example, the delimiter is `,` (comma), and the -> instructs the parser to skip all of the consecutive repeating commas to the right of a.

Multi-character delimiters must be supported.

pattern: %{a->},:%{b},%{c}
string: foo,:,:,:,:bar,baz
result: a=foo, b=bar, c=baz

Empty skip key with right padding must be supported.

pattern: %{->},%{b},%{c}
string: foo,,,,bar,baz
result: b=bar, c=baz
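The right padding rule reduces to one extra step in the delimiter search: after finding a delimiter, keep consuming any copies of it that immediately follow. A minimal Python sketch, where `find_delimiter` is a hypothetical helper rather than part of any published API:

```python
from typing import Optional, Tuple


def find_delimiter(text: str, delim: str, start: int,
                   right_pad: bool) -> Optional[Tuple[int, int]]:
    """Locate `delim` in `text` at or after `start`.

    Returns (value_end, next_start): the current key's value is
    text[start:value_end], and the next key begins at next_start.
    With right_pad (the -> modifier), repeats of `delim` that directly
    follow the first match are consumed too. Returns None if `delim`
    is missing, which a parser would treat as a failed (partial) match.
    """
    end = text.find(delim, start)
    if end == -1:
        return None
    next_start = end + len(delim)
    if right_pad and delim:  # guard: an empty delimiter would loop forever
        while text.startswith(delim, next_start):
            next_start += len(delim)
    return end, next_start
```

For the pattern %{a->},%{b},%{c} and the string foo,,,,bar,baz, the first search returns (3, 7): a is text[0:3], "foo", and matching resumes at "bar,baz". Because the loop advances by `len(delim)`, multi-character delimiters such as ,: are handled the same way.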

Append modifier `+`

pattern: %{a} %{+a} %{+a}
string: foo bar baz
result: a=foobarbaz

In the above example, the values are appended in left-to-right order to the result.

A user-specified append separator must be supported. Assume the user defines the separator to be ", " (comma space).

pattern: %{a} %{+a} %{+a}
string: foo bar baz
result: a=foo, bar, baz
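Appending can be applied once all values have been captured. The hypothetical sketch below takes the captured values in pattern order as (name, is_append, value) triples, a representation assumed here only for illustration, and joins them with a configurable separator that defaults to empty, matching the first append example.

```python
from typing import Dict, List, Tuple


def apply_appends(captures: List[Tuple[str, bool, str]],
                  separator: str = "") -> Dict[str, str]:
    """Merge captured (name, is_append, value) triples, in pattern order.

    A +name value is joined onto the prior value with the same name,
    left to right, with `separator` placed between the two.
    """
    result: Dict[str, str] = {}
    for name, is_append, value in captures:
        if is_append and name in result:
            result[name] += separator + value
        else:
            result[name] = value
    return result
```

With the captures for %{a} %{+a} %{+a} against foo bar baz, the default separator gives a=foobarbaz and a ", " separator gives a=foo, bar, baz.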

Append modifier with order `+` with `/n`

pattern: %{a} %{+a/2} %{+a/1}
string: foo bar baz
result: a=foobazbar

In the above example the values are appended together based on the order specified.
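Ordered appending can be sketched as a stable sort on the /n value before joining. In this hypothetical Python helper, captures are (name, order, value) tuples in pattern order, and a key without /n is assumed to count as order 0, which is consistent with the example above (foo stays first).

```python
from typing import Dict, List, Optional, Tuple


def apply_ordered_appends(captures: List[Tuple[str, Optional[int], str]],
                          separator: str = "") -> Dict[str, str]:
    """Join values that share a name, sorted by their /n order.

    `order` is None for a key without /n; this sketch treats that as 0.
    sorted() is stable, so equal orders keep their left-to-right position.
    """
    grouped: Dict[str, List[Tuple[int, str]]] = {}
    for name, order, value in captures:
        grouped.setdefault(name, []).append((order or 0, value))
    return {
        name: separator.join(v for _, v in sorted(parts, key=lambda p: p[0]))
        for name, parts in grouped.items()
    }
```

For %{a} %{+a/2} %{+a/1} against foo bar baz, the captures sort to foo, baz, bar and join to a=foobazbar.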

Named skip key `?`

pattern: %{a} %{?skipme} %{c}
string: foo bar baz
result: a=foo, c=baz

In the above example, the parser finds the matches correctly, but excludes the middle key from the results. This is the same behavior as %{}, and the name is only used for human readability.

Reference keys `*` and `&`

pattern: %{*a} %{b} %{&a}
string: foo bar baz
result: foo=baz, b=bar

In the above example, there is a pair of a keys. One has the * and the other &. This instructs the parser to use the value of the * as the key name for the value of & in the result set. * and & must come in pairs in the dissect pattern.

The left / right order of * and & does not matter.

pattern: %{&a} %{b} %{*a}
string: foo bar baz
result: baz=foo, b=bar
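Reference resolution pairs each * key with the & key of the same name after matching. The hypothetical sketch below assumes the captures arrive as (prefix, name, value) tuples, where prefix is "*", "&", or "" for a plain key; because it collects both sides before pairing, the left / right order of * and & does not matter.

```python
from typing import Dict, List, Tuple


def resolve_references(captures: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Turn %{*name} / %{&name} pairs into a single key/value entry."""
    result: Dict[str, str] = {}
    key_names: Dict[str, str] = {}  # name -> value captured by %{*name}
    values: Dict[str, str] = {}     # name -> value captured by %{&name}
    for prefix, name, value in captures:
        if prefix == "*":
            key_names[name] = value
        elif prefix == "&":
            values[name] = value
        else:
            result[name] = value
    if key_names.keys() != values.keys():
        raise ValueError("* and & keys must come in pairs")
    for name, new_key in key_names.items():
        result[new_key] = values[name]
    return result
```

For %{*a} %{b} %{&a} against foo bar baz, this yields foo=baz and b=bar; swapping the modifiers yields baz=foo.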

Remaining match

pattern: %{a} %{b},%{c}
string: foo bar,baz something more here
result: a=foo, b=bar, c=baz something more here

In the above example the last key matched the remainder of the input string.

Consecutive repeating delimiters

pattern: %{a},%{b},%{c},%{d},%{e},%{f},%{g}
string: foo,,,,,,bar
result: a=foo, b="", c="", d="", e="", f="", g=bar

In the above example the , repeats many times, leaving 5 empty key/value pairs.

pattern: %{a},%{b},%{c},%{d},%{e},%{f},%{g}
string: foo,,bar,,,,baz
result: a=foo, b="", c=bar, d="", e="", f="", g=baz

In the above example the , repeats many times, finds a value, then repeats more.

pattern: %{a->},%{g}
string: foo,,,,,,bar
result: a=foo, g=bar

In the above example the , repeats many times, but the right padding modifier -> instructs the parser to skip over the repeating delimiters.

Postfix pattern

pattern: %{timestamp} %{+timestamp} %{+timestamp} %{logsource} %{program}[%{pid}]: %{message}
string: Mar 16 00:01:25 example postfix/smtpd[1713]: connect from example.com[192.100.1.3]
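result: timestamp="Mar 16 00:01:25", logsource="example", program="postfix/smtpd", pid="1713", message="connect from example.com[192.100.1.3]"

This result assumes the user-defined append separator is a single space; with no separator, timestamp would be "Mar1600:01:25".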
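Putting the rules together, the following is a compact, unofficial Python sketch of a dissect parser that handles every modifier in this specification. `dissect` and `parse_key` are hypothetical names, and several details are assumptions of this sketch rather than requirements of the specification: a key without /n counts as order 0, a trailing delimiter must end the string, and pattern validation (unique names, % in delimiters) is omitted.

```python
from __future__ import annotations

import re
from typing import Optional

KEY_RE = re.compile(r"%\{(.*?)\}")


def parse_key(inner: str) -> dict:
    """Split the text inside %{...} into a name and its modifiers."""
    key = {"pad": inner.endswith("->"), "order": 0, "prefix": ""}
    if key["pad"]:
        inner = inner[:-2]                      # -> is the right-most modifier
    m = re.search(r"/(\d+)$", inner)
    if m:                                       # /n append order
        key["order"], inner = int(m.group(1)), inner[: m.start()]
    if inner[:1] in ("+", "?", "*", "&"):       # left-hand modifiers
        key["prefix"], inner = inner[0], inner[1:]
    key["name"] = inner
    key["skip"] = inner == "" or key["prefix"] == "?"
    return key


def dissect(pattern: str, text: str, append_separator: str = "") -> Optional[dict[str, str]]:
    """Return the key/value result set, or None if `text` does not fully match."""
    parts = KEY_RE.split(pattern)
    delimiters, keys = parts[0::2], [parse_key(k) for k in parts[1::2]]
    if not text.startswith(delimiters[0]):
        return None
    pos, captured = len(delimiters[0]), []
    for i, key in enumerate(keys):
        delim = delimiters[i + 1]
        if i == len(keys) - 1:
            # Last key: the remainder of the string, minus any trailing delimiter.
            end = len(text) - len(delim)
            if not text.endswith(delim) or end < pos:
                return None
            next_pos = len(text)
        else:
            end = text.find(delim, pos)
            if end == -1:
                return None                     # missing delimiter: no partial matches
            next_pos = end + len(delim)
            if key["pad"] and delim:            # -> consumes repeated delimiters
                while text.startswith(delim, next_pos):
                    next_pos += len(delim)
        captured.append((key, text[pos:end]))
        pos = next_pos

    # Group values; plain and + keys share a group, * and & keys are kept apart.
    groups: dict = {}
    for key, value in captured:
        if not key["skip"]:
            kind = key["prefix"] if key["prefix"] in ("*", "&") else ""
            groups.setdefault((kind, key["name"]), []).append((key["order"], value))
    # Stable sort by /n (keys without /n count as 0), then join with the separator.
    joined = {g: append_separator.join(v for _, v in sorted(vals, key=lambda p: p[0]))
              for g, vals in groups.items()}
    result = {name: v for (kind, name), v in joined.items() if kind == ""}
    for (kind, name), v in joined.items():
        if kind == "*":
            if ("&", name) not in joined:
                raise ValueError(f"%{{*{name}}} has no matching %{{&{name}}}")
            result[v] = joined[("&", name)]
    return result
```

Calling `dissect` with the Postfix pattern, the Postfix string, and `append_separator=" "` reproduces the result listed above; the same function also reproduces the earlier examples, such as a=foobazbar for the ordered append.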
