elastic / dissect-specification

specification for the dissect parser
Apache License 2.0

Dissect

Dissect splits a string into its parts. A dissect implementation compares a string against a pattern and then splits the string based on the pattern rules. This specification defines the expected behavior for dissect implementations.

Terms:

Basic example:

This pattern has three keys (a, b, and c) and two delimiters: ` ` (space) and `,` (comma). When the parser is run against the string with the given pattern, the result is a set of key/value pairs. The parser searches the string for each delimiter in the pattern, in order, and the text in front of each matched delimiter becomes the value of the key that precedes that delimiter in the pattern.

In this example, the parser searches the string for ` ` (space), the first delimiter. It finds a space, so it assigns to a everything up to, but not including, that space: a=foo. The next delimiter is `,` (comma). The parser searches for a comma, finds it, and assigns b=bar. There are no more delimiters, so c is assigned the remainder of the string (baz).
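The walk-through above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `dissect` is hypothetical, modifiers are not handled, and the pattern `%{a} %{b},%{c}` and string `foo bar,baz` are inferred from the walk-through. Returning `None` for a non-match, and rejecting text left over after a trailing delimiter, are assumptions of this sketch.

```python
import re


def dissect(pattern, text):
    """Return a dict of key/value pairs, or None when the string does not match."""
    # re.split with a capture group alternates literal delimiters and key names:
    # "%{a} %{b},%{c}" -> ["", "a", " ", "b", ",", "c", ""]
    parts = re.split(r"%\{([^}]*)\}", pattern)
    delimiters, keys = parts[0::2], parts[1::2]
    if not keys:
        raise ValueError("a dissect pattern must contain at least one key")

    # A leading delimiter, if any, must be present at the start of the string.
    if not text.startswith(delimiters[0]):
        return None
    pos = len(delimiters[0])

    result = {}
    for i, key in enumerate(keys):
        delimiter = delimiters[i + 1]
        is_last = i == len(keys) - 1
        if is_last and delimiter == "":
            end = len(text)  # the last key takes the remainder of the string
        else:
            end = text.find(delimiter, pos)
            if end == -1:
                return None  # a delimiter is missing: no partial matches
        if key:  # an empty name (%{}) is a skip key and is not reported
            result[key] = text[pos:end]
        pos = end + len(delimiter)

    # Assumption: text left over after a trailing delimiter is a non-match.
    return result if pos == len(text) else None


print(dissect("%{a} %{b},%{c}", "foo bar,baz"))
# prints {'a': 'foo', 'b': 'bar', 'c': 'baz'}
```

Searching left to right with `str.find` mirrors the description: each value is whatever lies between the current position and the next occurrence of the following delimiter.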

Specification

  1. Pattern Specification
    1. A dissect pattern must contain at least one key
    2. A dissect pattern may have leading and trailing delimiters
    3. A dissect pattern may have multiple delimiters of different characters of different lengths.
    4. A dissect pattern must contain unique key names unless a modifier allows or requires duplicated key names.
    5. A dissect pattern may not use % as a delimiter
  2. Key specification
    1. A dissect key must start with %{ and end with }
    2. A dissect key may have a name, e.g. %{key_name} and it must be able to be encoded as UTF-8.
    3. A dissect key may have an empty name e.g. %{}, this is called a skip key and must not be included in the final results.
    4. A dissect key name may not have any of the modifier characters as part of the name.
    5. A dissect key may have modifiers to the left or the right, or left and right of the key name.
  3. Modifier specification
    1. A dissect modifier must be defined inside the dissect key, to the left or right of the key name.
    2. Multiple dissect modifiers per key may be allowed.
    3. ->: Right padding ignore - instructs the parser to ignore consecutive repeating delimiters to the right of the key. The -> modifier must be placed to the right of the key name. It may co-exist with any other modifier and must always be the rightmost modifier. See example below.
    4. +: Append - instructs the parser to append this key's value to the value of the prior key (left to right) with the same name. A user-defined append separator must be supported. The separator is a character, or sequence of characters, that is placed between the appended values. The + modifier must be placed to the left of the key name. See example below.
    5. + and /n: Append with order - instructs the parser to append this key's value to the value of the prior key with the same name, based on order. The + modifier must be placed to the left of the key name and the /n modifier to the right of the key name, where n is the order. The order must start at 1. See example below.
    6. ?: Named skip key - instructs the parser not to include this key in the final result set. It behaves identically to an empty skip key %{} but may be used to help human readability. The ? modifier must be placed to the left of the key name. See example below.
    7. * and &: Reference modifiers - these require two keys with the same name in the dissect pattern, one with * and the other with &. They instruct the parser to use the value discovered by the * key as the key name for the value discovered by the corresponding & key. These modifiers must be placed to the left of the key name. See example below.
  4. Parser specification
    1. A dissect parser must not allow partial matches. All delimiters must be present in the string, and all keys must have a corresponding value.
    2. A dissect parser must support an empty key %{} (skip key) as valid match, but not include the result in the result set.
    3. A dissect parser must be able to parse any string that can be encoded as UTF-8
    4. A dissect parser must match the leading and trailing delimiters if present in the dissect pattern.
    5. A dissect parser must allow the last key of a pattern to match the remainder of the string without additional modifiers. See example below.
    6. A dissect parser must treat consecutive repeating delimiters as valid empty matches unless instructed otherwise by modifiers. See example below.
    7. A dissect parser must allow a user-specified string to be used as the separator between appended values. See example below.
    8. A dissect parser must support multiple character delimiters.
    9. A dissect parser result set must be string/string key value pairs.
    10. A dissect parser must support all modifiers defined by this specification.
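The key and modifier rules above can be illustrated with a small key parser. This is a sketch under stated assumptions, not part of the specification: the names `Key`, `KEY_RE`, and `parse_key` are hypothetical, and the grammar it accepts (at most one left modifier, then the name, then an optional `/n`, then an optional `->`) is only one possible reading of the rules.

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar: [left modifier] name [/n] [->], i.e. the text between %{ and }.
KEY_RE = re.compile(r"([+?*&]?)([^/]*?)(?:/(\d+))?(->)?")


class Key(NamedTuple):
    modifier: str          # "", "+", "?", "*" or "&" (left of the name)
    name: str              # "" for an empty skip key
    order: Optional[int]   # n from "/n", used together with "+"
    right_padding: bool    # True when the key ends with "->"


def parse_key(raw):
    """Parse the text between %{ and } into a name and its modifiers."""
    match = KEY_RE.fullmatch(raw)
    if match is None:
        raise ValueError(f"invalid dissect key: {raw!r}")
    modifier, name, order, padding = match.groups()
    # Key names may not contain modifier characters.
    if any(ch in "+?*&" for ch in name):
        raise ValueError(f"modifier character inside key name: {raw!r}")
    # /n only makes sense with + and the order starts at 1.
    if order is not None and (modifier != "+" or int(order) < 1):
        raise ValueError("/n requires the + modifier and an order of at least 1")
    return Key(modifier, name, int(order) if order is not None else None,
               padding is not None)


print(parse_key("+name/2->"))
# prints Key(modifier='+', name='name', order=2, right_padding=True)
```

The lazy `[^/]*?` group lets a trailing `->` be recognized as the right padding modifier instead of being absorbed into the name.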

Examples:

Right padding modifier ->

In the above example, the delimiter is ` ` (space), and the `->` instructs the parser to skip all of the consecutive repeating ` ` to the right of `a`.

In the above example, the delimiter is `,` (comma), and the `->` instructs the parser to skip all of the consecutive repeating `,` to the right of `a`.
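The right padding behavior can be sketched as follows. The function name `dissect`, the patterns, and the input strings are hypothetical and chosen only for illustration. The sketch handles `->` and empty skip keys but no other modifier, and it assumes that an empty delimiter only occurs after the last key.

```python
import re


def dissect(pattern, text):
    """Minimal matcher that understands only the -> modifier and %{} skip keys."""
    parts = re.split(r"%\{([^}]*)\}", pattern)
    delimiters, keys = parts[0::2], parts[1::2]
    if not text.startswith(delimiters[0]):
        return None
    pos = len(delimiters[0])
    result = {}
    for key, delimiter in zip(keys, delimiters[1:]):
        padded = key.endswith("->")
        name = key[:-2] if padded else key
        # An empty delimiter is assumed to follow the last key: take the remainder.
        end = text.find(delimiter, pos) if delimiter else len(text)
        if end == -1:
            return None
        if name:
            result[name] = text[pos:end]
        pos = end + len(delimiter)
        # Right padding: skip any further repeats of the same delimiter.
        while padded and delimiter and text.startswith(delimiter, pos):
            pos += len(delimiter)
    return result if pos == len(text) else None


print(dissect("%{a->} %{b}", "foo      bar"))  # prints {'a': 'foo', 'b': 'bar'}
print(dissect("%{a->},%{b}", "foo,,,,bar"))    # prints {'a': 'foo', 'b': 'bar'}
print(dissect("%{a},%{b}", "foo,,,,bar"))      # prints {'a': 'foo', 'b': ',,,bar'}
```

The last call shows the contrast: without `->`, the repeated delimiters stay in the remainder captured by the final key.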

Multi-character delimiters must be supported.

Empty skip key with right padding must be supported.

Append modifier +

In the above example, the values are appended to the result in left-to-right order.

A user-specified append separator must be supported. Assume the user defines the separator to be `, ` (comma, space).
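A sketch of the append modifier with a user-specified separator follows. The names `match_keys`, `dissect`, and `append_separator` are hypothetical, as are the pattern and input. Defaulting the separator to an empty string is an assumption of this sketch, and only the `+` modifier is handled.

```python
import re


def match_keys(pattern, text):
    """Return [(raw_key, value), ...] in pattern order, or None if there is no match."""
    parts = re.split(r"%\{([^}]*)\}", pattern)
    delimiters, keys = parts[0::2], parts[1::2]
    if not text.startswith(delimiters[0]):
        return None
    pos, pairs = len(delimiters[0]), []
    for key, delimiter in zip(keys, delimiters[1:]):
        # An empty delimiter is assumed to follow the last key: take the remainder.
        end = text.find(delimiter, pos) if delimiter else len(text)
        if end == -1:
            return None
        pairs.append((key, text[pos:end]))
        pos = end + len(delimiter)
    return pairs if pos == len(text) else None


def dissect(pattern, text, append_separator=""):
    pairs = match_keys(pattern, text)
    if pairs is None:
        return None
    result = {}
    for key, value in pairs:
        name = key.lstrip("+")
        if not name:
            continue  # empty skip key
        if key.startswith("+") and name in result:
            # Append to the value already collected for this name, left to right.
            result[name] += append_separator + value
        else:
            result[name] = value
    return result


print(dissect("%{a} %{+a} %{+a}", "foo bar baz"))
# prints {'a': 'foobarbaz'}
print(dissect("%{a} %{+a} %{+a}", "foo bar baz", append_separator=", "))
# prints {'a': 'foo, bar, baz'}
```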

Append modifier with order + with /n

In the above example the values are appended together based on the order specified.
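Ordered append can be sketched in the same way. The names `match_keys` and `dissect`, the pattern, and the input are hypothetical. The sketch assumes every appended key carries both `+` and `/n`, and it treats any other key as a plain key.

```python
import re


def match_keys(pattern, text):
    """Return [(raw_key, value), ...] in pattern order, or None if there is no match."""
    parts = re.split(r"%\{([^}]*)\}", pattern)
    delimiters, keys = parts[0::2], parts[1::2]
    if not text.startswith(delimiters[0]):
        return None
    pos, pairs = len(delimiters[0]), []
    for key, delimiter in zip(keys, delimiters[1:]):
        # An empty delimiter is assumed to follow the last key: take the remainder.
        end = text.find(delimiter, pos) if delimiter else len(text)
        if end == -1:
            return None
        pairs.append((key, text[pos:end]))
        pos = end + len(delimiter)
    return pairs if pos == len(text) else None


def dissect(pattern, text, append_separator=""):
    pairs = match_keys(pattern, text)
    if pairs is None:
        return None
    result, appended = {}, {}
    for key, value in pairs:
        ordered = re.fullmatch(r"\+([^/]+)/(\d+)", key)
        if ordered:
            name, order = ordered.group(1), int(ordered.group(2))
            appended.setdefault(name, []).append((order, value))
        elif key:
            result[key] = value
    for name, items in appended.items():
        items.sort(key=lambda item: item[0])  # join by the /n order, not by position
        result[name] = append_separator.join(value for _, value in items)
    return result


print(dissect("%{+a/2} %{+a/1} %{+a/3}", "foo bar baz", append_separator=","))
# prints {'a': 'bar,foo,baz'}
```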

Named skip key ?

In the above example, the parser finds the matches correctly, but excludes the middle key from the results. This is the same behavior as %{}, and the name is only used for human readability.
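The equivalence between a named skip key and an empty skip key can be sketched as below. The names `match_keys` and `dissect`, the key name `skipme`, and the input are hypothetical. Only the `?` modifier and `%{}` are handled.

```python
import re


def match_keys(pattern, text):
    """Return [(raw_key, value), ...] in pattern order, or None if there is no match."""
    parts = re.split(r"%\{([^}]*)\}", pattern)
    delimiters, keys = parts[0::2], parts[1::2]
    if not text.startswith(delimiters[0]):
        return None
    pos, pairs = len(delimiters[0]), []
    for key, delimiter in zip(keys, delimiters[1:]):
        # An empty delimiter is assumed to follow the last key: take the remainder.
        end = text.find(delimiter, pos) if delimiter else len(text)
        if end == -1:
            return None
        pairs.append((key, text[pos:end]))
        pos = end + len(delimiter)
    return pairs if pos == len(text) else None


def dissect(pattern, text):
    pairs = match_keys(pattern, text)
    if pairs is None:
        return None
    # Both %{} and %{?name} must match, but neither is reported.
    return {key: value for key, value in pairs if key and not key.startswith("?")}


print(dissect("%{a} %{?skipme} %{c}", "foo bar baz"))  # prints {'a': 'foo', 'c': 'baz'}
print(dissect("%{a} %{} %{c}", "foo bar baz"))         # prints {'a': 'foo', 'c': 'baz'}
```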

Reference keys * and &

In the above example, there is a pair of a keys: one has the * and the other has the &. This instructs the parser to use the value of the * key as the key name for the value of the & key in the result set. * and & must come in pairs in the dissect pattern.

The left/right order of * and & does not matter.
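Reference keys can be sketched as follows. The names `match_keys` and `dissect`, the pattern, and the input are hypothetical. Raising `ValueError` for an unpaired `*` or `&` is an assumption of this sketch, since the text above only states that they must come in pairs.

```python
import re


def match_keys(pattern, text):
    """Return [(raw_key, value), ...] in pattern order, or None if there is no match."""
    parts = re.split(r"%\{([^}]*)\}", pattern)
    delimiters, keys = parts[0::2], parts[1::2]
    if not text.startswith(delimiters[0]):
        return None
    pos, pairs = len(delimiters[0]), []
    for key, delimiter in zip(keys, delimiters[1:]):
        # An empty delimiter is assumed to follow the last key: take the remainder.
        end = text.find(delimiter, pos) if delimiter else len(text)
        if end == -1:
            return None
        pairs.append((key, text[pos:end]))
        pos = end + len(delimiter)
    return pairs if pos == len(text) else None


def dissect(pattern, text):
    pairs = match_keys(pattern, text)
    if pairs is None:
        return None
    names, values, result = {}, {}, {}
    for key, value in pairs:
        if key.startswith("*"):
            names[key[1:]] = value    # this value becomes a key name
        elif key.startswith("&"):
            values[key[1:]] = value   # this value is stored under that name
        elif key:
            result[key] = value
    if names.keys() != values.keys():
        raise ValueError("* and & must come in pairs")
    for reference, name in names.items():
        result[name] = values[reference]
    return result


print(dissect("%{*a} %{b} %{&a}", "foo bar baz"))  # prints {'b': 'bar', 'foo': 'baz'}
print(dissect("%{&a} %{b} %{*a}", "foo bar baz"))  # prints {'b': 'bar', 'baz': 'foo'}
```

Collecting the `*` and `&` values first and pairing them afterwards is what makes the left/right order irrelevant.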

Remaining match

In the above example the last key matched the remainder of the input string.

Consecutive repeating delimiters

In the above example the , repeats many times, leaving 5 empty key/value pairs.

In the above example the , repeats many times, finds a value, then repeats more.

In the above example the , repeats many times, but the right padding modifier -> instructs the parser to skip over the repeating delimiters.
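The default behavior, where each repeated delimiter produces an empty value, can be sketched as follows. The function name `dissect`, the pattern, and the input are hypothetical. No modifiers are handled here, so every delimiter in the pattern consumes exactly one delimiter in the string.

```python
import re


def dissect(pattern, text):
    """Minimal matcher without modifiers: repeated delimiters yield empty values."""
    parts = re.split(r"%\{([^}]*)\}", pattern)
    delimiters, keys = parts[0::2], parts[1::2]
    if not text.startswith(delimiters[0]):
        return None
    pos = len(delimiters[0])
    result = {}
    for key, delimiter in zip(keys, delimiters[1:]):
        # An empty delimiter is assumed to follow the last key: take the remainder.
        end = text.find(delimiter, pos) if delimiter else len(text)
        if end == -1:
            return None
        if key:
            result[key] = text[pos:end]  # may be "" between two adjacent delimiters
        pos = end + len(delimiter)
    return result if pos == len(text) else None


print(dissect("%{a},%{b},%{c},%{d}", "foo,,,bar"))
# prints {'a': 'foo', 'b': '', 'c': '', 'd': 'bar'}
```

Because the result set is string/string pairs, the empty matches are reported as empty strings rather than being dropped.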

Postfix pattern