Closed majikman111 closed 1 month ago
# test.yaml title: test logsource: product: test detection: selection: - conditionA: '\\' # String declared with single quotes containing multiple slashes. - conditionB: '\a\a' condition: selection
the loading process will generate a
SigmaString
for conditionA that is exactly 1 backslash long, where conditionB will generate aSigmaString
that is four characters long.
This works exactly as intended. The first backslash in conditionA escapes the second one into one literal backslash. In conditionB the backslashes have no escaping semantic because they are not followed by a wildcard or backslash and therefore are just passed as-is into the SigmaString
.
The expected behavior should be such that conditionA is loaded as a
SigmaString
containing 2 backslash characters.
No, then the semantics get inconsistent because there's no clean possibility anymore to express a single literal \
followed by anything. \\*
would then be a double backslash followed by a wildcard while \*
would be a literal *
.
This is also inconsistent with the documentation for
SigmaString.convert
which states that only wildcard characters are escaped. https://sigmahq-pysigma.readthedocs.io/en/latest/Sigma_Rules.html#sigma.types.SigmaString.convert
SigmaString.convert
is for conversion of a SigmaString
into the string contained in the converted query, it's not the parsing of a string into a SigmaString. Furthermore, the wildcard string escaping is the default behavior, but convert allows to add further characters to escape.
print(collection[0].detection.detections['selection'].detection_items[0].detection_items[0].value[0])
\
This should be 2 slashes
print
on a SigmaString
shows a representation that is not intended for any productive usage. To convert a SigmaString
into something meaningful in the given query language, the aforementioned convert()
method should be used.
Additionally, the original correct string is copied to the
original
attribute, but is lost in the case of wildcarded queries (such as when using the contains modifier) when the wildcard character is added. If the original field were maintained during__add__
or other string operations, that would also be helpful.
The main purpose of SigmaString
is to maintain a parsed representation of strings where semantical wildcards and literal wildcard characters without special semantics can be clearly distinguished. In sigmac it was a huge mess because there wasn't such a possibility and each backend made its own interpretation of strings. The original
attribute was initially only intended for debugging purposes and shouldn't be used. It's more likely that this attribute will be dropped in future releases due to such inconsistencies instead of implementing all SigmaString
operations for it. Latter would be challenging to implement because of corner cases like SigmaString("\\").original + SigmaString("*").original
.
When loading rules from yaml, the
SigmaString
class inconsistently escapes special characters, especially when strings are defined using single quotes. Given a rule such asthe loading process will generate a
SigmaString
for conditionA that is exactly 1 backslash long, where conditionB will generate aSigmaString
that is four characters long. The expected behavior should be such that conditionA is loaded as aSigmaString
containing 2 backslash characters. I believe this line of code could be changed as follows to correctly load multiple slash characters.https://github.com/SigmaHQ/pySigma/blob/main/sigma/types.py#L127
This is also inconsistent with the documentation for
SigmaString.convert
which states that only wildcard characters are escaped. https://sigmahq-pysigma.readthedocs.io/en/latest/Sigma_Rules.html#sigma.types.SigmaString.convertThe following snippet demonstrates the inconsistent character escaping.
Additionally, the original correct string is copied to the
original
attribute, but is lost in the case of wildcarded queries (such as when using the contains modifier) when the wildcard character is added. If the original field were maintained during__add__
or other string operations, that would also be helpful.