DataDog / libddwaf-rb

Bindings to libddwaf for Ruby

Handle invalid encoding #33

Closed lloeki closed 1 year ago

lloeki commented 1 year ago

What does this PR do?

Motivation

The original finding was that libddwaf is able to handle input containing characters that don't match the original encoding. Because we truncate at 4096 characters, as counted in the original string's encoding (e.g. ASCII-8BIT), a multibyte character (e.g. a UTF-8 one) may be cut in the middle, and the truncated bytes are then passed to libddwaf as a C string (in effect, a byte array).
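As a minimal sketch of that truncation problem, the snippet below uses a tiny limit of 2 as a stand-in for the real 4096, and a made-up three-byte input. It only relies on core `String` methods; the variable names are illustrative and do not come from the gem.

```ruby
# "a" followed by "é" (U+00E9), re-tagged as binary (ASCII-8BIT).
raw = "a\u00E9".b
raw_bytes = raw.bytes            # 0x61, 0xC3, 0xA9: three bytes, two UTF-8 characters

# In a binary string one "character" is one byte, so a limit of 2
# keeps only the first byte of the two-byte "é".
limit = 2                        # stand-in for the real 4096 limit
truncated = raw[0, limit]

# Reinterpreting the result as UTF-8 exposes the incomplete character.
as_utf8 = truncated.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding?     # false
```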

On a match, libddwaf returns, in the value and highlight fields, a C string that Ruby interprets as UTF-8. This string contains the original byte array, so an incomplete character in it caused JSON.dump to raise an exception.
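The failure can be reproduced in isolation, assuming a UTF-8-tagged string that ends in half of a two-byte character. The `'value'` key and the input are made up for illustration, and the exact error class and message may differ between versions of the json gem, so the sketch rescues broadly.

```ruby
require 'json'

# A UTF-8-tagged string ending in the first byte of a two-byte character,
# standing in for a value returned after truncation.
broken = "a\xC3".dup.force_encoding(Encoding::UTF_8)

error = nil
begin
  JSON.dump('value' => broken)
rescue StandardError => e
  # The exact error class and message depend on the json gem version.
  error = e
end

puts error.class unless error.nil?
```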

More generally the original string may thus theoretically contain characters that:

By converting to UTF-8 we enable:

When the original string contains characters that are invalid in its encoding and therefore cannot be converted, we replace them with the standard Unicode replacement character \u{FFFD}, which is meant for that purpose. Keeping the original bytes would not make sense, as no reliable semantics could be inferred from them for libddwaf to interpret the data.
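A small sketch of the replacement behaviour, using only `String#encode` with its `invalid:` and `undef:` options. The input is a made-up binary string; in a binary-tagged string every byte above 0x7F has no defined UTF-8 conversion, so it is the `undef: :replace` option that applies here.

```ruby
require 'json'

# Binary input: "a", a stray high byte, then "z".
input = "a\xC3z".b

safe = input.encode('utf-8', invalid: :replace, undef: :replace)

puts safe.encoding          # UTF-8
puts safe.valid_encoding?   # true
puts safe == "a\uFFFDz"     # true

# The sanitized value now survives a JSON round trip.
round_trip = JSON.parse(JSON.dump([safe])).first
```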

Additional Notes

How to test the change?

Specs have been added to cover these cases.

GustavoCaso commented 1 year ago

The changes LGTM

We could extract the encoding step into an intermediary variable to make the code more readable.

encoded_val = val.to_s.encode('utf-8', invalid: :replace, undef: :replace)
val = encoded_val[0, max_string_length] if max_string_length
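To see the effect of encoding before truncating, the two lines above can be wrapped in a method. `encode_and_truncate` is a hypothetical name used only for this sketch and is not part of libddwaf-rb; the inputs are made up.

```ruby
# Hypothetical helper, not part of libddwaf-rb: encode first, then truncate,
# so the limit is counted in UTF-8 characters rather than raw bytes.
def encode_and_truncate(val, max_string_length = nil)
  encoded_val = val.to_s.encode('utf-8', invalid: :replace, undef: :replace)
  max_string_length ? encoded_val[0, max_string_length] : encoded_val
end

short = encode_and_truncate("h\u00E9llo", 2)
puts short == "h\u00E9"       # true
puts short.valid_encoding?    # true
```

Because the slice happens on an already valid UTF-8 string, it can never split a multibyte character.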