evverx opened this issue 2 years ago
One of the major flaws of the current (CSV) format is that the separator (;) can appear in the randomly generated strings, which makes machine-parsing the log file harder and sometimes almost impossible.
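To make the problem concrete, here is a minimal Python illustration. The record layout (interface, method, signature, value, result) is a hypothetical stand-in, not dfuzzer's actual CSV format. Once a random string contains the separator, a plain split yields the wrong number of fields.

```python
# Hypothetical record layout, for illustration only (not dfuzzer's format).
record = ["org.example.Test", "Method", "s", "abc;def", "ok"]
line = ";".join(record)

# A naive parser splits on every ';', including the one inside the
# randomly generated string, so the fields no longer line up.
parsed = line.split(";")
print(len(record), len(parsed))  # 5 6
print(parsed[3], parsed[4])      # abc def
```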
Those logs could help to look, for example, for timeouts
Looks like timeouts have never been logged by dfuzzer :-(
As for the random strings, I guess one possible fix would be to process them with https://docs.gtk.org/glib/func.strescape.html before printing them out. This might also help with #80, since strings could then be wrapped in double quotes (") and identified by them. As the documentation suggests, this operation can easily be reversed with https://docs.gtk.org/glib/func.strcompress.html, and the escape sequences should be compatible with bash as well:
Escapes the special characters '\b', '\f', '\n', '\r', '\t', '\v', '\' and '"' in the string source by inserting a '\' before them. Additionally all characters in the range 0x01-0x1F (everything below SPACE) and in the range 0x7F-0xFF (all non-ASCII chars) are replaced with a '\' followed by their octal representation. Characters supplied in exceptions are not escaped.
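To see what that escaping does to a fuzzed string, here is a pure-Python re-implementation of the rules quoted above. It is only a sketch: it does not call GLib, it ignores the exceptions argument, and the reverse function handles only the sequences the forward one produces. It works on bytes because the documented rules are byte-oriented (0x7F-0xFF).

```python
# Sketch of the escaping rules quoted from the g_strescape() docs.
# Pure Python, no GLib; the "exceptions" argument is not supported.
SIMPLE = {0x08: b"b", 0x0C: b"f", 0x0A: b"n", 0x0D: b"r",
          0x09: b"t", 0x0B: b"v", 0x5C: b"\\", 0x22: b'"'}
REVERSE = {esc[0]: byte for byte, esc in SIMPLE.items()}


def strescape(src: bytes) -> bytes:
    out = bytearray()
    for byte in src:
        if byte in SIMPLE:
            out += b"\\" + SIMPLE[byte]       # e.g. newline -> \n
        elif 0x01 <= byte <= 0x1F or 0x7F <= byte <= 0xFF:
            out += b"\\%03o" % byte           # e.g. ESC (0x1b) -> \033
        else:
            out.append(byte)
    return bytes(out)


def strcompress(src: bytes) -> bytes:
    """Undo strescape() above (only the sequences it emits)."""
    out = bytearray()
    i = 0
    while i < len(src):
        if src[i] == 0x5C and i + 1 < len(src):    # backslash
            nxt = src[i + 1]
            if 0x30 <= nxt <= 0x37:                # \NNN octal escape
                out.append(int(src[i + 1:i + 4], 8))
                i += 4
            else:                                  # \n, \", \\ and friends
                out.append(REVERSE[nxt])
                i += 2
        else:
            out.append(src[i])
            i += 1
    return bytes(out)


raw = 'a;b"c\n\x1b\u00e9'.encode("utf-8")
escaped = strescape(raw)
print(escaped)  # b'a;b\\"c\\n\\033\\303\\251'
assert strcompress(escaped) == raw
```

Note that ';' passes through untouched, so escaping alone does not remove the separator from the string. What it does guarantee is that every '"' inside the string is preceded by a backslash, so a parser that skips over backslash escapes can treat the next bare '"' as the end of a quoted string, which is the idea behind wrapping strings in quotes for #80.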
I'd pick JSON (or any other format where escaping is no longer an issue) because, for example, busctl dumps stuff like
{
    "type" : "method_call",
    "endian" : "l",
    "flags" : 0,
    "version" : 1,
    "cookie" : 2,
    "timestamp-realtime" : 1652039190518701,
    "sender" : ":1.147",
    "destination" : "org.freedesktop.resolve1",
    "path" : "/org/freedesktop/resolve1",
    "interface" : "org.freedesktop.resolve1.Manager",
    "member" : "ResolveHostname",
    "payload" : {
        "type" : "isit",
        "data" : [
            0,
            "google.com",
            0,
            0
        ]
    }
}
and that output can be put into "advanced" dictionaries (https://github.com/matusmarhefka/dfuzzer/issues/81). The idea is to monitor the system bus, pick "valid" messages, and stuff them into those dictionaries (hopefully semi-automatically).
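A rough sketch of the "pick valid messages" step, assuming busctl messages like the one above have already been captured as JSON objects. The entry layout returned here (method, signature, args) is hypothetical, since this thread doesn't specify the dictionary format from #81.

```python
import json

# The busctl message quoted above, trimmed to the fields used here.
MESSAGE = """{
    "type" : "method_call",
    "path" : "/org/freedesktop/resolve1",
    "interface" : "org.freedesktop.resolve1.Manager",
    "member" : "ResolveHostname",
    "payload" : {"type" : "isit", "data" : [0, "google.com", 0, 0]}
}"""


def to_dictionary_entry(raw: str):
    """Turn a captured method call into a (hypothetical) dictionary entry."""
    msg = json.loads(raw)
    # Skip anything that isn't a method call with an interface and a payload
    # (the interface field is optional in D-Bus method calls).
    if (msg.get("type") != "method_call"
            or "interface" not in msg or "payload" not in msg):
        return None
    return {
        "method": f'{msg["interface"]}.{msg["member"]}',
        "signature": msg["payload"]["type"],
        "args": msg["payload"]["data"],
    }


entry = to_dictionary_entry(MESSAGE)
print(entry["method"])  # org.freedesktop.resolve1.Manager.ResolveHostname
print(entry["args"])    # [0, 'google.com', 0, 0]
```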
That definitely sounds better, and it should be relatively easy to do with json-glib (https://gnome.pages.gitlab.gnome.org/json-glib/), maybe even with its GVariant integration (https://gnome.pages.gitlab.gnome.org/json-glib/json-gvariant.html).
I gave json_gvariant_serialize_data() a quick spin, and it seems to work like a charm:
-- Signature: (isaaai(y(b(n(q(iua{ov})v)o))x(dh))a{t(bov)})
-- Value: (-2147483648, 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', [[@ai []]], (byte 0x00, (false, (int16 -32768, (uint16 0, (-2147483648, uint32 0, {objectpath '/': <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>, '/': <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>, '/': <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>}), <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>), objectpath '/')), int64 -9223372036854775808, (1.7976931348623157e+308, handle 0)), {uint64 0: (false, objectpath '/', <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>), 0: (false, '/', <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>), 0: (false, '/', <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>), 0: (false, '/', <'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'>)})
Serialized GVariant: [-2147483648,"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",[[[]]],[0,[false,[-32768,[0,[-2147483648,0,{"/":"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"}],"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"],"/"]],-9223372036854775808,[1.7976931348623157e+308,0]],{"0":[false,"/","AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"]}]
$ echo '[-2147483648,"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",[[[]]],[0,[false,[-32768,[0,[-2147483648,0,{"/":"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"}],"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"],"/"]],-9223372036854775808,[1.7976931348623157e+308,0]],{"0":[false,"/","AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"]}]' | jq .
[
  -2147483648,
  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
  [
    [
      []
    ]
  ],
  [
    0,
    [
      false,
      [
        -32768,
        [
          0,
          [
            -2147483648,
            0,
            {
              "/": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
            }
          ],
          "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        ],
        "/"
      ]
    ],
    -9223372036854775808,
    [
      1.7976931348623157E+308,
      0
    ]
  ],
  {
    "0": [
      false,
      "/",
      "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
    ]
  }
]
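One way to double-check the serialized line on the consumer side is to load it with Python's json module, which keeps JSON integers as arbitrary-precision ints, so the int64 minimum survives exactly. It also makes one property of the conversion visible: the a{ov} value had three entries with key '/' and the a{t(bov)} value had four entries with key 0, yet each came out as a JSON object with a single string key ("/" and "0"); RFC 8259 says object member names SHOULD be unique. This is only an illustration with Python's json module and says nothing about how json-glib itself resolves repeated keys.

```python
import json
import sys

# The "Serialized GVariant" line printed above, split across lines.
LINE = ('[-2147483648,"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",[[[]]],'
        '[0,[false,[-32768,[0,[-2147483648,0,{"/":"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"}],'
        '"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"],"/"]],-9223372036854775808,'
        '[1.7976931348623157e+308,0]],{"0":[false,"/","AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"]}]')

value = json.loads(LINE)

# Boundary values from the GVariant come back unchanged.
assert value[0] == -2**31                    # int32 minimum
assert value[3][2] == -2**63                 # int64 minimum, still an exact int
assert value[3][3][0] == sys.float_info.max  # largest finite double

# The a{ov} member nested inside the y(...) tuple, and the a{t(bov)} member.
a_ov = value[3][1][1][1][1][2]
a_tbov = value[4]
print(list(a_ov))    # ['/']
print(list(a_tbov))  # ['0']
```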
That should, hopefully, be compatible with the format produced by busctl as well.
Also, would it make sense to log only unsuccessful cases? Something like what libFuzzer/AFL do, i.e., log only crashes/timeouts, one such case per file, so they can later be used as reproducers. Or do we want to log everything into one file, with each entry marked by the type of failure (timeout, crash, ...)?
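One way the "one case per file" option could look, loosely modeled on libFuzzer, which names its artifacts like crash-<sha1> or timeout-<sha1>. This is a Python sketch, not dfuzzer code: the case fields are hypothetical, and hashing the case's JSON simply gives a stable file name, so the same failing input is not stored twice.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def save_reproducer(out_dir: Path, kind: str, case: dict) -> Path:
    """Write a single failing case to <out_dir>/<kind>-<sha1 of its JSON>."""
    # sort_keys makes the serialization, and therefore the name, stable.
    data = json.dumps(case, sort_keys=True).encode("utf-8")
    path = out_dir / f"{kind}-{hashlib.sha1(data).hexdigest()}"
    path.write_bytes(data)
    return path


# Hypothetical failing call; the field names are illustrative only.
case = {"interface": "org.example.Test", "method": "Method",
        "signature": "s", "value": ["abc;def"]}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_reproducer(Path(tmp), "timeout", case)
    name = saved.name
    restored = json.loads(saved.read_text(encoding="utf-8"))
    same_path = save_reproducer(Path(tmp), "timeout", case) == saved
```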
In its current form, the logs are supposed to look like https://github.com/matusmarhefka/dfuzzer/pull/4 so that reprogen.py works, as far as I understand, but it would probably make sense to revisit the format to make the logs easier to parse in general. Those logs could help to look, for example, for timeouts that dfuzzer ignores by default.
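For the "everything in one file, marked by failure type" option, here is a sketch assuming one JSON object per line (the JSON Lines convention). Pulling out timeouts then takes a few lines, regardless of what the random strings contain. The record fields (method, signature, value, result) are hypothetical and not an existing dfuzzer format.

```python
import json
from typing import Iterable, Iterator

# Hypothetical log: one JSON object per line, with an assumed "result"
# field recording how the call ended.
LOG = [
    '{"method": "Foo", "signature": "s", "value": ["a;b"], "result": "ok"}',
    '{"method": "Bar", "signature": "s", "value": ["c\\"d"], "result": "timeout"}',
    '',
]


def timeouts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the records whose "result" is "timeout", skipping blank lines."""
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("result") == "timeout":
            yield record


for record in timeouts(LOG):
    print(record["method"], record["value"])  # Bar ['c"d']
```

Since json.loads() tolerates the trailing newline, the same function can be fed an open log file directly, e.g. timeouts(open(path)).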