Open TheDr1ver opened 3 years ago
Delete:
*__encoding_*
^^ Addressed in #42
Scrub:
*_banner
cookies:
Set-Cookie.*?=(.*?);
(e.g. sessionid=<base64>; csrftoken=<base64>; expires=<date>)
*_http_response_body
<input.*?(?=token).*?value(.*?)>
# Or it could be a two-regex process.
# One rex to find <input> with 'token' inside it:
<input[^>]*?(?=token).*?>
# Then another to scrub the value inside of the result
s/value=\".*?\"/value=\"\"/g
^^^ Note that this is overly simplified. We should have a better way of scrubbing HTML in general.
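A minimal sketch of the two scrubs above, assuming the banner and body are plain strings. The function names `scrub_banner` and `scrub_body` are hypothetical, and the patterns are slightly tightened variants of the ones listed, not a settled implementation. As with the original pattern, only the first name=value pair of each Set-Cookie header is blanked.

```python
import re

# Hypothetical helpers; patterns are adapted from the ones proposed in this issue.
COOKIE_VALUE = re.compile(r"(Set-Cookie:\s*[^=;\s]+=)[^;\r\n]*", re.IGNORECASE)
TOKEN_INPUT = re.compile(r"<input\b[^>]*token[^>]*>", re.IGNORECASE)
VALUE_ATTR = re.compile(r'value="[^"]*"', re.IGNORECASE)


def scrub_banner(banner: str) -> str:
    """Blank the value of the first name=value pair in each Set-Cookie header."""
    return COOKIE_VALUE.sub(r"\1", banner)


def scrub_body(body: str) -> str:
    """Blank value="..." inside any <input> tag that mentions 'token'."""
    # First regex finds the tag, second regex scrubs the value inside it.
    return TOKEN_INPUT.sub(
        lambda m: VALUE_ATTR.sub('value=""', m.group(0)), body
    )
```

Inputs without "token" anywhere in the tag are left alone, so ordinary form fields still show up in a diff. The regex approach remains fragile against single-quoted or unquoted attribute values.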
Scrub-reliant deletes: (if any of the related fields get scrubbed in the previous function, delete these fields entirely from the result)
*_banner:
*banner_hex
*http_response_headers_Set_Cookie_*
*_http_response_body:
*_http_response_body_hash
*_http_response_body_size
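The scrub-reliant deletes could be sketched as a lookup from scrubbed-field pattern to dependent-field patterns. This assumes results are flat dicts with keys like `443_banner`, and that the scrub step reports which keys it changed; `DEPENDENT_DELETES` and `delete_dependents` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Trigger pattern -> dependent patterns, copied from the lists in this issue.
DEPENDENT_DELETES = {
    "*_banner": ["*banner_hex", "*http_response_headers_Set_Cookie_*"],
    "*_http_response_body": [
        "*_http_response_body_hash",
        "*_http_response_body_size",
    ],
}


def delete_dependents(result: dict, scrubbed_keys: set) -> dict:
    """Return a copy of result without fields that depend on a scrubbed field."""
    doomed = []
    for trigger, dependents in DEPENDENT_DELETES.items():
        if any(fnmatchcase(key, trigger) for key in scrubbed_keys):
            doomed.extend(dependents)
    return {
        key: value
        for key, value in result.items()
        if not any(fnmatchcase(key, pattern) for pattern in doomed)
    }
```

One simplification to be aware of: a scrub on one port's banner removes the dependent fields for every port, since the wildcard patterns carry no port prefix.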
Delete:
*_asn
*_isp
*_location_*
*_opts_*
^^ Addressed in #42
Scrub:
*_data
\nDate:(.*?)\n
Scrub-reliant deletes: (if any of the related fields get scrubbed in the previous function, delete these fields entirely from the result)
*_data:
*_hash
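A rough sketch of the Date scrub and its dependent delete, assuming flat result dicts and assuming that a field such as `80_data` pairs with `80_hash` (that pairing is a guess based on the field names here). `scrub_data_fields` is a hypothetical name.

```python
import re

# Same idea as \nDate:(.*?)\n above, but stops at \r so CRLF endings survive.
DATE_HEADER = re.compile(r"(\nDate:)[^\r\n]*")


def scrub_data_fields(result: dict) -> dict:
    """Blank Date header values in *_data fields and drop the matching *_hash."""
    out = dict(result)
    for key, value in result.items():
        if key.endswith("_data") and isinstance(value, str):
            cleaned = DATE_HEADER.sub(r"\1", value)
            if cleaned != value:
                out[key] = cleaned
                # The hash was computed over the unscrubbed data, so drop it.
                out.pop(key[: -len("_data")] + "_hash", None)
    return out
```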
Consider revisiting stripping dates from data (Shodan HTTP data in 80_data or 443_data appears to be the biggest offender at the moment). This also affects the 443_hash value.
Other targets for removal:
__encoding in the key - value = DISPLAY_UTF8 or value = DISPLAY_HEX
443_opts_heartbleed - contains date which will change every time
_location_latitude, _location_longitude, _location_city - might change too frequently
NOTE - This scrubbing should only happen after the diff comes back with a positive result. That way we're not looking at every single character in every JSON blob that comes our way, plus it'll be easier to find "true scrubs" rather than accidentally deleting pieces of data that some plugin determines to look "date-like".
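The diff-first ordering in the note could look roughly like this. It assumes flat result dicts and a caller-supplied `scrub(key, value)` callable; both `changed_keys` and `real_changes` are hypothetical names, not existing project functions.

```python
def changed_keys(old: dict, new: dict) -> set:
    """Cheap first pass: keys whose raw values differ between two results."""
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


def real_changes(old: dict, new: dict, scrub) -> set:
    """Scrub only the fields that already differ, then compare again.

    Fields that are equal in the raw results are never scrubbed, so the
    expensive regex work is limited to positive diffs.
    """
    candidates = changed_keys(old, new)
    if not candidates:
        return set()
    return {
        k for k in candidates
        if scrub(k, old.get(k)) != scrub(k, new.get(k))
    }
```

A key that drops out of the result after scrubbing is a "true scrub": the only difference was something volatile like a date or token.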
Subset for diffing inside bodies should be implemented
If you get a diff between HTML-specific fields like
*_http_response_body
then that HTML should be parsed and diffed separately if at all possible... But that may quickly get so complicated as to turn into a project of its own.
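As a starting point that stays inside the standard library, the body could be flattened into one line per tag or text node and then diffed line by line. This is only a sketch under that assumption; `flatten_html` and `html_diff` are hypothetical names, and dropping `value` from token inputs is one possible normalization rather than an agreed rule.

```python
import difflib
from html.parser import HTMLParser


class _Flattener(HTMLParser):
    """Turn HTML into a list of normalized lines, one per tag or text node."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Drop the volatile value of inputs whose name mentions "token".
        if tag == "input" and "token" in (attrs.get("name") or "").lower():
            attrs.pop("value", None)
        parts = [tag] + [
            k if v is None else f'{k}="{v}"' for k, v in sorted(attrs.items())
        ]
        self.lines.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.lines.append(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():
            self.lines.append(data.strip())


def flatten_html(html: str) -> list:
    parser = _Flattener()
    parser.feed(html)
    parser.close()
    return parser.lines


def html_diff(old: str, new: str) -> list:
    """Return removed (-) and added (+) normalized lines between two bodies."""
    a, b = flatten_html(old), flatten_html(new)
    changes = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes += [f"-{x}" for x in a[i1:i2]] + [f"+{x}" for x in b[j1:j2]]
    return changes
```

An empty list means the two bodies differ only in things the flattener normalizes away, such as attribute order, whitespace, or token values.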