TheDr1ver / MISPHunter

Uses searches on 3rd party services and MISP to track actor infrastructure as it's built
Apache License 2.0

JSON Dict Comparison Improvement #22

Open. Opened by TheDr1ver 3 years ago.

TheDr1ver commented 3 years ago

Consider revisiting date stripping for the data. Shodan HTTP data (the 80_data or 443_data fields) appears to be the biggest offender at the moment. This also affects the 443_hash value.

Other targets for removal:

NOTE: This scrubbing should only happen after the diff comes back positive. That way we aren't examining every character of every JSON blob that comes in, and it will be easier to find "true scrubs" instead of accidentally deleting data that some plugin decides looks "date-like".
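
A minimal sketch of the ordering described in the note, assuming results arrive as flat dicts keyed like `80_data` / `443_hash`. The `DATE_RULES` table and the helper names are hypothetical, not MISPHunter's code. The raw diff runs first; scrub rules are applied only to keys that differ, and a key is reported only if it still differs after scrubbing.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical rule table: field glob -> regex whose first group gets blanked.
DATE_RULES = {"*_data": re.compile(r"\nDate:(.*?)\n")}


def _blank_group(m: re.Match) -> str:
    """Return the whole match with capture group 1 removed."""
    whole = m.group(0)
    start, end = m.start(1) - m.start(0), m.end(1) - m.start(0)
    return whole[:start] + whole[end:]


def changed_keys(old: dict, new: dict) -> set:
    """Keys whose values differ or that exist on only one side."""
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


def diff_then_scrub(old: dict, new: dict, rules: dict) -> set:
    """Diff first; scrub only the changed keys; return keys that still differ."""
    still_changed = set()
    for key in changed_keys(old, new):  # empty diff -> no scrubbing at all
        a, b = old.get(key), new.get(key)
        if isinstance(a, str) and isinstance(b, str):
            for glob, pattern in rules.items():
                if fnmatchcase(key, glob):
                    a, b = pattern.sub(_blank_group, a), pattern.sub(_blank_group, b)
        if a != b:
            still_changed.add(key)
    return still_changed
```

Because unchanged keys never reach the regexes, a plugin rule that misfires on "date-like" text can only affect fields that were already flagged by the raw diff.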

Subset diffing inside response bodies should be implemented.

If a diff turns up in HTML-specific fields such as *_http_response_body, that HTML should be parsed and diffed separately if at all possible. That may quickly become complicated enough to turn into a project of its own, though.
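
One way to start on this, sketched with only the standard library: flatten each HTML body into a token list (start tags with sorted attributes, end tags, non-blank text) using `html.parser.HTMLParser`, then diff the token lists with `difflib.ndiff`. The tokenization scheme and helper names are assumptions for illustration; a real implementation would likely need more normalization.

```python
import difflib
from html.parser import HTMLParser


class _HTMLTokenizer(HTMLParser):
    """Flatten HTML into one token per start tag, end tag, or text node."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Sort attributes by name so reordering alone is not reported as a diff.
        parts = [k if v is None else f'{k}="{v}"' for k, v in sorted(attrs, key=lambda kv: kv[0])]
        self.tokens.append("<" + " ".join([tag, *parts]) + ">")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()  # ignore whitespace-only text between tags
        if text:
            self.tokens.append(text)


def html_tokens(body: str) -> list[str]:
    parser = _HTMLTokenizer()
    parser.feed(body)
    parser.close()
    return parser.tokens


def html_diff(old_body: str, new_body: str) -> list[str]:
    """Added ('+ ') and removed ('- ') tokens between two HTML bodies."""
    delta = difflib.ndiff(html_tokens(old_body), html_tokens(new_body))
    return [line for line in delta if line.startswith(("+ ", "- "))]
```

Diffing at the token level means a changed token or text node shows up as one small `-`/`+` pair instead of a whole changed body.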

TheDr1ver commented 3 years ago

Censys

Delete:

*__encoding_*

^^ Addressed in #42
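
A minimal sketch of glob-based field deletion, assuming results are flat dicts with keys such as `443_http_response_body__encoding_` (a hypothetical example key). It uses `fnmatch.fnmatchcase` so matching is case-sensitive on every platform; the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# Delete globs from the Censys list above.
CENSYS_DELETE = ["*__encoding_*"]


def delete_fields(result: dict, patterns: list[str]) -> dict:
    """Return a copy of result without any key matching one of the globs."""
    return {
        key: value
        for key, value in result.items()
        if not any(fnmatchcase(key, pattern) for pattern in patterns)
    }
```

The same helper works unchanged for the Shodan delete list further down (`*_asn`, `*_isp`, `*_location_*`, `*_opts_*`).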

Scrub:

*_banner
    cookies:
        Set-Cookie.*?=(.*?);
        (e.g. sessionid=<base64>; csrftoken=<base64>; expires=<date>)
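
A sketch of the cookie scrub. The single regex `Set-Cookie.*?=(.*?);` only blanks the first `name=value` pair after each `Set-Cookie`, so this version (an assumption, not the project's code) uses two patterns: one for the cookie value right after `Set-Cookie:` and one for the `expires=` attribute. It returns the number of matches so a caller can tell whether the field was scrubbed.

```python
import re

# Cookie value immediately after "Set-Cookie: name=" (runs to ';' or end of line).
COOKIE_VALUE = re.compile(r"(Set-Cookie:\s*[^=\s;]+=)[^;\r\n]*", re.IGNORECASE)
# The expires=<date> attribute on the same header.
COOKIE_EXPIRES = re.compile(r"(;\s*expires=)[^;\r\n]*", re.IGNORECASE)


def scrub_cookies(banner: str) -> tuple[str, int]:
    """Blank cookie values and expiry dates; return (scrubbed_banner, matches)."""
    banner, n_values = COOKIE_VALUE.subn(r"\1", banner)
    banner, n_expires = COOKIE_EXPIRES.subn(r"\1", banner)
    return banner, n_values + n_expires
```

Cookie names and attributes such as `Path` survive, so two banners that differ only in session tokens and expiry dates scrub to the same text.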

*_http_response_body
    <input.*?(?=token).*?value(.*?)>
    # Or this could be a two-regex process:
        # One rex to find <input> with 'token' inside it:
            <input[^>]*?(?=token).*?>
        # Then another to scrub the value inside of the result
            s/value=\".*?\"/value=\"\"/g
    ^^^ Note that this is overly simplified. We should have a better way of scrubbing HTML in general.
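
The two-regex idea above translates fairly directly to Python's `re` module. The sketch below finds `<input>` tags that contain `token` and blanks their double-quoted `value` attribute. It carries the same caveat as the note: single-quoted or unquoted values are not handled, and an HTML parser would be more robust. The function name and return shape are hypothetical.

```python
import re

# Step 1: an <input ...> tag with "token" somewhere before its closing '>'.
INPUT_WITH_TOKEN = re.compile(r"<input[^>]*?(?=token).*?>", re.IGNORECASE | re.DOTALL)
# Step 2: a double-quoted value attribute inside that tag.
VALUE_ATTR = re.compile(r'value="[^"]*"', re.IGNORECASE)


def scrub_token_inputs(body: str) -> tuple[str, bool]:
    """Blank value="..." on token-bearing <input> tags; report whether anything changed."""
    scrubbed = INPUT_WITH_TOKEN.sub(lambda m: VALUE_ATTR.sub('value=""', m.group(0)), body)
    return scrubbed, scrubbed != body
```

Because `[^>]*?` cannot run past a `>`, an `<input>` without `token` is never matched just because a later tag happens to contain the word.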

Scrub-reliant deletes (if any of the related fields get scrubbed in the previous function, delete these fields entirely from the result):

*_banner:
    *banner_hex
    *http_response_headers_Set_Cookie_*

*_http_response_body:
    *_http_response_body_hash
    *_http_response_body_size
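
A minimal sketch of the scrub-reliant delete step, assuming the scrub step reports which keys it actually changed. The mapping mirrors the Censys lists above; the function name is hypothetical, and the globs are applied across the whole result exactly as written, not per port.

```python
from fnmatch import fnmatchcase

# Scrubbed field glob -> globs of fields to drop when that field was scrubbed.
CENSYS_SCRUB_DEPENDENTS = {
    "*_banner": ["*banner_hex", "*http_response_headers_Set_Cookie_*"],
    "*_http_response_body": ["*_http_response_body_hash", "*_http_response_body_size"],
}


def apply_scrub_reliant_deletes(result: dict, scrubbed_keys: set, dependents: dict) -> dict:
    """Drop dependent fields for every source glob that matched a scrubbed key."""
    doomed = set()
    for source_glob, dep_globs in dependents.items():
        if any(fnmatchcase(key, source_glob) for key in scrubbed_keys):
            doomed.update(k for k in result if any(fnmatchcase(k, g) for g in dep_globs))
    return {k: v for k, v in result.items() if k not in doomed}
```

Dropping the hash and size fields avoids reporting a diff that only exists because the scrubbed text no longer matches values computed from the raw text.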

TheDr1ver commented 3 years ago

Shodan

Delete:

*_asn
*_isp
*_location_*
*_opts_*

^^ Addressed in #42

Scrub:

*_data
    \nDate:(.*?)\n

Scrub-reliant deletes (if any of the related fields get scrubbed in the previous function, delete these fields entirely from the result):

*_data:
    *_hash
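
Putting the Shodan pieces together, a sketch under the same flat-dict assumption: blank the `Date:` header value in every `*_data` field with the regex above, and if any field actually changed, drop the `*_hash` fields, since a hash computed over the original data no longer describes the scrubbed text. The function name is hypothetical.

```python
import re
from fnmatch import fnmatchcase

DATE_HEADER = re.compile(r"\nDate:(.*?)\n")  # regex from the scrub list above


def _blank_group(m: re.Match) -> str:
    """Return the whole match with capture group 1 removed."""
    whole = m.group(0)
    start, end = m.start(1) - m.start(0), m.end(1) - m.start(0)
    return whole[:start] + whole[end:]


def scrub_shodan(result: dict) -> dict:
    """Scrub Date headers from *_data; drop *_hash if anything was scrubbed."""
    out = dict(result)  # leave the caller's dict untouched
    scrubbed = False
    for key, value in result.items():
        if fnmatchcase(key, "*_data") and isinstance(value, str):
            new_value = DATE_HEADER.sub(_blank_group, value)
            if new_value != value:
                out[key] = new_value
                scrubbed = True
    if scrubbed:
        # Scrub-reliant delete: *_data changed, so *_hash is no longer meaningful.
        out = {k: v for k, v in out.items() if not fnmatchcase(k, "*_hash")}
    return out
```

A `*_data` field whose `Date:` value is already empty counts as unchanged, so its hash is kept.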