emalderson / ThePhish

ThePhish: an automated phishing email analysis tool
GNU Affero General Public License v3.0

[Question] #10

Closed mgrant0 closed 3 years ago

mgrant0 commented 3 years ago

Work environment

Question | Answer
OS version (server) | Debian, Ubuntu
OS version (client) | any
Python version | 3.9.2
Type of email address used | any
Mail client type & version | any
Browser type & version | any
Virtualized Env. | False
Dedicated RAM | 16 GB
vCPU | 4
ThePhish version | 603eca6
TheHive version | 4.1.11-1
Cortex version | 3.1.1-1
MISP version | 2.4.150
Installed using Docker and Docker Compose | False
Docker Version | n/a
Docker Compose version | n/a

Question

This may not be specifically an issue for ThePhish but hopefully you can shed some light on how to go about solving it appropriately.

In Cortex, we want to use HaveIBeenPwned (HIBP). When we submit a message from ThePhish to be analyzed, the EmlParser grabs all the email addresses in the message, including the one in the To header. Cortex then dutifully sends each of these addresses to HIBP. If any of them comes back positive, ThePhish flags the message as malicious. The problem, of course, is that if the recipient's email address is in this database, the message gets flagged as malicious!

What is the correct way to get finer-grained control over what gets sent to HIBP, so that it runs, say, only on addresses in the mail other than the envelope recipient or the To header?

Currently ThePhish closes the message as malicious if any one of the results comes back malicious. It seems like we need finer control, some sort of if-this-and-that logic. I don't know whether that belongs in ThePhish or somewhere else.

emalderson commented 3 years ago

ThePhish is an orchestrator: it has no control over the results that the underlying analyzers return. To partially solve this problem, I included a configuration file called analyzers_level_conf.json, with which you can override the result that an analyzer gives for a certain type of observable. In this case, I think the best way to handle the problem is to add this object to that file:

"HIBP" : {
    "dataType" : ["mail"],
    "levelMapping" : {
        "malicious" : "suspicious",
        "suspicious" : "suspicious",
        "safe" : "safe",
        "info" : "info"
    }
}

Here you should replace "HIBP" with the exact name of the analyzer as it appears in Cortex, which I don't remember right now. This way, the malicious level given by the analyzer is always mapped to suspicious, so that you can manually inspect the result and decide.
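To illustrate what such a mapping does, here is a minimal Python sketch of how an override of this shape could be applied to an analyzer's reported level. It is not ThePhish's actual code: the function name remap_level, the exact-name matching, and the inline copy of the configuration are assumptions made for illustration, and "HIBP" is still a placeholder for the real analyzer name.

```python
import json

# Hypothetical in-memory copy of an analyzers_level_conf.json entry.
# "HIBP" is a placeholder; the real key is the analyzer's name in Cortex.
CONF = json.loads("""
{
    "HIBP": {
        "dataType": ["mail"],
        "levelMapping": {
            "malicious": "suspicious",
            "suspicious": "suspicious",
            "safe": "safe",
            "info": "info"
        }
    }
}
""")


def remap_level(analyzer_name, data_type, level, conf=CONF):
    """Return the overridden level, or the original level if no override applies."""
    entry = conf.get(analyzer_name)
    # No entry for this analyzer, or the entry does not cover this observable type.
    if entry is None or data_type not in entry["dataType"]:
        return level
    # Levels missing from the mapping are passed through unchanged.
    return entry["levelMapping"].get(level, level)
```

With this sketch, remap_level("HIBP", "mail", "malicious") gives "suspicious", while a result for a different observable type or a different analyzer is left untouched.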

I could also think about adding an if-then-else statement somewhere in the code to avoid analyzing the "To" header field with HIBP, but the problem is that perfectly legitimate email addresses are also often present in some data breach, so the email should never be marked as malicious because of HIBP alone anyway.
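For anyone who still wants to experiment with that kind of check, the following Python sketch shows one way it might look. It is not part of ThePhish: the helper names recipient_addresses and should_analyze, the skip_analyzers parameter, and the "HIBP" placeholder name are all hypothetical, and, as noted above, skipping the recipient does not make an HIBP hit on other addresses a reliable sign of phishing.

```python
from email import message_from_string
from email.utils import getaddresses


def recipient_addresses(raw_email):
    """Collect the lowercased addresses found in the To header of a raw email."""
    msg = message_from_string(raw_email)
    to_headers = msg.get_all("To", [])
    return {addr.lower() for _, addr in getaddresses(to_headers) if addr}


def should_analyze(analyzer_name, observable, recipients, skip_analyzers=("HIBP",)):
    """Return False when a breach-lookup analyzer would run on a recipient address.

    skip_analyzers holds placeholder names; use the names shown in Cortex.
    """
    if analyzer_name not in skip_analyzers:
        return True
    return observable.strip().lower() not in recipients


# Usage with a made-up message
raw = "From: sender@example.org\nTo: Victim <victim@example.com>\nSubject: hi\n\nbody"
recipients = recipient_addresses(raw)
```

Here should_analyze("HIBP", "victim@example.com", recipients) is False, so the recipient is never sent to the breach lookup, while the sender address and every other analyzer are unaffected.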

mgrant0 commented 3 years ago

Thanks, this seems to help!