emalderson / ThePhish

ThePhish: an automated phishing email analysis tool
GNU Affero General Public License v3.0
1.09k stars 172 forks

[BUG] Error parsing and getting mails #23

Open informaticaeloy opened 2 years ago

informaticaeloy commented 2 years ago

Describe the bug

When I click "List Mails", I get an error. It appears to be an issue in list_emails.py, on line 184. I have tried with both plain-text and HTML emails.

Work environment

| Question | Answer |
|---|---|
| OS version (server) | Ubuntu Desktop 22.04 |
| OS version (client) | Ubuntu, ... |
| Python version | 3.10.4 |
| Type of email address used | office 365 |
| Browser type & version | Chrome |
| Virtualized Env. | True |
| Dedicated RAM | 8 GB |
| vCPU | 4 |
| ThePhish version | |
| TheHive version | 4.1.9-1 |
| Cortex version | 3.1.1-1 |
| MISP version | 2.4.148 |
| Installed using Docker and Docker Compose | True |
| Docker Version | 20.10.12 |
| Docker Compose version | 1.29.2 |

Screenshots

[Two screenshots were attached to the original issue.]

Log

    thephish | AttributeError: 'NoneType' object has no attribute 'contents'
    thephish |
    thehive | [info] o.t.s.AccessLogFilter [00000004|] 172.19.0.1 GET /api/status took 3ms and returned 200 752 bytes
    thehive | [info] o.t.s.AccessLogFilter [00000005|] 172.19.0.1 GET /api/status took 2ms and returned 200 752 bytes
    thehive | [info] o.t.s.AccessLogFilter [00000006|] 192.168.46.213 GET /api/status took 2ms and returned 200 752 bytes
    thephish | [INFO]_[listemails]: Connected to myemail@mymail.com@outlook.office365.com:993/inbox
    thephish | [INFO][listemails]: 3 unread messages to process
    thephish | [INFO][listemails]: Message from: b' prueba@juan.com' with subject: hola
    thephish | [INFO][listemails]: Message from: b' prueba@juan.com' with subject: prueba 4
    thephish | [ERROR][list_emails]: Error while trying to retrieve the emails: Traceback (most recent call last):
    thephish | File "/root/thephish/list_emails.py", line 250, in main
    thephish | emails_info = retrieve_emails(connection)
    thephish | File "/root/thephish/list_emails.py", line 184, in retrieve_emails
    thephish | body = soup.body.div.p.span.contents[0]
    thephish | AttributeError: 'NoneType' object has no attribute 'contents'
    thephish |
    thehive | [info] o.t.s.AccessLogFilter [00000007|] 172.19.0.1 GET /api/status took 2ms and returned 200 752 bytes
    thehive | [info] o.t.s.AccessLogFilter [00000008|] 172.19.0.1 GET /api/status took 1ms and returned 200 752 bytes
    thehive | [info] o.t.s.AccessLogFilter [00000009|] 192.168.46.213 GET /api/status took 1ms and returned 200 752 bytes

majo053 commented 1 year ago

Hello, this is a problem with the encoding of the From: email header.

You can fix it as follows.

Change the corresponding code in list_emails.py to this:

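            msg = email.message_from_bytes(message)
            # decode_header returns a list of (fragment, charset) pairs
            decode = email.header.decode_header(msg['From'])
            from_field = ""
            for decode_item in decode:
                if decode_item[1] is not None:
                    # Encoded fragment: decode it with its declared charset
                    from_field += decode_item[0].decode(decode_item[1])
                elif isinstance(decode_item[0], bytes):
                    # Bytes without a declared charset: use the default (UTF-8)
                    from_field += decode_item[0].decode()
                else:
                    # Already a str: append it unchanged
                    from_field += str(decode_item[0])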

Change the corresponding code in case_from_email.py to this:

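            msg = email.message_from_bytes(message)
            # decode_header returns a list of (fragment, charset) pairs
            decode = email.header.decode_header(msg['From'])
            external_from_field = ""
            for decode_item in decode:
                if decode_item[1] is not None:
                    # Encoded fragment: decode it with its declared charset
                    external_from_field += decode_item[0].decode(decode_item[1])
                elif isinstance(decode_item[0], bytes):
                    # Bytes without a declared charset: use the default (UTF-8)
                    external_from_field += decode_item[0].decode()
                else:
                    # Already a str: append it unchanged
                    external_from_field += str(decode_item[0])
            # parseaddr returns a (display name, address) pair; keep only the address
            parsed_from_field = email.utils.parseaddr(external_from_field)
            if len(parsed_from_field) > 1:
                external_from_field = parsed_from_field[1]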
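
For anyone who wants to try the header decoding on its own before touching ThePhish, here is a minimal standalone sketch of the same idea. It is not ThePhish code: the function name `decode_from_header` is made up for illustration, and it uses the standard library's `make_header` to join the decoded fragments instead of the manual loop above.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr


def decode_from_header(raw_value):
    """Return (display_name, address) for a raw From header value.

    Hypothetical helper for illustration only; raw_value may contain
    RFC 2047 encoded words such as "=?utf-8?b?...?=".
    """
    if raw_value is None:
        # The message has no From header at all.
        return ("", "")
    # make_header joins the (fragment, charset) pairs into one Header object,
    # handling both bytes and str fragments.
    decoded = str(make_header(decode_header(raw_value)))
    return parseaddr(decoded)


if __name__ == "__main__":
    print(decode_from_header("=?utf-8?b?UHJ1ZWJh?= <prueba@juan.com>"))
```

A header that declares a charset Python does not know would still raise an exception here, so this only covers the common cases.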

@emalderson Can you please update these files?

emalderson commented 1 year ago

Hello, sorry for the late reply, but I've been very busy lately. Thank you for providing the code to fix the bug you encountered. The problem with that part of the code is that when you "fix" one thing, you can easily break 100 others. If I blindly added your fix, I might break the parsing logic for many other emails whose From field has different properties. I need to test the change on all the emails that I have, and then I'll consider adding your code and crediting you for the contribution.

tiagotsi commented 1 year ago

Does anyone have a working image of ThePhish in .OVF format? I am not able to install it using Docker and Docker Compose.

LoriSchochWIT commented 8 months ago

Hello, is there an update for this issue? Or can anybody provide a solution on how to implement the suggested code from @majo053? I don't understand which lines to replace exactly. Thanks in advance!

LoriSchochWIT commented 6 months ago

Hi, is there still no working solution that can be implemented? @emalderson

emalderson commented 1 month ago

Hello. Unfortunately, errors like these need a thorough testing process. They are in fact related to the absurdly large number of ways in which an email can be encoded into the MIME multipart format. I managed to cover the most widespread use cases, but I cannot predict how every email client encodes its emails. When an email uses an encoding I did not anticipate, the data that ThePhish needs to extract is not located in any of the places that the code searches programmatically, so the code breaks. In addition, there may also be some issues with Chinese or Japanese characters.
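
To illustrate the point about MIME variety, here is a rough sketch of one way to locate a body part without assuming a fixed message layout. It is not how ThePhish does it: `find_body_text` is a hypothetical name, and the sketch relies only on the standard library's `email` package with `policy.default`, returning an empty result instead of raising when no text body exists.

```python
from email import message_from_bytes, policy


def find_body_text(raw_bytes):
    """Return (content_type, text) for the preferred body part.

    Hypothetical helper for illustration; returns (None, "") when the
    message has no text/plain or text/html body.
    """
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    # get_body searches nested multipart containers and skips attachments.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return (None, "")
    try:
        text = part.get_content()
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: fall back to a lossy UTF-8 decode.
        payload = part.get_payload(decode=True) or b""
        text = payload.decode("utf-8", errors="replace")
    return (part.get_content_type(), text)
```

Even with this approach, each unusual client encoding would still need to be tested against real samples.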

This same error is also mentioned in issue #40.

Moreover, the code provided by majo053 does not fix the problem reported here, which is related to the encoding and decoding of the HTML part of the email, not to the From header.
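
For context, the traceback in the report fails at `soup.body.div.p.span.contents[0]`, which assumes one specific nesting of tags; if any element in that chain is missing, the lookup returns `None` and the next attribute access raises. The sketch below shows one possible way to pull text out of an HTML part without assuming any particular structure. It is only an illustration, not the project's fix: `TextCollector` and `html_to_text` are hypothetical names, and it uses the standard library's `html.parser` rather than the parser behind `soup`.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text from HTML, ignoring <script> and <style> content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return all visible text in the HTML, joined by single spaces."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)
```

Because it never indexes into a fixed tag path, an email whose HTML lacks a `span` or `div` yields an empty or shorter string instead of an `AttributeError`.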