aholzel / TA-dmarc

Splunk app for the processing and ingestion of DMARC RUA reports

UTF-8 subject decoding #11

Closed jbouwh closed 1 year ago

jbouwh commented 2 years ago

Microsoft has (again) changed its report subjects: the [Preview] tag is gone, but the subjects of their DMARC reports are now encoded.

Subjects look like this:

=?UTF-8?B?UmVwb3J0IERvbWFpbjogZXhhbXBsZS5jb20gU3VibWl0dGVyOiBwcm90ZWN0aW9uLm91dGxvb2suY29tIFJlcG9ydC1JRDogZGVhZGJlZWZkZWFkYmVlZmRlYWRiZWVmZGVhZA==?=

Unless they are decoded first, they currently cannot be parsed.

To decode the subject, the following code example could work:

"""E-mail subject decoder."""

import base64

SUBJECT1 = "=?UTF-8?B?UmVwb3J0IERvbWFpbjogZXhhbXBsZS5jb20gU3VibWl0dGVyOiBwcm90ZWN0aW9uLm91dGxvb2suY29tIFJlcG9ydC1JRDogZGVhZGJlZWZkZWFkYmVlZmRlYWRiZWVmZGVhZA==?="
SUBJECT2 = "Report Domain: example.com Submitter: protection.outlook.com Report-ID: deadbeefdeadbeefdeadbeefdead"


def decode_subject(subject):
    """Decode a base64 ("B") encoded-word subject; return anything else as-is."""
    if subject.startswith("=?"):
        # Encoded-word layout: =?charset?encoding?encoded-text?=
        subject_parts = subject.split('?')
        # Only base64 is handled here; "Q" encoded subjects are left untouched.
        if len(subject_parts) == 5 and subject_parts[2].upper() == "B":
            return base64.b64decode(subject_parts[3]).decode(subject_parts[1])
    return subject


print(decode_subject(SUBJECT1))
print(decode_subject(SUBJECT2))
assert decode_subject(SUBJECT1) == SUBJECT2
assert decode_subject(SUBJECT2) == SUBJECT2

aholzel commented 1 year ago

This is solved in v4.0.2, not via a custom function but by using email.header.make_header and email.header.decode_header:

subject = str(email.header.make_header(email.header.decode_header(message['Subject'])))
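
For anyone who wants to try the email.header approach outside the app, here is a minimal standalone sketch. It is not the actual TA-dmarc code: the helper name `decode_subject`, the hand-built test message, and the Q-encoded sample subject are assumptions made for illustration. Only `decode_header` and `make_header` come from the fix described above.

```python
from email import message_from_string
from email.header import decode_header, make_header

ENCODED = "=?UTF-8?B?UmVwb3J0IERvbWFpbjogZXhhbXBsZS5jb20gU3VibWl0dGVyOiBwcm90ZWN0aW9uLm91dGxvb2suY29tIFJlcG9ydC1JRDogZGVhZGJlZWZkZWFkYmVlZmRlYWRiZWVmZGVhZA==?="
PLAIN = "Report Domain: example.com Submitter: protection.outlook.com Report-ID: deadbeefdeadbeefdeadbeefdead"


def decode_subject(raw_subject):
    """Hypothetical helper: decode encoded-words, pass plain subjects through."""
    return str(make_header(decode_header(raw_subject)))


# Hand-built message for illustration only; a real report arrives via the mailbox.
message = message_from_string("Subject: " + ENCODED + "\n\nreport body")

decoded_b = decode_subject(message["Subject"])  # base64 ("B") encoded subject
decoded_plain = decode_subject(PLAIN)           # unencoded subject stays the same
# Made-up quoted-printable ("Q") subject, which a base64-only decoder would miss.
decoded_q = decode_subject("=?UTF-8?Q?Report_Domain=3A_example.com?=")

print(decoded_b)
print(decoded_q)
```

Compared with splitting the subject on `?` by hand, the standard library route should also cope with Q-encoded subjects and with subjects made up of several encoded-words, so it seems the safer choice if Microsoft changes the format again.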