bbottema / outlook-message-parser

A Java parser for Outlook messages (.msg files)

Missing attachments & embedded images in signed messages #40

Closed Faelean closed 9 months ago

Faelean commented 3 years ago

I don't know whether this is related to #1 (that issue doesn't mention signed mails, and it works with non-signed mails) and #4 (based on the issue and the documentation, it should work).

I sent an S/MIME signed (not encrypted) mail with one attachment and one embedded image to two different mail addresses. One was saved as an msg file from Outlook and the other as an eml file from the Gmail web interface. I then parsed both using this code block (slightly modified from the documentation) and printed the information about attachments and S/MIME details:

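import java.io.FileInputStream;
import java.io.IOException;

import org.simplejavamail.api.email.AttachmentResource;
import org.simplejavamail.api.email.Email;
import org.simplejavamail.api.email.OriginalSmimeDetails;
import org.simplejavamail.converter.EmailConverter;

// The class name is a placeholder; the original snippet omitted the declaration.
public class SignedMailTest {

    public static void main(String[] args) throws IOException {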
        String emlFileName = ".\\files\\testSigned2.eml";
        String msgFileName = ".\\files\\testSigned2.msg";

        try (FileInputStream fileInputStream = new FileInputStream(emlFileName)) {

            Email email = EmailConverter.emlToEmail(fileInputStream);

            printResult(email, emlFileName);

        }

        try (FileInputStream fileInputStream = new FileInputStream(msgFileName)) {

            Email email = EmailConverter.outlookMsgToEmail(fileInputStream);

            printResult(email, msgFileName);

        }
    }

    private static void printResult(Email email, String fileName) {
        System.out.println("-----" + fileName + "-----");
        System.out.println("---Attachments---");
        for(AttachmentResource attachmentResource: email.getAttachments()) {
            System.out.println(attachmentResource.getName());
        }

        System.out.println("---Decrypted Attachments---");
        for(AttachmentResource attachmentResource: email.getDecryptedAttachments()) {
            System.out.println(attachmentResource.getName());
        }

        System.out.println("---Embedded Images---");
        for(AttachmentResource attachmentResource: email.getEmbeddedImages()) {
            System.out.println(attachmentResource.getName());
        }

        System.out.println("---S/MIME Details---");
        OriginalSmimeDetails details = email.getOriginalSmimeDetails();
        System.out.println("Mode: " + details.getSmimeMode()); // SIGNED
        System.out.println("Mime: " + details.getSmimeMime()); // application/pkcs7-mime or multipart/signed
        System.out.println("Type: " + details.getSmimeType()); // signed-data, enveloped-data
        System.out.println("Name: " + details.getSmimeName()); // smime.p7m or smime.p7s
        System.out.println("Micalg: " + details.getSmimeMicalg()); // e.g. sha-512
        System.out.println("SignedBy: " + details.getSmimeSignedBy()); // email or name used

        System.out.println("");
    }
}

The results I get show that the msg file is missing the attached image, the embedded image and basically all S/MIME information:

-----.\files\testSigned2.eml-----
---Attachments---
gettyimages-1254246621-1.jpg
smime.p7s
---Decrypted Attachments---
gettyimages-1254246621-1.jpg
smime.p7s
---Embedded Images---
image001.jpg@01D70933.15C58FD0
---S/MIME Details---
Mode: SIGNED
Mime: multipart/signed
Type: null
Name: null
Micalg: SHA1
SignedBy: sielenkemper@otris.de

-----.\files\testSigned2.msg-----
---Attachments---
smime.p7m
---Decrypted Attachments---
smime.p7m
---Embedded Images---
---S/MIME Details---
Mode: PLAIN
Mime: null
Type: null
Name: null
Micalg: null
SignedBy: null

What I've noticed is that when I extract the p7m file from the msg, both images are Base64-encoded inside it, together with most of the S/MIME details (I couldn't verify SignedBy, but multipart/signed and SHA1 are there). The first lines of the p7m file are:

MIME-Version: 1.0
Content-Type: multipart/signed;
    protocol="application/x-pkcs7-signature";
    micalg=SHA1;
    boundary="----=_NextPart_000_003F_01D70933.15C7D9C0"

Since protocol is present, I would have expected not to run into this issue (https://github.com/bbottema/simple-java-mail/issues/292) when using 6.4.5 instead of 6.5.0, but the NullPointerException is still there, so there may be a problem reading the information.
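
For reference, here is a minimal sketch of how the parameters in that Content-Type header can be pulled apart using only the JDK. The class and method names are made up for illustration, and this is not how Simple Java Mail or Jakarta Mail parse headers internally. It also assumes that no ';' appears inside a quoted value.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only.
public class ContentTypeParams {

    /** Returns the media type, i.e. everything before the first ';'. */
    static String mediaType(String headerValue) {
        int semi = headerValue.indexOf(';');
        String type = semi < 0 ? headerValue : headerValue.substring(0, semi);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the name=value parameters that follow the media type. */
    static Map<String, String> parameters(String headerValue) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] segments = headerValue.split(";");
        // Segment 0 is the media type itself, so start at 1.
        for (int i = 1; i < segments.length; i++) {
            String segment = segments[i].trim();
            int eq = segment.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value pair
            }
            String name = segment.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = segment.substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        // Header value taken from the p7m file quoted above.
        String header = "multipart/signed;\n"
                + "    protocol=\"application/x-pkcs7-signature\";\n"
                + "    micalg=SHA1;\n"
                + "    boundary=\"----=_NextPart_000_003F_01D70933.15C7D9C0\"";
        System.out.println("Mime:     " + mediaType(header));
        Map<String, String> params = parameters(header);
        System.out.println("Protocol: " + params.get("protocol"));
        System.out.println("Micalg:   " + params.get("micalg"));
        System.out.println("Boundary: " + params.get("boundary"));
    }
}
```

Run against the header above, this yields multipart/signed as the media type plus the protocol, micalg and boundary values, which is the information I expected to see in the S/MIME details for the msg file.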

I've attached both mails: signedMails.zip

bbottema commented 3 years ago

Ahh you had me confused a bit, since you reported the bug in the outlook-message-parser, but you're running into this issue using Simple Java Mail API (and refer to a bug fix there as well).

I hope to take a look at this sometime this week.

Faelean commented 1 year ago

Any chance you could have a look at this?

bbottema commented 9 months ago

This is a painful part of the library. Basically, it works by looking at the headers for S/MIME-related info. Apparently this only works for some Outlook messages. The Outlook support currently has no other way of detecting S/MIME; aside from looking at headers, I have no clue how else to recognize S/MIME data.

However, with Simple Java Mail, we have identified a work-around.

Some context to the problem we are facing: we have this problem with msg #40. As a workaround, we convert msg -> Email -> eml -> Email. This way we have access to the attachments which were not available in the original msg -> Email.

I checked and this works in your case. I'm not sure why though 😅
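
A rough sketch of that round trip, for anyone who wants to try it. To keep it compilable without the library on the classpath, the three conversion steps are passed in as functions. With Simple Java Mail I would expect them to be EmailConverter::outlookMsgToEmail, EmailConverter::emailToEML and EmailConverter::emlToEmail, but treat that mapping as an assumption and check it against the version you use. The class and method names below are made up.

```java
import java.util.function.Function;

// Hypothetical helper, for illustration only.
public class RoundTripWorkaround {

    /**
     * Runs msg -> Email -> eml -> Email.
     *
     * @param source   the msg input (for example an InputStream)
     * @param parseMsg assumed to be EmailConverter::outlookMsgToEmail
     * @param toEml    assumed to be EmailConverter::emailToEML
     * @param parseEml assumed to be EmailConverter::emlToEmail
     */
    static <S, E> E roundTrip(S source,
                              Function<S, E> parseMsg,
                              Function<E, String> toEml,
                              Function<String, E> parseEml) {
        E firstPass = parseMsg.apply(source); // may only expose smime.p7m
        String eml = toEml.apply(firstPass);  // serialise to EML text
        return parseEml.apply(eml);           // second parse of the EML text
    }

    public static void main(String[] args) {
        // Stand-in converters that only record the path taken.
        String result = roundTrip("testSigned2.msg",
                src -> "Email(" + src + ")",
                email -> "EML(" + email + ")",
                eml -> "Email(" + eml + ")");
        System.out.println(result); // prints Email(EML(Email(testSigned2.msg)))
    }
}
```

The attachments and embedded images should then be read from the Email returned by the second parse, not from the first one.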

bbottema commented 9 months ago

It's really odd. I followed the conversion in the debugger. I can see the library writes the .p7m data source as an attachment in the MimeMessage, but when reading it back, I get two unwrapped attachments instead (the embedded .jpg and the regular attachment). I'm not sure why this happens. It's almost as if Jakarta/Angus Mail is doing this by itself.

This stuff is making my head hurt!

/edit: Jakarta Mail does indeed automatically unwrap .p7m signed attachments when they are read back or converted to EML.

bbottema commented 9 months ago

After a lot of digging (and slowly and painfully recovering the repressed knowledge of all S/MIME scenarios), I've identified the root of the issue. The problem lies in the handling of a signed attachment that, while correctly typed as multipart/signed, is missing the crucial S/MIME protocol parameter in the MIME type. This omission renders the message non-compliant with typical S/MIME standards, as the protocol qualifier is essential for definitively determining the nature of the content (whether it's encrypted, simply signed, or pertains to a different mechanism altogether). As a result, this can be considered a malformed message in the context of standard S/MIME practices.

To address this, we need to explore alternative methods for recognizing the type of content in such scenarios. This might involve implementing heuristic approaches or fallback mechanisms that can make an educated guess about the content type in the absence of explicit protocol specification.
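
To make the idea concrete, here is a minimal sketch of what such a fallback could look like. It is not the code that ended up in Simple Java Mail; the class, enum and method names are made up, and the rules are deliberately simple. It trusts the protocol parameter when present, and otherwise guesses from the presence of a detached signature part (a part whose name ends in .p7s). It assumes there are no spaces around '=' in the header.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical heuristic, for illustration only.
public class SmimeHeuristic {

    enum Guess { SIGNED, ENCRYPTED, UNKNOWN_SMIME, NOT_SMIME }

    static Guess guess(String contentType, List<String> partNames) {
        // Normalise: lower-case and drop quotes so simple substring checks work.
        String ct = contentType == null
                ? ""
                : contentType.toLowerCase(Locale.ROOT).replace("\"", "").trim();

        if (ct.startsWith("multipart/signed")) {
            if (ct.contains("protocol=")) {
                // Explicit protocol: only the PKCS#7 signature types count as S/MIME.
                boolean smime = ct.contains("protocol=application/pkcs7-signature")
                        || ct.contains("protocol=application/x-pkcs7-signature");
                return smime ? Guess.SIGNED : Guess.NOT_SMIME;
            }
            // Fallback: no protocol given, so look for a detached signature part.
            for (String name : partNames) {
                if (name != null && name.toLowerCase(Locale.ROOT).endsWith(".p7s")) {
                    return Guess.SIGNED;
                }
            }
            return Guess.NOT_SMIME;
        }

        if (ct.startsWith("application/pkcs7-mime")
                || ct.startsWith("application/x-pkcs7-mime")) {
            // smime-type tells signed-data apart from enveloped-data when present.
            if (ct.contains("smime-type=signed-data")) {
                return Guess.SIGNED;
            }
            if (ct.contains("smime-type=enveloped-data")) {
                return Guess.ENCRYPTED;
            }
            return Guess.UNKNOWN_SMIME;
        }
        return Guess.NOT_SMIME;
    }

    public static void main(String[] args) {
        // multipart/signed without protocol, but with a smime.p7s part.
        System.out.println(guess(
                "multipart/signed; micalg=SHA1; boundary=\"b\"",
                Arrays.asList("gettyimages-1254246621-1.jpg", "smime.p7s"))); // prints SIGNED
    }
}
```

A real implementation would likely also need to inspect the bytes of the signature part, since file names alone are easy to get wrong.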

bbottema commented 9 months ago

It's been a while, but I think I finally solved this problem in simple-java-mail:8.5.1. I don't know if you're still around reading this, but if you are and have some time, perhaps you can verify the fix.

Anyway, as always, thanks for reporting your findings!

(closing as duplicate of https://github.com/bbottema/simple-java-mail/issues/486)

Faelean commented 9 months ago

I've only done some basic tests, but so far it seems to be working. I'll need to verify when we update to 8.5.1 sometime early next year.