Closed sanastasiadis closed 5 months ago
Hi, thanks for helping out!
To detect X500 addresses, the regular expression is used: "/o=[^/]+/ou=[^/]+(?:/cn=[^/]+)?" This regular expression accepts only one appearance of "cn=".
To be specific, it means it accepts at most only one. So it's optional.
Proposed solution is to change the regular expression to search for "one or more" "cn=" parts, by changing the final "?" into "+". Eg: "/o=[^/]+/ou=[^/]+(?:/cn=[^/]+)+".
This, however, makes it mandatory (one or more).
I guess, the correct way to handle both the case you found and to be backwards compatible, is to make it zero or more. So
/o=[^/]+/ou=[^/]+(?:/cn=[^/]+)*
What do you think?
Yes, you are right. Using "*" was an afterthought for me as well. I will align my PR.
Change released in 1.13.1
I may have to revert this change, as it caused a test case in Simple Java Mail to fail.
java.lang.AssertionError: Expecting actual: [Recipient{name='Sven Sielenkemper', address='sielenkemper@otris.de', type=To}, Recipient{name='sven.sielenkemper@gmail.com', address='sven.sielenkemper@gmail.com', type=To}] to contain exactly (and in same order): [Recipient{name='Sven Sielenkemper', address='/o=otris/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=ad5e65c5e53d40e2825d738a39904682-sielenkemper', type=To}, Recipient{name='sven.sielenkemper@gmail.com', address='sven.sielenkemper@gmail.com', type=To}] but some elements were not found: [Recipient{name='Sven Sielenkemper', address='/o=otris/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=ad5e65c5e53d40e2825d738a39904682-sielenkemper', type=To}] and others were not expected: [Recipient{name='Sven Sielenkemper', address='sielenkemper@otris.de', type=To}]
For the .eml and .msg files in this zip: #486 TestValidSigned messages.zip (from #40)
Perhaps you can have a look at it?
It seems this error comes from the test class EmailConverterTest of simple-java-mail, and it is the test _Github486InvalidSignedOutlookMessage.
This test compares the results of parsing between the .eml version of the file and the .msg version of the file. The parsed eml version of the file is represented as the "actual" here, and the parsed msg version of the file is the "expected" one.
The "actual" list of recipients comes from simple-java-mail: org.simplejavamail.converter.internal.mimemessage.MimeMessageParser.
The "expected" list of recipients comes from outlook-message-parser:
org.simplejavamail.outlookmessageparser.OutlookMessageParser.
Apparently the "expected" list of recipients is as expected (:smiley:) and here follows an explanation:
In the file "#486 TestInvalidSignedOutlookMessage.msg" It looks that the entry for
[Recipient{name='Sven Sielenkemper', address='sielenkemper@otris.de', type=To}]
has both properties:
0x3003: /o=otris/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=ad5e65c5e53d40e2825d738a39904682-sielenkemper
and
0x39fe: sielenkemper@otris.de
However the code handles it correctly because:
As I see the error produced by the unit test in your comment, the "actual" result could have been produced with the previous version of the regular expression (with the ?). I reproduced the issue parsing the msg file locally.
Using the previous version of the regular expression (with the ?), the X500 address is not recognised and the address field is left null to be filled with the address property of 0x39fe which comes next when parsing. Output:
[name: Sven Sielenkemper, address: sielenkemper@otris.de] [name: sven.sielenkemper@gmail.com, address: sven.sielenkemper@gmail.com]
On the other hand, the suggested version of the regular expression (with the *) matches correctly the given X500 address, and as the 0x3003 property comes first, it fills the address field. Then when parsing the 0x39fe, it is skipped as the address field is not null anymore. Output as expected:
[name: Sven Sielenkemper, address: /o=otris/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=ad5e65c5e53d40e2825d738a39904682-sielenkemper] [name: sven.sielenkemper@gmail.com, address: sven.sielenkemper@gmail.com]
I'm now thinking that since both properties 0x3003 and 0x39fe exist, maybe the latter should overwrite the first one, because the email address has "priority" over the X500 address? What do you think?
Thank you so much for having a good look, you explained it well.
Regarding multiple apparent recipients (I'm still getting used to X500, as it strikes me as totally alien), I'm not sure myself. For easy reproduction, I've added the email and test case in this project in the branch bugfix/#73-x500-recipient-issue
.
This test turns ✔ green:
@Test
public void testGithubIssue73_AttachmentNameWithSpecialCharacters()
throws IOException {
OutlookMessage msg = parseMsgFile("test-messages/OutlookMessage with X500 dual address.msg");
// just assert what currently comes back from the parser, not what the parser should do, because I'm not sure yet what the parser should do
OutlookMessageAssert.assertThat(msg).hasOnlyRecipients(
createRecipient("Sven Sielenkemper", "/o=otris/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=ad5e65c5e53d40e2825d738a39904682-sielenkemper"),
createRecipient("sven.sielenkemper@gmail.com", "sven.sielenkemper@gmail.com")
);
}
I'm not sure myself what we should be expecting here.
Thoughts? Perhaps we should just compare it to what Outlook does:
Sven Sielenkemper sielenkemper@otris.de sven.sielenkemper sven.sielenkemper@gmail.com
Yes, apparently Outlook has a place for every property to accommodate them. :-) However, in the case of outlook-message-parser there is only a single "address" field for every recipient.
The most "proper" solution maybe would be to add an extra field to the OutlookRecipient to accommodate the X500 address in case it exists? The client then would pick-up whatever serves better to him. What do you think?
Indeed, that may be the best way forward. Then the primary behaviour mimics Outlook, while preserving X500 data if available.
I created PR https://github.com/bbottema/outlook-message-parser/pull/75 for this. This seems to work well. Perhaps you can have a look as well.
New fix released in 1.13.2. Thanks again!
To detect X500 addresses, the regular expression is used: "/o=[^/]+/ou=[^/]+(?:/cn=[^/]+)?" This regular expression accepts only one appearance of "cn=".
It is possible to have X500 addresses with multiple "cn=" parts, eg: "/o=Test/ou=Unit/cn=Recipients/cn=anassta". In this case the above regular expression will not work.
Proposed solution is to change the regular expression to search for "one or more" "cn=" parts, by changing the final "?" into "+". Eg: "/o=[^/]+/ou=[^/]+(?:/cn=[^/]+)+".
I tested with the proposed regular expression and it works for both cases: