Closed Witiko closed 3 weeks ago
Apparently the response I had to this never got sent, and I only realized now. Apologies for that.
Unfortunately, that type isn't actually documented, so I'll need examples to implement code that properly understands it. Any that you can provide would be great, but I can try to see if I can generate my own.
@TheElementalOfDestruction: Thanks for the response! I am sorry that it took me almost 9 months to reply. 🙏
> Unfortunately, that type isn't actually documented, so I'll need examples to implement code that properly understands it. Any that you can provide would be great, but I can try to see if I can generate my own.
I cannot share client data, so I looked for ways to create an example MSG file without any sensitive information. Apparently, MSG files of this class are produced by exporting search results from Microsoft Purview eDiscovery. However, the MSG format for MS Teams messages seems to be unavailable for new searches since 2022.
Instead of trying to produce example MSG files, I forked the library and in commit https://github.com/Witiko/msg-extractor/commit/4b5ff13473a718fd3d3b0f2bbe8556a0856228ba, I updated the library to recognize the class "IPM.SkypeTeams.Message" as a message type. I verified that this fixes the problem on 107 different client MSG files with the class "IPM.SkypeTeams.Message". Therefore, I opened PR https://github.com/TeamMsgExtractor/msg-extractor/pull/440 that should close this issue.
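A batch check like the one described above could be sketched roughly as follows. This is only an illustration, not the script that was actually run: the helper name `tally_class_types` is hypothetical, and it assumes the opened object exposes a `classType` attribute and a `close()` method, as `extract_msg` message objects are expected to. The opener is passed in as a parameter so that the sketch itself does not depend on the library being installed.

```python
from collections import Counter
from pathlib import Path


def tally_class_types(directory, open_msg):
    """Try to open every .msg file in `directory` and tally the results.

    `open_msg` is any callable that takes a path string and returns an object
    with a `classType` attribute and a `close()` method (assumption: this is
    what extract_msg.openMsg returns).
    """
    opened = Counter()  # class type -> number of files opened successfully
    failed = {}         # file name -> repr of the exception that was raised
    for path in sorted(Path(directory).glob('*.msg')):
        try:
            msg = open_msg(str(path))
        except Exception as error:  # record the failure and keep going
            failed[path.name] = repr(error)
            continue
        try:
            opened[msg.classType] += 1
        finally:
            msg.close()
    return opened, failed


# Hypothetical usage, assuming extract_msg is installed and the files are
# stored in a directory named 'client-files':
#
#     import extract_msg
#     opened, failed = tally_class_types('client-files', extract_msg.openMsg)
```

An empty `failed` dictionary together with a count under "IPM.SkypeTeams.Message" would correspond to the result reported above.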
Have you confirmed that there don't appear to be any special data entries that we should implement in a custom type? I'd rather make absolutely sure we implement this as fully as possible. One way I tend to go about this is to try to open the file in Outlook and print it if possible. If the print option brings up a print preview, I look at which headers appear at the top and check whether any differ from those of regular Outlook messages.
If it does have differences you can quickly identify, implementing a new class for it is easy, since Message and MessageBase are functionally identical: Message is just a subclass of MessageBase that adds no functionality and is only used to identify that an MSG file is specifically of type message. If this format doesn't follow how standard messages work, I'd rather it have its own class.
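To illustrate the two options, here is a rough sketch that uses stand-in classes rather than the library's real ones. The class `SkypeTeamsMessage`, the function `class_for`, and the flag `dedicated_teams_class` are all hypothetical names introduced for illustration; only the idea of a marker subclass that adds no functionality comes from the description above.

```python
class MessageBase:
    """Stand-in for the shared message implementation (not the real class)."""

    def __init__(self, class_type):
        self.class_type = class_type


class Message(MessageBase):
    """Marker subclass: adds nothing, only identifies a regular message."""


class SkypeTeamsMessage(MessageBase):
    """Hypothetical dedicated class, in case Teams messages turn out to
    carry data entries that regular messages do not have."""


def class_for(class_type, dedicated_teams_class=False):
    """Pick the class used to represent an MSG file with this class type."""
    normalized = class_type.lower()
    if normalized == 'ipm.note':
        return Message
    if normalized == 'ipm.skypeteams.message':
        # Option 1: treat it as a new flavor of Message.
        # Option 2: give it a class of its own.
        return SkypeTeamsMessage if dedicated_teams_class else Message
    raise ValueError(f'Unsupported class type: {class_type!r}')


print(class_for('IPM.SkypeTeams.Message').__name__)
print(class_for('IPM.SkypeTeams.Message', dedicated_teams_class=True).__name__)
```

Because both classes inherit everything from the shared base, switching from the first option to the second later would only change which class is returned, not how the message is parsed.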
> One way I tend to go about this is to try and open the file in Outlook and print it if possible. If a print option works to bring up print preview, I take a look at what headers at the top appear and see if there are any that are different from regular outlook messages.
I took one of the client files at random and I opened it in Outlook from Office 365. Here is the redacted print preview:
The rest of the first printed page lists dozens of other recipients, followed by the text of the message on the second page:
Therefore, the only headers seem to be From, Date, and To.
Opening the same file with my patched version of extract_msg from PR https://github.com/TeamMsgExtractor/msg-extractor/pull/440 produces these same headers and also the standard header Message-Id, which Outlook does not seem to display:
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ pip install -U pip wheel setuptools
(venv) $ pip install git+https://github.com/Witiko/msg-extractor.git@feat/skypeteams-message
(venv) $ python3
Python 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import extract_msg
>>> message = extract_msg.openMsg('client-file.msg')
>>> sorted([key for key, value in message.header.items() if value])
['Date', 'From', 'Message-Id', 'To']
There are also empty headers Authentication-Results, Bcc, and Cc:
>>> sorted(message.header.keys())
['Authentication-Results', 'Bcc', 'Cc', 'Date', 'From', 'Message-Id', 'To']
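The same check could be repeated across many files with a small helper along these lines. This is a sketch under assumptions: the function name `summarize_headers` and the baseline set `EXPECTED_HEADERS` are hypothetical (the baseline simply lists the header names observed in the session above), and the demo header is built with the standard library's `email.message.Message` instead of a real MSG file.

```python
from email.message import Message

# Baseline taken from the session above; not an official list of headers.
EXPECTED_HEADERS = frozenset({
    'Authentication-Results', 'Bcc', 'Cc', 'Date', 'From', 'Message-Id', 'To',
})


def summarize_headers(header, expected=EXPECTED_HEADERS):
    """Split header names into populated, empty, and unexpected ones.

    `header` is any object with `items()` and `keys()` that yields header
    names and values, such as an email.message.Message.
    """
    populated = sorted(key for key, value in header.items() if value)
    empty = sorted(key for key, value in header.items() if not value)
    unexpected = sorted(set(header.keys()) - set(expected))
    return populated, empty, unexpected


# Made-up demo header with one empty and one non-standard entry.
demo = Message()
demo['From'] = 'sender@example.com'
demo['Date'] = 'Mon, 01 Jan 2024 00:00:00 +0000'
demo['Cc'] = ''
demo['X-Example'] = '1'

populated, empty, unexpected = summarize_headers(demo)
print('populated:', populated)
print('empty:', empty)
print('unexpected:', unexpected)
```

A non-empty `unexpected` list for any file would hint at special data entries that might justify a custom type.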
Does this seem like sufficient evidence that there are no special data entries, or would you like me to try something else?
Aside from checking that all of that data appears in the body itself, that all looks good enough for me to consider it just a new flavor of Message and call it a day. If you discover a problem with that implementation, you can just submit a new pull request while I accept the original one.
I just finished publishing the current code as a new release, 0.51.0. If everything looks good, you can close this issue.
Looks good to me, thanks!
Bug Metadata
extract_msg package
Describe the bug
Opening MSG files that contain MS Teams messages causes the following exception to be thrown:
Calling the function extract_msg.openMsg() immediately causes the exception to be thrown.
Traceback