TeamMsgExtractor / msg-extractor

Extracts emails and attachments saved in Microsoft Outlook's .msg files
GNU General Public License v3.0

Opening MSG files that contain MS Teams messages throws UnrecognizedMSGTypeError: Could not recognize MSG class type "IPM.SkypeTeams.Message" #401

Closed: Witiko closed this issue 3 weeks ago

Witiko commented 9 months ago

Bug Metadata

Describe the bug

Opening MSG files that contain MS Teams messages causes the following exception to be thrown:

extract_msg.exceptions.UnrecognizedMSGTypeError: Could not recognize MSG class type "IPM.SkypeTeams.Message". As such, there is a high chance that support may be impossible, but you should contact the developers to find out more.

What code did you use or can we use to reproduce this error?

Calling the function extract_msg.openMsg() immediately causes the exception to be thrown.

Traceback

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/extract_msg/open_msg.py", line 170, in openMsg
    raise UnrecognizedMSGTypeError(f'Could not recognize MSG class type "{ct}". As such, there is a high chance that support may be impossible, but you should contact the developers to find out more.')

TheElementalOfDestruction commented 8 months ago

Apparently the response I had to this never got sent, and I only realized now. Apologies for that.

Unfortunately, that type isn't actually documented, so I'll need examples to implement code that properly understands it. Any that you can provide would be great, but I can try to see if I can generate my own.

Witiko commented 3 weeks ago

@TheElementalOfDestruction: Thanks for the response! I am sorry that I did not reply within almost 9 months. 🙏

> Unfortunately, that type isn't actually documented, so I'll need examples to implement code that properly understands it. Any that you can provide would be great, but I can try to see if I can generate my own.

I cannot share client data, so I looked for ways to create an example MSG file without any sensitive information. Apparently, MSG files of this class are produced by exporting search results from Microsoft Purview eDiscovery. However, the MSG format for MS Teams messages seems to be unavailable for new searches since 2022.

Instead of trying to produce example MSG files, I forked the library and in commit https://github.com/Witiko/msg-extractor/commit/4b5ff13473a718fd3d3b0f2bbe8556a0856228ba, I updated the library to recognize the class "IPM.SkypeTeams.Message" as a message type. I verified that this fixes the problem on 107 different client MSG files with the class "IPM.SkypeTeams.Message". Therefore, I opened PR https://github.com/TeamMsgExtractor/msg-extractor/pull/440 that should close this issue.
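To illustrate roughly what "recognizing a class type" involves, here is a minimal sketch of a case-insensitive prefix lookup from class-type strings to handler names. It is not the code from the commit above or from extract_msg itself: the table `KNOWN_CLASS_TYPES`, the function `resolve_class_type`, the exception `UnrecognizedTypeError`, and the handler names are all hypothetical and exist only for this example.

```python
# Hypothetical sketch: NOT the actual extract_msg dispatch code.
KNOWN_CLASS_TYPES = {
    "ipm.note": "Message",
    "ipm.appointment": "Appointment",
    "ipm.contact": "Contact",
    # The kind of entry the fix adds: treat Teams messages as plain messages.
    "ipm.skypeteams.message": "Message",
}


class UnrecognizedTypeError(Exception):
    """Stand-in for extract_msg.exceptions.UnrecognizedMSGTypeError."""


def resolve_class_type(class_type: str) -> str:
    """Return the handler name for the longest matching class-type prefix."""
    lowered = class_type.lower()
    # Try longer prefixes first so specific entries win over generic ones.
    for prefix in sorted(KNOWN_CLASS_TYPES, key=len, reverse=True):
        if lowered == prefix or lowered.startswith(prefix + "."):
            return KNOWN_CLASS_TYPES[prefix]
    raise UnrecognizedTypeError(
        f'Could not recognize MSG class type "{class_type}".'
    )
```

With a table like this, an unknown class type raises the error until an entry for it is added, which is the behavior described in this issue.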

TheElementalOfDestruction commented 3 weeks ago

Have you confirmed that there don't appear to be any special data entries that we should implement in a custom type? I'd rather make absolutely sure we get this as fully implemented as possible. One way I tend to go about this is to open the file in Outlook and print it, if possible. If the print option brings up a print preview, I look at which headers appear at the top and check whether any of them differ from those of regular Outlook messages.

If it does have differences you can quickly identify, implementing a new class for it is easy, since Message and MessageBase are functionally identical: Message is just a subclass of MessageBase that adds no functionality and is only used to identify that an MSG file is specifically of type message. If this format doesn't follow how standard messages work, I'd rather it have its own class.
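A minimal sketch of that marker-subclass pattern, assuming simplified stand-in classes: the `MessageBase` and `Message` below are not the real extract_msg classes, and `TeamsMessage` and `describe` are hypothetical names invented for the example.

```python
# Hypothetical stand-ins: these are NOT the real extract_msg classes.
class MessageBase:
    """Holds everything shared by message-like MSG files."""

    def __init__(self, class_type: str, header: dict):
        self.class_type = class_type
        self.header = header


class Message(MessageBase):
    """Adds no behavior; only marks a file as a standard message."""


class TeamsMessage(MessageBase):
    """Hypothetical dedicated type with room for Teams-specific properties."""


def describe(msg: MessageBase) -> str:
    """Identify the flavor purely by class, as marker subclasses allow."""
    if isinstance(msg, TeamsMessage):
        return "teams message"
    if isinstance(msg, Message):
        return "standard message"
    return "message-like"
```

Because the subclasses carry no logic of their own, a dedicated type costs almost nothing up front and leaves a place to add format-specific properties later if any turn up.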

Witiko commented 3 weeks ago

> One way I tend to go about this is to try and open the file in outlook and print it if possible. If a print option works to bring up print preview, I take a look at what headers at the top appear and see if there are any that are different from regular outlook messages.

I took one of the client files at random and I opened it in Outlook from Office 365. Here is the redacted print preview:

[image: redacted print preview, first page]

The rest of the first printed page lists dozens of other recipients, followed by the text of the message on the second page:

[image: redacted print preview, second page]

Therefore, the only headers seem to be From, Date, and To.

Opening the same file with my patched version of extract_msg from PR https://github.com/TeamMsgExtractor/msg-extractor/pull/440 produces these same headers and also the standard header Message-Id, which Outlook does not seem to display:

$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ pip install -U pip wheel setuptools
(venv) $ pip install git+https://github.com/Witiko/msg-extractor.git@feat/skypeteams-message
(venv) $ python3
Python 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import extract_msg
>>> message = extract_msg.openMsg('client-file.msg')
>>> sorted([key for key, value in message.header.items() if value])
['Date', 'From', 'Message-Id', 'To']

There are also empty headers Authentication-Results, Bcc, and Cc:

>>> sorted(message.header.keys())
['Authentication-Results', 'Bcc', 'Cc', 'Date', 'From', 'Message-Id', 'To']
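The two checks above can be wrapped into small helpers so that they can be repeated over many files. This is only a sketch: `split_headers` and `unexpected_headers` are hypothetical names, the helpers accept anything with `.items()` and `.keys()`, and the baseline of "regular" header names is an assumption taken from the output above, not a list defined by Outlook or extract_msg.

```python
def split_headers(header):
    """Return (populated, empty) header names, each sorted alphabetically."""
    populated = sorted(key for key, value in header.items() if value)
    empty = sorted(key for key, value in header.items() if not value)
    return populated, empty


def unexpected_headers(header, baseline):
    """Return header names that are absent from the assumed baseline."""
    return sorted(set(header.keys()) - set(baseline))


# Assumed baseline: the header names seen in the session above.
BASELINE = {
    "Authentication-Results", "Bcc", "Cc", "Date", "From", "Message-Id", "To",
}

# Placeholder values only; real values would come from message.header.
sample = {
    "Authentication-Results": None,
    "Bcc": None,
    "Cc": None,
    "Date": "placeholder date",
    "From": "placeholder sender",
    "Message-Id": "placeholder id",
    "To": "placeholder recipients",
}

populated, empty = split_headers(sample)
extra = unexpected_headers(sample, BASELINE)
```

An empty `extra` list for every file would support treating the format as an ordinary message; any name that shows up there would be a candidate for a custom property.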

Does this seem like sufficient evidence that there are no special data entries, or would you like me to try something else?

TheElementalOfDestruction commented 3 weeks ago

Aside from checking that all of that data appears in the body itself, that all looks good enough for me to consider it just a new flavor of Message and call it a day. If you discover a problem with that implementation, you can just submit a new pull request while I accept the original one.

TheElementalOfDestruction commented 3 weeks ago

I just finished publishing the current code in a new release, 0.51.0. If everything looks good, you can close this issue.

Witiko commented 3 weeks ago

Looks good to me, thanks!