enthought / comtypes

A pure Python, lightweight COM client and server framework, based on the ctypes Python FFI package.
Other
282 stars 96 forks source link

Outlook MailItem Date Objects Wrong #519

Closed DragonRulerX closed 2 months ago

DragonRulerX commented 4 months ago

Unless there's something I'm missing or I'm just going about this entirely wrong - I'm of the impression that the value returned by all methods that should return a DateTime object (or at least some kind of date/datetime/timestamp according to the docs) are returning untranslatable values (or at least I have no clue how to translate the value to the proper timestamp).

Environment:

Here is my code:

import comtypes
import comtypes.client

o = comtypes.client.CreateObject("Outlook.Application")

def get_collection(countable, reverse=False):
    if not hasattr(countable, "Count"):
        raise AttributeError(f'{countable} does not have attribute "Count"')

    _range = range(countable.Count, 0, -1) if reverse else range(1, countable.Count + 1)

    return (countable.Item(i) for i in _range)

def get_emails():
    if hasattr(get_emails, "cache"):
        return get_emails.cache

    get_emails.cache = list(get_collection(o.GetNamespace("MAPI").Folders))
    return get_emails.cache

def get_folders(email):
    if not hasattr(get_folders, "cache"):
        get_folders.cache = {}

    if (cache := get_folders.cache) and email in cache:
        return cache[email]

    def walk(parent):
        for folder in get_collection(parent.Folders):
            yield folder
            yield from walk(folder)

    cache[email] = list(walk(email))
    return cache[email]

def get_mail(folder):
    def is_valid(item):
        # Not sure how to use isinstance properly here to avoid "MeetingItem" objects
        # So, these first two are the best I could come up with
        if not isinstance(item, comtypes._compointer_base):
            return False
        elif item.__com_interface__.__name__ != "_MailItem":
            return False
        # Some of the default folders cause issues so these next two clear up my errors for now
        elif not hasattr(item.Sender, "Address"):
            return False
        elif not item.Subject:
            return False
        return True
    return filter(is_valid, get_collection(folder.Items))

def check_senders():
    for e in get_emails():
        for f in get_folders(e):
            for m in get_mail(f):
                print(m.Subject.encode("utf-8"))
                print(m.Sender.Address)
                print(f"{m.ReceivedTime=} ({type(m.ReceivedTime)})")
                return

if __name__ == "__main__":
    check_senders()

Output:

b'Test Email'
fake@for.github.com
m.ReceivedTime=45291.33416666667 (<class 'float'>)

Expected (Approximately):

b'Test Email'
fake@for.github.com
m.ReceivedTime=1704002400.0 (<class 'float'>)  # 2023-12-31

Expected Alt (Approximately):

b'Test Email'
fake@for.github.com
m.ReceivedTime=datetime.datetime(2023, 12, 31, 0, 0) (<class 'datetime.datetime'>)  # 2023-12-31

If this is a bug then I hope this helps figure out how to resolve it. If this is not a bug then please help me understand what I need to do to get the expected value.

junkmd commented 4 months ago

I think this problem is getting to the crux of comtypes.

It relates to the mapping of VT_... and types in comtypes. For years, the community has grappled with the issue of VT_R8 and VT_DATE.

As pointed out in https://github.com/enthought/comtypes/issues/236#issuecomment-1243122782 , there is an inconsistency problem between _ctype_to_vartype and _vartype_to_ctype.

Additionally, the COM library parser that was used to generates Python code treats VT_DATE as a double.

Any or all of these factors may be relevant.

Currently, I am dedicating resources to refactoring, static typing, and enum types in comtypes. If proposed changes from the community are maintain backward compatibility, and include sufficient tests, I welcome them. I can allocate resources for reviewing such changes.

However, please note that issues like this have been deep-rooted challenges.

If you have other ideas, such as helper functions, feel free to share them.

DragonRulerX commented 4 months ago

Hmm, well after reviewing your links, conversation, and related code comments I ended up finding my way to some interesting content that may explain part of the mystery I was facing.

The brief summary seems to be related to the fact that I was using comtypes to extract date information from a "Microsoft" product (Outlook). Apparently, they have a special way of representing dates different from what you'd expect when reading the Python datetime library documentation. The Python datetime library has a method of parsing "Ordinal" numbers which is what these values from the comtypes methods looked like, but they are actually Microsoft's "Serial Date" format. The primary difference between the two is that datetime.fromordinal assumes the first day ever aka "day 1" to be 0001-01-01, but Microsoft's "Serial Date" actually assumes "day 1" to be 1900-01-01. Further, Microsoft purposefully included a bug from a legacy system for compatibility reasons that handled a single leap year incorrectly making it so that 1900-02-29 exists in their "Serial Date" packing when it otherwise should not have. These two factors combined explain why none of my initial efforts to parse this data worked.

However, the longer explanation with references and some of my investigative code might explain why I say this only seemingly explains "part" of the issue.

First some of the reference material I used (note that anything talking about Excel applies to Outlook, apparently):

So, despite all this research so far I still have some fields that seem to have stranger values yet. The following contains very similar code to what I first posted and then attempts to naively assume that any floating point value is likely a Serial Date and attempts to then parse it with a variety of methods (with some messy code to make a nice printout you can ignore). The try_math method is the one aiming to do the expected parsing of a Serial Date and seems to work (note: this does not include 1904 mode yet, but does account for the Lotus 123 bug).

import comtypes
import comtypes.client

# ==============================================================================
o = comtypes.client.CreateObject("Outlook.Application")

# ==============================================================================
def get_collection(countable, reverse=False):
    if not hasattr(countable, "Count"):
        raise AttributeError(f'{countable} does not have attribute "Count"')

    _range = range(countable.Count, 0, -1) if reverse else range(1, countable.Count + 1)

    return (countable.Item(i) for i in _range)

# ==============================================================================
def get_emails():
    if hasattr(get_emails, "cache"):
        return get_emails.cache

    get_emails.cache = list(get_collection(o.GetNamespace("MAPI").Folders))
    return get_emails.cache

# ==============================================================================
def get_folders(email):
    if not hasattr(get_folders, "cache"):
        get_folders.cache = {}

    if (cache := get_folders.cache) and email in cache:
        return cache[email]

    def walk(parent):
        for folder in get_collection(parent.Folders):
            yield folder
            yield from walk(folder)

    cache[email] = list(walk(email))
    return cache[email]

# ==============================================================================
def get_mail(folder):
    def is_valid(item):
        if not isinstance(item, comtypes._compointer_base):
            return False
        elif item.__com_interface__.__name__ != "_MailItem":
            return False
        elif not hasattr(item.Sender, "Address"):
            return False
        elif not item.Subject:
            return False
        return True
    return filter(is_valid, get_collection(folder.Items))

# ==============================================================================
def get_senders():
    interesting_attrs = {}
    for e in get_emails():
        for f in get_folders(e):
            for m in get_mail(f):
                print(m)
                print(type(m))
                for attr in dir(m):
                    try:
                        v = getattr(m, attr)
                        if isinstance(v, float):
                            interesting_attrs[attr] = v
                    except Exception:
                        continue
                return interesting_attrs

# ==============================================================================
def investigate(checks):
    from collections import namedtuple
    from datetime import datetime, date, time, timedelta

    # --------------------------------------------------------------------------
    def try_fromtimestamp(x):
        return date.fromtimestamp(x)

    # --------------------------------------------------------------------------
    def try_fromordinal_date(x):
        return date.fromordinal(int(x))

    # --------------------------------------------------------------------------
    def try_fromordinal_datetime(x):
        return datetime.fromordinal(int(x))

    # --------------------------------------------------------------------------
    def try_utcfromtimestamp(x):
        import datetime
        return datetime.datetime.fromtimestamp(x, datetime.UTC)

    # --------------------------------------------------------------------------
    def try_math(x):
        # ----------------------------------------------------------------------
        def serial_mod(f, m):
            d, rem = divmod(f * m, 1)
            return int(d), rem

        # ----------------------------------------------------------------------
        def parse_serial_date(d):
            if not hasattr(parse_serial_date, "sentinel"):
                parse_serial_date.sentinel = date(year=1899, month=12, day=31)
            return parse_serial_date.sentinel + timedelta(days=d - 1 if d > 59 else d)

        # ----------------------------------------------------------------------
        def parse_serial_time(t):
            h, rem = serial_mod(t, 24)
            m, rem = serial_mod(rem, 60)
            s, rem = serial_mod(rem, 60)
            return time(hour=h, minute=m, second=s, microsecond=int(rem * 10 ** 6))

        # ----------------------------------------------------------------------
        d, t = serial_mod(x, 1)
        return datetime.combine(parse_serial_date(d), parse_serial_time(t))

    # --------------------------------------------------------------------------
    # Ignore the code below here for testing
    # Wasn't going to bother installing tabulate in this venv
    # --------------------------------------------------------------------------
    aw = max(map(len, checks))
    ds, fs = zip(*(map(len, str(t).split('.')) for t in checks.values()))
    dsw = max(ds)
    fsw = max(fs)
    sw = dsw + fsw + 1
    fnw = max(len(name) for name in locals() if name.startswith("try_"))
    dt_fmt = "%Y-%m-%d %H:%M %p"
    tw = len(datetime.now().strftime("%Y-%m-%d %H:%M %p"))

    Row = namedtuple("Row", ("attr", "serial", "fn", "result"))
    fmt = f" {{attr: <{aw}}} | {{serial: >{sw}}} | {{fn: <{fnw}}} | {{result: <{tw}}}"

    def check(attr, t, fn):
        d, f = str(t).split('.')
        try:
            result = fn(t).strftime("%Y-%m-%d %H:%M %p")
        except Exception as e:
            result = f"Error: {e}"

        serial_fmt = f"{{days: >{dsw}}}.{{frac: <{fsw}}}"
        serial = serial_fmt.format(days=d, frac=f)
        print(fmt.format(**Row(attr=attr, serial=serial, fn=fn.__name__, result=result)._asdict()))

    print(fmt.format(**Row(attr='', serial='', fn="Correct Timestamp ----->", result="2023-12-31 08:01 AM")._asdict()).replace('|', ' '))
    print('-' * 100)
    for attr, stamp in sorted(checks.items(), key=lambda d: d[1]):
        for fn in (fn for name, fn in locals().items() if name.startswith("try_")):
            check(attr, stamp, fn)
        print('-' * 100)

# ==============================================================================
def main():
    checks = get_senders()
    investigate(checks)

# ==============================================================================
if __name__ == "__main__":
    main()

Output (times rounded to seconds since Outlook doesn't seem to display anything more granular):

<POINTER(_MailItem) ptr=0x115769a3ee8 at 11577401550>
<class 'comtypes.POINTER(_MailItem)'>
                                                Correct Timestamp ----->   2023-12-31 08:01 AM
----------------------------------------------------------------------------------------------------
 SentOn                  |  45291.33415509259 | try_fromtimestamp        | 1970-01-01 00:00 AM
 SentOn                  |  45291.33415509259 | try_fromordinal_date     | 0125-01-01 00:00 AM
 SentOn                  |  45291.33415509259 | try_fromordinal_datetime | 0125-01-01 00:00 AM
 SentOn                  |  45291.33415509259 | try_utcfromtimestamp     | 1970-01-01 12:34 PM
 SentOn                  |  45291.33415509259 | try_math                 | 2023-12-31 08:01 AM
----------------------------------------------------------------------------------------------------
 ReceivedTime            |  45291.33416666667 | try_fromtimestamp        | 1970-01-01 00:00 AM
 ReceivedTime            |  45291.33416666667 | try_fromordinal_date     | 0125-01-01 00:00 AM
 ReceivedTime            |  45291.33416666667 | try_fromordinal_datetime | 0125-01-01 00:00 AM
 ReceivedTime            |  45291.33416666667 | try_utcfromtimestamp     | 1970-01-01 12:34 PM
 ReceivedTime            |  45291.33416666667 | try_math                 | 2023-12-31 08:01 AM
----------------------------------------------------------------------------------------------------
 CreationTime            |  45293.06147771263 | try_fromtimestamp        | 1970-01-01 00:00 AM
 CreationTime            |  45293.06147771263 | try_fromordinal_date     | 0125-01-03 00:00 AM
 CreationTime            |  45293.06147771263 | try_fromordinal_datetime | 0125-01-03 00:00 AM
 CreationTime            |  45293.06147771263 | try_utcfromtimestamp     | 1970-01-01 12:34 PM
 CreationTime            |  45293.06147771263 | try_math                 | 2024-01-02 01:28 AM
----------------------------------------------------------------------------------------------------
 LastModificationTime    |  45356.353225      | try_fromtimestamp        | 1970-01-01 00:00 AM
 LastModificationTime    |  45356.353225      | try_fromordinal_date     | 0125-03-07 00:00 AM
 LastModificationTime    |  45356.353225      | try_fromordinal_datetime | 0125-03-07 00:00 AM
 LastModificationTime    |  45356.353225      | try_utcfromtimestamp     | 1970-01-01 12:35 PM
 LastModificationTime    |  45356.353225      | try_math                 | 2024-03-05 08:28 AM
----------------------------------------------------------------------------------------------------
 DeferredDeliveryTime    | 949998.0           | try_fromtimestamp        | 1970-01-11 00:00 AM
 DeferredDeliveryTime    | 949998.0           | try_fromordinal_date     | 2602-01-03 00:00 AM
 DeferredDeliveryTime    | 949998.0           | try_fromordinal_datetime | 2602-01-03 00:00 AM
 DeferredDeliveryTime    | 949998.0           | try_utcfromtimestamp     | 1970-01-11 23:53 PM
 DeferredDeliveryTime    | 949998.0           | try_math                 | 4501-01-01 00:00 AM
----------------------------------------------------------------------------------------------------
 ExpiryTime              | 949998.0           | try_fromtimestamp        | 1970-01-11 00:00 AM
 ExpiryTime              | 949998.0           | try_fromordinal_date     | 2602-01-03 00:00 AM
 ExpiryTime              | 949998.0           | try_fromordinal_datetime | 2602-01-03 00:00 AM
 ExpiryTime              | 949998.0           | try_utcfromtimestamp     | 1970-01-11 23:53 PM
 ExpiryTime              | 949998.0           | try_math                 | 4501-01-01 00:00 AM
----------------------------------------------------------------------------------------------------
 FlagDueBy               | 949998.0           | try_fromtimestamp        | 1970-01-11 00:00 AM
 FlagDueBy               | 949998.0           | try_fromordinal_date     | 2602-01-03 00:00 AM
 FlagDueBy               | 949998.0           | try_fromordinal_datetime | 2602-01-03 00:00 AM
 FlagDueBy               | 949998.0           | try_utcfromtimestamp     | 1970-01-11 23:53 PM
 FlagDueBy               | 949998.0           | try_math                 | 4501-01-01 00:00 AM
----------------------------------------------------------------------------------------------------
 ReminderTime            | 949998.0           | try_fromtimestamp        | 1970-01-11 00:00 AM
 ReminderTime            | 949998.0           | try_fromordinal_date     | 2602-01-03 00:00 AM
 ReminderTime            | 949998.0           | try_fromordinal_datetime | 2602-01-03 00:00 AM
 ReminderTime            | 949998.0           | try_utcfromtimestamp     | 1970-01-11 23:53 PM
 ReminderTime            | 949998.0           | try_math                 | 4501-01-01 00:00 AM
----------------------------------------------------------------------------------------------------
 RetentionExpirationDate | 949998.0           | try_fromtimestamp        | 1970-01-11 00:00 AM
 RetentionExpirationDate | 949998.0           | try_fromordinal_date     | 2602-01-03 00:00 AM
 RetentionExpirationDate | 949998.0           | try_fromordinal_datetime | 2602-01-03 00:00 AM
 RetentionExpirationDate | 949998.0           | try_utcfromtimestamp     | 1970-01-11 23:53 PM
 RetentionExpirationDate | 949998.0           | try_math                 | 4501-01-01 00:00 AM
----------------------------------------------------------------------------------------------------
 TaskCompletedDate       | 949998.0           | try_fromtimestamp        | 1970-01-11 00:00 AM
 TaskCompletedDate       | 949998.0           | try_fromordinal_date     | 2602-01-03 00:00 AM
 TaskCompletedDate       | 949998.0           | try_fromordinal_datetime | 2602-01-03 00:00 AM
 TaskCompletedDate       | 949998.0           | try_utcfromtimestamp     | 1970-01-11 23:53 PM
 TaskCompletedDate       | 949998.0           | try_math                 | 4501-01-01 00:00 AM
----------------------------------------------------------------------------------------------------
 TaskDueDate             | 949998.0           | try_fromtimestamp        | 1970-01-11 00:00 AM
 TaskDueDate             | 949998.0           | try_fromordinal_date     | 2602-01-03 00:00 AM
 TaskDueDate             | 949998.0           | try_fromordinal_datetime | 2602-01-03 00:00 AM
 TaskDueDate             | 949998.0           | try_utcfromtimestamp     | 1970-01-11 23:53 PM
 TaskDueDate             | 949998.0           | try_math                 | 4501-01-01 00:00 AM
----------------------------------------------------------------------------------------------------
 TaskStartDate           | 949998.0           | try_fromtimestamp        | 1970-01-11 00:00 AM
 TaskStartDate           | 949998.0           | try_fromordinal_date     | 2602-01-03 00:00 AM
 TaskStartDate           | 949998.0           | try_fromordinal_datetime | 2602-01-03 00:00 AM
 TaskStartDate           | 949998.0           | try_utcfromtimestamp     | 1970-01-11 23:53 PM
 TaskStartDate           | 949998.0           | try_math                 | 4501-01-01 00:00 AM
----------------------------------------------------------------------------------------------------
 ToDoTaskOrdinal         | 949998.0           | try_fromtimestamp        | 1970-01-11 00:00 AM
 ToDoTaskOrdinal         | 949998.0           | try_fromordinal_date     | 2602-01-03 00:00 AM
 ToDoTaskOrdinal         | 949998.0           | try_fromordinal_datetime | 2602-01-03 00:00 AM
 ToDoTaskOrdinal         | 949998.0           | try_utcfromtimestamp     | 1970-01-11 23:53 PM
 ToDoTaskOrdinal         | 949998.0           | try_math                 | 4501-01-01 00:00 AM
----------------------------------------------------------------------------------------------------
[Finished in 1.1s]

You'll notice that the 949998.0 number doesn't seem to reliably parse to what I think most people would generally assume to be reasonable values in any of my current methods. It is unclear to me if it is actually parsing correctly or if it isn't or if the value itself is simply wrong since none of these fields relate to data points I would expect to even have values for given the mail item it is related to (as in this mail item doesn't have a "ReminderTime" for example).

Further, you'll notice CreationTime is slightly different by a few days. Again, I don't know what this field is meant to represent, but it feels like if it is the "time the mail item was created" that it would be the same as "the time the mail item was received". However, I would concede to anyone making a point that it may perhaps be different if the mail item was "received" in the Microsoft email server on 2023-12-31, but my mail client wasn't on that day (PC was off maybe) and thus it wasn't "created" locally until the day I turned on my mail client again on 2024-01-02. That said, I would need to find supporting documentation to back that theory to know if it was a bug or not.

So, anyway - just an update on my findings so far. Still not sure if there is anything actually wrong here or if it is more just being ignorant on my part. However, if it is possible to know a date comes from a Microsoft product it might be nice to offer utility methods to convert from the Serial Date format to a normal datetime object or to just parse to a datetime object by default and either assuming 1904 mode is off or somehow querying if that mode is on from the application itself implicitly. Just some thoughts.

DragonRulerX commented 4 months ago

It appears there is yet more information on the confusion regarding the Serial Date 949998.0 or 4501-01-01:

According to this resource this is expected for when None should be found.

So, I guess this boils down to a misunderstanding of the data source and the way to convert the data to a proper representation. However, I'll leave this open for if we feel it is worth discussing automatic conversions to proper datetime/NoneType objects for users or not.

junkmd commented 4 months ago

Thank you for your thorough investigation.

It seems that possibly to address legacy bugs related to Windows dates, the _com_null_date constant is defined in comtypes.automation. However, it appears to be defined without being used elsewhere.

A similar item, com_null_date64, exists in comtypes._npsupport and is utilized for date-related processing.

Changing the default behavior during type conversion could have unpredictable impacts, so it's not a decision to be taken lightly.

However, creating a utility function to cast from float to datetime or None seems relatively straightforward. You can use the existing implementations as a reference.

Do you have any thoughts or opinions on this matter?

DragonRulerX commented 4 months ago

When it comes to the comtypes library implementation details I must admit that I'm wildly ignorant so I feel any final solution decided should be handled by those in this community more familiar with the library itself.

That said, I think the "ideal" solution would be to recognize when a call like this is made to the module:

o = comtypes.client.CreateObject("Outlook.Application")

Presumably the CreateObject call can set a flag in the module or resulting created object that perpetuates to all other objects derived from calls to o or there exists some other introspective method to identify what parent application the CreateObject was created for. If this is possible then it feels like you could check for any CreateObject call targeting any "Microsoft Application" and force the conversion from float to datetime or None. That does carry risk for existing implementations, but I think the QoL improvement would be worth it. I think as long as an update like that goes out for a "major" version release then most people should be aware to look for breaking changes such as that. I'm sure that doesn't even scratch the surface of some of the other concerns you were implying in your response regarding potential "unpredictable impacts" which is why I think it's best for the community here to handle it since I wouldn't even know where to begin looking to do sanity checking.

However, if a simple utility method can be added to objects using one of the two introspective methods mentioned above to convert from the serial date to a datetime or None object then that would work as a simple patch for a "minor" release. The only downside is that it is not likely going to be readily apparent to any user to call the method for any given object since use of the method is only valid for derivative objects of a "Microsoft Application" that have a field that happens to also be a serial date.

Regardless, I think for both of these we'd query what environment we're in to determine if we need to use 1900 or 1904 mode so that Mac users get accurate dates too. That said, I'm not too familiar with 1904 mode so I don't know if that should be based on the environment the application is running in or if it should be based on the environment the application was in when the data was saved (ex: made spreadsheet on Windows PC [1900 mode?], but opened it on Mac [1904 mode?] or vice versa). So, that could be another interesting discussion point.

junkmd commented 3 months ago

Thanks for sharing your thoughts.

Windows PC [1900 mode?]

Mac [1904 mode?]

They are different matters from this discussion.

COM objects are unique to the Windows environment. https://learn.microsoft.com/en-us/windows/win32/com/component-object-model--com--portal

Therefore, Mac users are out of scope for this project.

junkmd commented 3 months ago

check for any CreateObject call targeting any "Microsoft Application" and force the conversion from float to datetime or None

I cannot agree with adding this implementation.

COM object libraries are assigned GUIDs, so comtypes may be able to know if the interface of the generated object belongs to the Microsoft Outlook 16.0 Object Library. However, I understand that there is no place in the COM library information that holds the information that "this date serial should be treated as null/None". As you also mentioned, Excel has a famous problem with treating the year 1900 as a leap year, but I understand that the information about that date is not anywhere in the Excel COM library itself.

In other words, to map "this COM library's null date is that date", comtypes should have a mapping of library GUIDs and date serials. There should be an infinite number of COM libraries in the world. Therefore, the null date mapping could potentially grow indefinitely. Maintaining a bloated mapping would be quite a painful task.

I believe that comtypes itself should not deal with special specifications that exist in a particular COM library.

For your project, I think you can create a wrapper like the one below that encapsulates and hides the tricky date calculation.

from dataclasses import dataclass
from datetime import datetime, date, timedelta, time

import comtypes.client
comtypes.client.GetModule(('{00062FFF-0000-0000-C000-000000000046}',))

from comtypes.gen import Outlook

@dataclass
class MailItemWrapper:
    _item: Outlook._MailItem

    @property
    def recieved_time(self) -> datetime | None:
        def try_math(x: float):
            # ----------------------------------------------------------------------
            def serial_mod(f: float, m: int) -> tuple[int, float]:
                d, rem = divmod(f * m, 1)
                return int(d), rem

            # ----------------------------------------------------------------------
            def parse_serial_date(d: int) -> date:
                if not hasattr(parse_serial_date, "sentinel"):
                    parse_serial_date.sentinel = date(year=1899, month=12, day=31)
                return parse_serial_date.sentinel + timedelta(days=d - 1 if d > 59 else d)

            # ----------------------------------------------------------------------
            def parse_serial_time(t: float) -> time:
                h, rem = serial_mod(t, 24)
                m, rem = serial_mod(rem, 60)
                s, rem = serial_mod(rem, 60)
                return time(hour=h, minute=m, second=s, microsecond=int(rem * 10 ** 6))

            # ----------------------------------------------------------------------
            d, t = serial_mod(x, 1)
            return datetime.combine(parse_serial_date(d), parse_serial_time(t))
        t = try_math(self._item.ReceivedTime)
        if t == datetime(4501, 1, 1, 0, 0):
            return None
        return t

    @property
    def sender_name(self) -> str:
        return self._item.SenderName

    # ... and other properties and methods

I hope this helps.

DragonRulerX commented 3 months ago

Therefore, Mac users are out of scope for this project.

I guess the only reason I brought this up was due to not knowing precisely how the mode is determined. If the mode was determined based on the file's originating OS then it may be relevant to consider even in a Windows centric library. That said, I'm still likely very ignorant on all this so my assumptions may be wrong.

In other words, to map "this COM library's null date is that date", comtypes should have a mapping of library GUIDs and date serials. There should be an infinite number of COM libraries in the world. Therefore, the null date mapping could potentially grow indefinitely. Maintaining a bloated mapping would be quite a painful task.

I'm not very familiar with COM things at all so the prevalence of it in the world is not something I can attest to. That said, I don't believe you'd need an extensive mapping framework. I could be completely wrong, but I'd assume the number of applications with oddities such as what I've now encountered with the Microsoft Applications are actually quite minimal. Further, I'd imagine a grouping of mappings could be used to minimize bloat.

Perhaps this idea is too naive, but here's a bit of untested code to convey my idea anyway:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class SerialNullCorrection:
    name: str # or maybe make this a tuple if needed for multiple application suites
    guids: tuple[str]
    converter: callable
    null_date: date

corrections = (
    SerialNullCorrection(
        name="Microsoft",
        guids={"...", },
        converter=try_math,
        null_date=datetime(4501, 1, 1)
    ),
)

def get_corrector(guid):
    return next((c for c in corrections if guid in c.guids), None)

def correct_serial_date(guid, x):
    if not (corrector := get_corrector(guid)):
        return x

    if not isinstance(x, float):
        raise TypeError

    x = corrector.converter(x)
    return None if x == corrector.null_date else x

However, like I said earlier - my knowledge on COM things and this library is minimal so if you feel there's no good way to handle stuff like this in a general or at least collective sense then I completely understand. My only goal was to see if there was a way to add some QoL behaviors or methods to the library natively to help naive devs not need to know abstract/niche details about the environment they'll be working in produced by the dynamic nature of the COM interface with their target application.

That said this code you wrote intrigues me:

import comtypes.client
comtypes.client.GetModule(('{00062FFF-0000-0000-C000-000000000046}',))

from comtypes.gen import Outlook

I'm not familiar with this approach. Is there a difference in behavior relative to my approach besides o vs Outlook ? I would like to make sure I'm using best practice in my code where possible.

import comtypes.client
o = comtypes.client.CreateObject("Outlook.Application")
junkmd commented 3 months ago

Because anyone can create a COM library, theoretically, they exist in infinite numbers. For example, the method for creating a COM interface in a C++ project is published in Microsoft's documentation. https://learn.microsoft.com/en-us/windows/win32/learnwin32/creating-an-object-in-com

We do not know how many COM libraries will use the null date in the future, and there is a possibility that the codebase will be inexhaustibly bloated, so I cannot agree that comtypes hold detailed information about each COM library.

I think it is the responsibility of the provider of the application to document when the null date of that application is. And how to handle that information should be controlled by the user, just as the workaround for Excel's date calculation is different for each user.

While the purpose of comtypes is to define and operate COM interfaces in Python, I believe it should not be responsible for the specifications of individual applications.

junkmd commented 3 months ago

I will explain the code for the MailItemWrapper dataclass that I presented.

In practice, you would pass an instance of _MailItem to the dataclass constructor as shown below.

from dataclasses import dataclass

import comtypes.client
comtypes.client.GetModule(('{00062FFF-0000-0000-C000-000000000046}',))

from comtypes.gen import Outlook

@dataclass
class MailItemWrapper:
    _item: Outlook._MailItem

    ...

def main():
    o = comtypes.client.CreateObject(Outlook.Application, interface=Outlook._Application)
    ns: Outlook._NameSpace = o.GetNamespace("MAPI")
    fld: Outlook.MAPIFolder = ns.GetDefaultFolder(Outlook.olFolderInbox)
    for item in fld.Items:
        if not isinstance(item, Outlook._MailItem):
            raise TypeError
        print(MailItemWrapper(item))  # HERE!
    o.Quit()

if __name__ == "__main__":
    main()

comtypes.client.GetModule(('{00062FFF-0000-0000-C000-000000000046}',)) creates the Python wrapper module for the Outlook COM library by passing the GUID to GetModule. Since the return value of this function is a module, the runtime behavior does not change whether you assign this return value to a variable and reference it or explicitly import the module. I prefer this way of writing because we can benefit from Python's static type system when we explicitly import.

CreateObject internally calls GetModule and statically defines the interface of the object to be returned. The objects created by comtypes.client.CreateObject("Outlook.Application") and comtypes.client.CreateObject(Outlook.Application, interface=Outlook._Application) both return an Outlook._Application instance. The latter is simply making explicit what the former implicitly does with VersionIndependentProgID like "Outlook.Application" or "Excel.Application".

After that, it's just iterating over the emails in the Inbox. The runtime benefit of importing explicitly the module is that we can determine the type with isinstance without using __com_interface__.__name__.

DragonRulerX commented 3 months ago

Because anyone can create a COM library, theoretically, they exist in infinite numbers.

Ah, I see. That makes sense. So, it'll have to be on the dev to know the quirks of their COM application interface. A shame, but understandable I suppose.

comtypes.client.GetModule(('{00062FFF-0000-0000-C000-000000000046}',)) creates the Python wrapper module for the Outlook COM library by passing the GUID to GetModule. ... The runtime benefit of importing explicitly the module is that we can determine the type with isinstance without using __cominterface__.__name_\.

I do really like the ability to leverage isinstance and to do proper type hinting in this context, but the use of the GUID raises a few more questions if you have the time.

junkmd commented 3 months ago

Please refer to the MS documentation for information about COM and GUID. https://learn.microsoft.com/en-us/windows/win32/com/com-technical-overview

We can pass various objects to GetModule. Some of them are strings like "scrrun.dll" and "UIAutomationCore.dll", and internally replace them with the appropriate COM library full path. However, Outlook and Excel do not work in this way, and we need to use a different way. Of course, we can create a Python module by passing the full path of the .exe file, but it is not recommended because it depends on how the user installed Office.

Instead, I recommend to use GUIDs, but using them as they are will lose readability as you pointed out. In such cases, I solve the problem of readability by passing the GUID after making it a constant.

OUTLOOK_TYPELIB_GUID = '{00062FFF-0000-0000-C000-000000000046}'

comtypes.client.GetModule((OUTLOOK_TYPELIB_GUID,))
junkmd commented 3 months ago

Is there an update on this issue?

DragonRulerX commented 2 months ago

Apologies, I haven't been able to find much time for work on the project recently so I don't have much for additional questions at the moment. This can probably be closed out unless we feel there is a way to provide some extension to the comtypes library in the form of utility methods, wrappers, or documentation. Based on our discussion it sounded like those options may not be possible, ideal, or are simply naive.

junkmd commented 2 months ago

Based on our discussion it sounded like those options may not be possible, ideal, or are simply naive.

In this case, it has been determined that a special branch configuration for the specifications of a specific application, which is outside the responsibility of comtypes, is necessary.

However, when there are inconsistencies in the values handled by the COM library, it is difficult to determine whether they are due to the specifications of the application, or a bug in the application/Python/comtypes. Therefore, having someone like you who actively investigates on their own is important for the development of this package. Thank you.

Also, from the issues posted recently, I have come to understand that adding more detailed type hints to the method returns would be beneficial. I am considering implementing this in comtypes==1.4.3 (though it may only be for some methods and types).