ecederstrand / exchangelib

Python client for Microsoft Exchange Web Services (EWS)
BSD 2-Clause "Simplified" License

Cannot convert value _start_timezone _end_timezone #541

Closed dastious closed 5 years ago

dastious commented 5 years ago

I do this to fetch all emails and then all calendar items. With email it seems to work well, but if I list all calendar items like this:

print("Processing : Get Item List, calendar")
list_sockage_calendar = []
for item in account_source.calendar.all().order_by('datetime_received'):
    list_sockage_calendar.append(tuple(item.item_id))

The script crashes with this message:

> Cannot convert value '(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne' on field '_start_timezone' to type <class 'exchangelib.ewsdatetime.EWSTimeZone'> (unknown timezone ID)
> Cannot convert value '(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne' on field '_end_timezone' to type <class 'exchangelib.ewsdatetime.EWSTimeZone'> (unknown timezone ID)

I tested without order_by('datetime_received'). It produces the same kind of error, but in a loop (not just 2 errors):

> Cannot convert value 'Customized Time Zone' on field '_end_timezone' to type <class 'exchangelib.ewsdatetime.EWSTimeZone'> (unknown timezone ID)
> Cannot convert value 'Customized Time Zone' on field '_start_timezone' to type <class 'exchangelib.ewsdatetime.EWSTimeZone'> (unknown timezone ID)

I tested with the master branch and got the same result.

ecederstrand commented 5 years ago

Can you please post the part of the response XML containing the string (GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne? See https://github.com/ecederstrand/exchangelib#troubleshooting

dastious commented 5 years ago

In fact, the real error is not the warning:

  File "test.py", line 50, in <module>
    for item in account_source.calendar.all().order_by('datetime_received'):
  File "/home/migrator/calmigrator/venv/lib/python3.5/site-packages/exchangelib/queryset.py", line 311, in __iter__
    for val in self._format_items(items=self._query(), return_format=self.return_format):
  File "/home/migrator/calmigrator/venv/lib/python3.5/site-packages/exchangelib/queryset.py", line 390, in _as_items
    for i in iterable:
  File "/home/migrator/calmigrator/venv/lib/python3.5/site-packages/exchangelib/account.py", line 629, in fetch
    shape=IdOnly,
  File "/home/migrator/calmigrator/venv/lib/python3.5/site-packages/exchangelib/services.py", line 597, in _pool_requests
    for elem in r.get():
  File "/home/migrator/calmigrator/venv/lib/python3.5/site-packages/exchangelib/services.py", line 330, in _get_elements_in_response
    container_or_exc = self._get_element_container(message=msg, name=self.element_container_name)
  File "/home/migrator/calmigrator/venv/lib/python3.5/site-packages/exchangelib/services.py", line 303, in _get_element_container
    raise self._get_exception(code=response_code, text=msg_text, msg_xml=msg_xml)
exchangelib.errors.ErrorTimeoutExpired: The request timed out.

It seems that this timeout error only happens with the calendar, not with any other folder.

Is it OK if I only show you this part of the XML, since this calendar contains private information? I can create a fake calendar if you really need the full XML. Let me know.

<t:Start>2010-06-23T00:00:00+02:00</t:Start>
<t:End>2010-06-24T00:00:00+02:00</t:End>
<t:OriginalStart>2010-06-23T00:00:00+02:00</t:OriginalStart>
</t:LastOccurrence>
<t:StartTimeZone
    Name="(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne"
    Id="(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne">
<t:Periods>
    <t:Period Bias="-PT1H" Name="Standard"
              Id="trule:Microsoft/Registry/(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne/1601-Standard"/>
    <t:Period Bias="-PT2H" Name="Daylight"
              Id="trule:Microsoft/Registry/(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne/1601-Daylight"/>
</t:Periods>
<t:TransitionsGroups>
    <t:TransitionsGroup Id="0">
        <t:RecurringDayTransition>
            <t:To Kind="Period">trule:Microsoft/Registry/(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm,
                Vienne/1601-Daylight
            </t:To>
            <t:TimeOffset>PT3H</t:TimeOffset>
            <t:Month>3</t:Month>
            <t:DayOfWeek>Sunday</t:DayOfWeek>
            <t:Occurrence>-1</t:Occurrence>
        </t:RecurringDayTransition>
        <t:RecurringDayTransition>
            <t:To Kind="Period">trule:Microsoft/Registry/(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm,
                Vienne/1601-Standard
            </t:To>
            <t:TimeOffset>PT2H</t:TimeOffset>
            <t:Month>10</t:Month>
            <t:DayOfWeek>Sunday</t:DayOfWeek>
            <t:Occurrence>-1</t:Occurrence>
        </t:RecurringDayTransition>
    </t:TransitionsGroup>
</t:TransitionsGroups>
<t:Transitions>
    <t:Transition>
        <t:To Kind="Group">0</t:To>
    </t:Transition>
</t:Transitions>
</t:StartTimeZone>
<t:EndTimeZone Name="" Id="(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne">
<t:Periods>
    <t:Period Bias="-PT1H" Name="Standard"
              Id="trule:Microsoft/Registry/(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne/1601-Standard"/>
    <t:Period Bias="-PT2H" Name="Daylight"
              Id="trule:Microsoft/Registry/(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne/1601-Daylight"/>
</t:Periods>
<t:TransitionsGroups>
    <t:TransitionsGroup Id="0">
        <t:RecurringDayTransition>
            <t:To Kind="Period">trule:Microsoft/Registry/(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm,
                Vienne/1601-Daylight
            </t:To>
            <t:TimeOffset>PT3H</t:TimeOffset>
            <t:Month>3</t:Month>
            <t:DayOfWeek>Sunday</t:DayOfWeek>
            <t:Occurrence>-1</t:Occurrence>
        </t:RecurringDayTransition>
        <t:RecurringDayTransition>
            <t:To Kind="Period">trule:Microsoft/Registry/(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm,
                Vienne/1601-Standard
            </t:To>
            <t:TimeOffset>PT2H</t:TimeOffset>
            <t:Month>10</t:Month>
            <t:DayOfWeek>Sunday</t:DayOfWeek>
            <t:Occurrence>-1</t:Occurrence>
        </t:RecurringDayTransition>
    </t:TransitionsGroup>
</t:TransitionsGroups>
<t:Transitions>
    <t:Transition>
        <t:To Kind="Group">0</t:To>
    </t:Transition>
</t:Transitions>
</t:EndTimeZone>
<t:ConferenceType>0</t:ConferenceType>
<t:AllowNewTimeProposal>true</t:AllowNewTimeProposal>
<t:IsOnlineMeeting>false</t:IsOnlineMeeting>
<t:NetShowUrl/>

ecederstrand commented 5 years ago

The timezone IDs are really weird. They are not standard Microsoft or CLDR timezone values. You can add a translation to a reasonable pytz timezone ID to exchangelib.winzone.MS_TIMEZONE_TO_PYTZ_MAP at the top of your script.

Timeouts occur because your server is too loaded. You have a couple of options:

  1. Decrease the number of items in a paged request. See https://github.com/ecederstrand/exchangelib#paging
  2. Limit the number of fields you fetch to only those you need with e.g. account.calendar.only('start', 'end', 'subject'). See https://github.com/ecederstrand/exchangelib#searching
  3. Use ServiceAccount to hide the timeouts if they occur irregularly. See https://github.com/ecederstrand/exchangelib#setup-and-connecting
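
Options 1 and 2 can be combined in a small helper. This is only a sketch: the helper name iter_calendar_items and its defaults are made up for illustration, and it assumes a recent exchangelib version in which a queryset exposes a settable page_size attribute (check the paging docs for the version you run).

```python
def iter_calendar_items(account, page_size=50, fields=("start", "end", "subject")):
    """Yield calendar items, asking the server for less data per request.

    `account` is expected to be an already connected exchangelib Account.
    The helper name and its defaults are illustrative, not part of exchangelib.
    """
    # Option 2: only fetch the fields that are actually needed.
    queryset = account.calendar.all().only(*fields)
    # Option 1: smaller pages. Assumes the queryset has a settable
    # `page_size` attribute, as in recent exchangelib releases.
    queryset.page_size = page_size
    yield from queryset
```

Usage would look like `for item in iter_calendar_items(account_source, page_size=25): ...`. Lower page_size further if the timeouts persist.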

dastious commented 5 years ago

Just so you know, I'm already using ServiceAccount, which is why I thought this was not normal behavior. But I will try decreasing the number of items in a paged request.

(Also, the timezones come from Exchange 2004 migrated to Exchange 2010 migrated to Exchange 2016... that's probably why they are weird.)

Thanks a lot for the hints !

ecederstrand commented 5 years ago

ErrorTimeoutExpired is the server telling us to step back a bit because it's too busy. Newer versions of exchangelib try to back off gracefully, decrease the connection pool size etc, but we may still re-throw ErrorTimeoutExpired if we could not scale back further, or if we got too impatient.

ecederstrand commented 5 years ago

Closing because I don't think there's anything to change in exchangelib. Feel free to re-open if you think otherwise.

monperrus commented 1 year ago

Hi, hitting the timezone error today

> Cannot convert value '(UTC+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna' on field '_start_timezone' to type <class 'exchangelib.ewsdatetime.EWSTimeZone'> (unknown timezone ID)

with exchangelib 4.9.0

Any thoughts? Thanks!

ecederstrand commented 1 year ago

The suggestion is to add a custom entry for this time zone definition in exchangelib.winzone.MS_TIMEZONE_TO_IANA_MAP

monperrus commented 1 year ago

Thanks for the fast answer @ecederstrand . Could it be that there is something wrong on the server side? Should I reach out to the Exchange server admin?

ecederstrand commented 1 year ago

It's my understanding that this is a config issue on the Exchange / Windows server. The timezone ID should be e.g. W. Europe Standard Time. Alternatively, we'll have to find out how the (GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne timezone ID came to be before we can discuss a possible solution for exchangelib.

monperrus commented 1 year ago

I found it: the TZID in calendar entries comes from each individual calendar client or the sender of the calendar invitations, and not from the server. It's not standardized or canonicalized.

This particular one is that of the Microsoft OWA rich web client. The list of TZIDs from OWA is here https://gist.github.com/monperrus/12f852e9b629e7028494ffc92da52aeb

ecederstrand commented 1 year ago

Thanks for the list! Where did you get it from? If we add this to exchangelib, I'd like a reference to official Microsoft docs. I've found this so far: https://learn.microsoft.com/en-us/windows-hardware/manufacture/desktop/default-time-zones?view=windows-11 but it would be nice to have something we can download and process automatically to test that whatever we import into exchangelib is up-to-date.

Also, that's really depressing behavior. I've seen my share of timezone craziness over the years. Allowing any timezone input without validation is a whole new level of crazy.

ecederstrand commented 1 year ago

It gets even worse. The timezone ID (GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne that the OP saw is not the same as the (UTC+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna mentioned in the docs I just posted. I assume e.g. the Berne -> Bern difference is due to localization. We can't possibly bring in the English version plus all localized versions of Microsoft's timezone names.

At this point, I don't have any good solutions on how to reliably translate the posted timezone definitions to official IANA timezones which is what the Python timezone library expects. We could try to somehow use the TransitionsGroups to match an IANA timezone, but the TransitionsGroups definition delivered in the XML is by no means complete compared to the full set of transitions that e.g. Europe/Amsterdam defines.

monperrus commented 1 year ago

I got the list by reverse-engineering the OWA JS code.

> We can't possibly bring in the English version plus all localized versions of Microsoft's timezone names.

I agree.

monperrus commented 1 year ago

I see that in those cases, the TZID is just an identifier between a VEVENT and a VTIMEZONE. So TZID can be any arbitrary string.

What about matching on the offset from the VTIMEZONE instead of the TZID if this is a reference to a VTIMEZONE?

ecederstrand commented 1 year ago

Unfortunately, the offset is not sufficient to match with a timezone. There may be multiple timezones within a single offset with varying transitions between standard time and daylight saving time, and the offset will vary depending on whether the timestamp is within DST or not.

The more assumptions we make trying to match arbitrary timezone labels with actual IANA timezones, the more difficult-to-debug issues we create when those assumptions are wrong.

monperrus commented 1 year ago

> The more assumptions we make trying to match arbitrary timezone labels with actual IANA timezones, the more difficult-to-debug issues

agree

> Unfortunately, the offset is not sufficient to match with a timezone.

we can do that as best effort, only if the TZID is not known.

it would be strictly backward compatible and better than the current error Cannot convert value

ecederstrand commented 1 year ago

While I appreciate the suggestions, I really don't like guessing. Most people don't check warnings, and I prefer errors over difficult-to-debug issues.

As an example, Iceland is GMT+0 and does not observe DST. Britain is also GMT+0 but does observe DST. If we were to guess Britain but the timezone was in fact Iceland, then we would get the time wrong if we move a meeting from September to November.
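
The Iceland/Britain case can be checked with the standard library. This is a minimal sketch, assuming Python 3.9+ with IANA tz data available on the system; the helper offset_on and the 2023 dates are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

ICELAND = ZoneInfo("Atlantic/Reykjavik")
BRITAIN = ZoneInfo("Europe/London")


def offset_on(year, month, day, tz):
    """Return the UTC offset in effect at noon local time on the given date."""
    return datetime(year, month, day, 12, 0, tzinfo=tz).utcoffset()


# In November both zones are at UTC+0, so a November timestamp alone
# cannot tell them apart.
print(offset_on(2023, 11, 15, ICELAND), offset_on(2023, 11, 15, BRITAIN))

# In September Britain is on daylight saving time (UTC+1), Iceland is not.
# A meeting moved across this boundary ends up one hour off if we guessed
# the wrong zone.
print(offset_on(2023, 9, 15, ICELAND), offset_on(2023, 9, 15, BRITAIN))
```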

monperrus commented 1 year ago

> As an example, Iceland is GMT+0 and does not observe DST. Britain is also GMT+0 but does observe DST. If we were to guess Britain but the timezone was in fact Iceland, then we would get the time wrong if we move a meeting from September to November.

I see. That's probably the root cause of quite a few subtle calendar bugs that we see on the Internet (and that I've experienced): wrong timezone identification combined with events being moved across DST weekends.

I'm reading RFC 5545, "Internet Calendaring and Scheduling Core Object Specification (iCalendar)" and there is no obvious solution. It simply says "This document does not define a naming convention for time zone identifiers".

So coming back to a good timezone identification algorithm, here is a new proposal:

WDYT?

ecederstrand commented 1 year ago

If you want to match on offset and DST, then for 90% of the places in the world where people actually live and create meetings in Exchange, there's going to be more than one candidate. GMT+1 will probably have at least 10.
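
To get a feel for how many candidates there are, one can count the IANA zones that share a given pair of offsets. This is a rough sketch, assuming Python 3.9+ with tz data installed; the function name zones_matching, the sample dates and the year are arbitrary, and the exact count depends on the tz database version.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo, available_timezones


def zones_matching(january_offset, july_offset, year=2023):
    """Return the IANA zone names with the given UTC offsets in January and July."""
    january = datetime(year, 1, 15, 12, 0)
    july = datetime(year, 7, 15, 12, 0)
    matches = []
    for name in sorted(available_timezones()):
        tz = ZoneInfo(name)
        if (january.replace(tzinfo=tz).utcoffset() == january_offset
                and july.replace(tzinfo=tz).utcoffset() == july_offset):
            matches.append(name)
    return matches


# UTC+1 in winter and UTC+2 in summer, like the zone in the original report.
candidates = zones_matching(timedelta(hours=1), timedelta(hours=2))
print(len(candidates), "candidate zones, e.g.", candidates[:5])
```

Many of these zones agree today but differ in their historical transitions, which is exactly the information the offset alone does not carry.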

That leaves us with guessing based on some sort of fuzzy string matching. Which means we'll often be right and sometimes wrong. Which I think still leaves exchangelib users in a worse situation than getting an error. I would prefer to simply improve the error message so users know exactly what to do if they get this error. After all, the solution is really simple:

from exchangelib.winzone import MS_TIMEZONE_TO_IANA_MAP as MS_TO_IANA

MS_TO_IANA["(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne"] = "Europe/Amsterdam"

# Your code here

Unless you have an unusually diverse set of accounts, the amount of custom entries will be low. You can even choose to catch the UnknownTimeZone exception and do your own fuzzy string matching to populate MS_TIMEZONE_TO_IANA_MAP automatically.
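
A do-it-yourself fuzzy matcher could look roughly like the sketch below. It is deliberately independent of exchangelib: KNOWN_LABELS is a tiny hypothetical table that you would have to fill in yourself, guess_iana_name is a made-up helper, and the 0.8 cutoff is an arbitrary starting point. It will sometimes guess wrong, so review what it produces before trusting it.

```python
import difflib
import re

# Hypothetical lookup table: English display labels mapped to an IANA zone
# of your own choosing. Not shipped with exchangelib.
KNOWN_LABELS = {
    "Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna": "Europe/Amsterdam",
    "Brussels, Copenhagen, Madrid, Paris": "Europe/Paris",
    "Dublin, Edinburgh, Lisbon, London": "Europe/London",
}

# Matches a leading "(GMT+01:00) " or "(UTC-05:00) " style prefix.
_OFFSET_PREFIX = re.compile(r"^\((?:GMT|UTC)[+-]\d{2}:\d{2}\)\s*")


def guess_iana_name(label, cutoff=0.8):
    """Return the IANA name of the closest known label, or None if nothing is close."""
    stripped = _OFFSET_PREFIX.sub("", label).casefold()
    by_folded = {known.casefold(): iana for known, iana in KNOWN_LABELS.items()}
    matches = difflib.get_close_matches(stripped, list(by_folded), n=1, cutoff=cutoff)
    return by_folded[matches[0]] if matches else None


label = "(GMT+01:00) Amsterdam, Berlin, Berne, Rome, Stockholm, Vienne"
print(label, "->", guess_iana_name(label))
```

A guess that is not None can then be stored with MS_TIMEZONE_TO_IANA_MAP[label] = guess, as in the snippet above; labels such as 'Customized Time Zone' yield None and still need a manual decision.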

monperrus commented 1 year ago

> Which means we'll often be right and sometimes wrong

Agree

> Unless you have an unusually diverse set of accounts, the amount of custom entries will be low.

Not really, because one does not control this field; it is set by the calendar client of everyone sending invitations, so it's open-ended.

Maybe the algorithm above could be part of another library to be created: icaltz: deduce timezones from icalendar VTIMEZONE.

monperrus commented 1 year ago

Reading more about timezone inference, the deduction algorithm would also have to take into account the TZNAME and X-LIC-LOCATION fields.

I'm now searching for an existing implementation of a similar algorithm, but I haven't found any in https://github.com/libical/libical/