ecederstrand / exchangelib

Python client for Microsoft Exchange Web Services (EWS)
BSD 2-Clause "Simplified" License

The request timed out #509

Closed cyberion1985 closed 6 years ago

cyberion1985 commented 6 years ago

I receive the following error. It had been working for a while, and now the error has suddenly come back. I sift through about 1000 emails on every run, so it works "mostly". I have also increased the TIMEOUT value. I iterate over the same emails in the same folder again and again with:

for item in folder.all():

Python 2.7 64-bit, Windows Server 2016 Standard

Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/scripts/temp7.py', wdir='C:/scripts')
  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 86, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/scripts/temp7.py", line 148, in <module>
    for item in incidents.all():
  File "C:\ProgramData\Anaconda2\lib\site-packages\exchangelib\queryset.py", line 311, in __iter__
    for val in self._format_items(items=self._query(), return_format=self.return_format):
  File "C:\ProgramData\Anaconda2\lib\site-packages\exchangelib\queryset.py", line 390, in _as_items
    for i in iterable:
  File "C:\ProgramData\Anaconda2\lib\site-packages\exchangelib\account.py", line 629, in fetch
    shape=IdOnly,
  File "C:\ProgramData\Anaconda2\lib\site-packages\exchangelib\services.py", line 608, in _pool_requests
    for elem in elems:
  File "C:\ProgramData\Anaconda2\lib\site-packages\exchangelib\services.py", line 330, in _get_elements_in_response
    container_or_exc = self._get_element_container(message=msg, name=self.element_container_name)
  File "C:\ProgramData\Anaconda2\lib\site-packages\exchangelib\services.py", line 303, in _get_element_container
    raise self._get_exception(code=response_code, text=msg_text, msg_xml=msg_xml)
ErrorTimeoutExpired: The request timed out.

ecederstrand commented 6 years ago

This is almost certainly a server issue. It is failing to respond to the HTTP request within the timeout you specified.

You can try changing the Credentials object to a ServiceAccount if you want to gloss over these exceptions. A ServiceAccount will enable certain retry policies which may let you continue at a slower rate when your server is misbehaving.
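To make "retry policy" concrete, here is a minimal sketch of the general idea in plain Python. It is not how ServiceAccount is implemented; `retry_with_backoff`, `flaky_request` and the delay values are hypothetical names and numbers chosen only for illustration.

```python
import time


def retry_with_backoff(func, retries=3, initial_delay=1.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry with a doubled delay."""
    delay = initial_delay
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # Out of attempts: let the caller see the error
            sleep(delay)
            delay *= 2


# Usage with a stand-in request that fails twice before succeeding
calls = []
waits = []


def flaky_request():
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError("The request timed out.")
    return "ok"


result = retry_with_backoff(flaky_request, retry_on=(TimeoutError,), sleep=waits.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in real use the default `time.sleep` applies.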

cyberion1985 commented 6 years ago

I tried that, but it gave exactly the same error. My concern is that nothing changed on the server and this was all working fine this morning. Now I am basically trying to iterate through these 1000 mails but it never gets there - it is just a simple counter to see how many mails it reads.

Does it read through all emails first when it uses ".all()" and then continue with the code? Or does it process one email at a time, immediately?

ecederstrand commented 6 years ago

Ok. I mistook this for an exception from the requests package. Actually, it's an error from the server telling you that you are being throttled, or that the server is hard at work. See https://docs.microsoft.com/en-us/exchange/client-developer/exchange-web-services/handling-synchronization-related-errors-in-ews-in-exchange and https://docs.microsoft.com/en-us/exchange/client-developer/web-service-reference/responsecode

Try either lowering the connection count (BaseProtocol.SESSION_POOLSIZE) or the page size of your batch operations (QuerySet.page_size).

It seems from the stack trace that your query is unnecessarily expensive. If you just want to count the number of items in a folder, you can use the some_folder.total_count or some_folder.unread_count attributes. If you want to get the item count of a filtered list of items, do some_folder.filter(subject='foo').count().

Iterating over .all() consumes a generator, so items will be fetched from the server and returned according to the defined page size (currently 100 by default).
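To illustrate what "consumes a generator" means for the question above, here is a simplified sketch of a lazily paged generator. It is not the library's actual code; `iter_pages` and `fake_fetch` are hypothetical names, and the fake fetch function stands in for a server round trip.

```python
def iter_pages(fetch_page, page_size=100):
    """Yield items one at a time, requesting a new page only when needed."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)  # One round trip per page
        for item in page:
            yield item
        if len(page) < page_size:
            return  # A short page means there is nothing left to fetch
        offset += page_size


# Stand-in for the server: 1000 items, and a log of which offsets were requested
offsets_requested = []


def fake_fetch(offset, limit):
    offsets_requested.append(offset)
    return list(range(1000))[offset:offset + limit]


items = iter_pages(fake_fetch, page_size=100)
first = next(items)  # Only the first page has been requested at this point
```

The loop body of the caller runs as soon as the first page arrives; later pages are requested only as iteration reaches them.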

cyberion1985 commented 6 years ago

Thank you, I will look into this.

cyberion1985 commented 6 years ago

Hi again,

I have deduced that the server is simply too busy to process requests. In the morning, when server system resources are not heavily utilized, it works perfectly and quickly. At peak times, it stops working.

The only reason I need this is for gathering data. Once I have the data I need, I don't need to work with 1000 emails anymore, but only with 1 email. So then it will work, as the generator doesn't get overutilized.

ecederstrand commented 6 years ago

Ok. You may be able to reduce the pressure of your query by limiting the fields you fetch from the server to the ones you actually need. For example, mime_content may be huge if your emails contain attachments.

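for item in folder.all().only('subject', 'datetime_received', 'sender'):
    # Only the three requested fields are fetched for each item
    print(item.subject, item.datetime_received, item.sender)

cyberion1985 commented 6 years ago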

@ecederstrand that sounds amazing and will help me a lot! Thank you.

NixBiks commented 6 years ago

I have the exact same issue but I want to extract most features in __dict__ from the messages so I can't limit myself too much.

Is there no way to fetch the messages one by one instead of fetching them all in a single request?

Just to give some context: I want to scrape the emails from folders and save them to some format like JSON.
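One way to keep memory use flat for this kind of export is to write JSON Lines, one object per line, as items arrive. The following is a sketch under assumptions: `dump_items_jsonl` is a hypothetical helper, the field names are examples, and `SimpleNamespace` objects stand in for real mailbox items.

```python
import io
import json
from types import SimpleNamespace


def dump_items_jsonl(items, fields, out):
    """Write one JSON object per line; the function itself never accumulates items."""
    count = 0
    for item in items:
        # Missing attributes become null instead of raising
        record = {name: getattr(item, name, None) for name in fields}
        # default=str turns datetimes and other non-JSON values into strings
        out.write(json.dumps(record, default=str) + "\n")
        count += 1
    return count


# Usage with stand-in objects instead of real mailbox items
fake_items = (SimpleNamespace(subject="Mail %d" % i, sender="a@example.com") for i in range(3))
buffer = io.StringIO()
written = dump_items_jsonl(fake_items, ["subject", "sender", "text_body"], buffer)
```

In real use, `out` would be an open file and `items` the iterable being scraped.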

ecederstrand commented 6 years ago

That’s possible. Have a look at the page_size options in the README.

NixBiks commented 6 years ago

I still get a timeout even if I set CHUNK_SIZE=1. If I set a large CHUNK_SIZE, it doesn't time out, but then I run into memory problems at some point.

I would actually be able to just use .only(), but if I include text_body I get an error, since I iterate over all items and not just messages (although I only want to iterate over messages). Want me to post this elsewhere?

ValueError: TextField(name='text_body', value_cls=<class 'str'>, is_list=False, is_complex=True, default=None) is not a valid field on (<class 'exchangelib.items.CalendarItem'>, <class 'exchangelib.items.Contact'>, <class 'exchangelib.items.DistributionList'>, <class 'exchangelib.items.Message'>, <class 'exchangelib.items.PostItem'>, <class 'exchangelib.items.Task'>, <class 'exchangelib.items.MeetingRequest'>, <class 'exchangelib.items.MeetingResponse'>, <class 'exchangelib.items.MeetingCancellation'>)

My current workaround is to exclude text_body and use BeautifulSoup to parse html into text.
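For reference, the HTML-to-text step of this workaround can also be sketched with only the standard library. This uses `html.parser` in place of BeautifulSoup, so it is a different technique from the one named above, and it is cruder. `html_to_text` is a hypothetical helper, and the input is assumed to be an HTML string such as an item's body.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "br", "li", "tr"}
SKIPPED_TAGS = {"script", "style"}


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")  # Keep adjacent blocks from running together

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace left behind by removed markup
    return " ".join("".join(parser.parts).split())


sample = "<html><head><style>p{color:red}</style></head><body><p>Hello <b>world</b></p><p>Bye</p></body></html>"
text = html_to_text(sample)
```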

ecederstrand commented 6 years ago

If you want to see what's going on with the timeouts, then try adding debug logging. See https://github.com/ecederstrand/exchangelib#troubleshooting. That should give you an idea about what's happening.
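A minimal debug-logging setup using only the standard `logging` module might look like the sketch below. It assumes the library's loggers live under the name "exchangelib" (that is, they are named after its modules); the troubleshooting section linked above is the authoritative reference and may suggest additional handlers.

```python
import logging

# Install a default stderr handler; warnings and above from everything else
logging.basicConfig(level=logging.WARNING)

# Turn on DEBUG output only for the library under investigation
logging.getLogger("exchangelib").setLevel(logging.DEBUG)
```

Child loggers such as "exchangelib.services" inherit the DEBUG level from the parent logger unless they set their own.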

The ValueError is probably a bug. Please post the full stack trace so we can see where it's being raised.

NixBiks commented 6 years ago

Stack trace for the text_body bug.

Traceback (most recent call last):
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-5f843eb82f37>", line 10, in <module>
    datetime.datetime.now().strftime('%Y%m%d%H%M'), 'prod' if prod else 'test'))
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\ews\__init__.py", line 259, in scrape_mailboxes
    for item in folder.all().only(*keep_only):
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\venv\lib\site-packages\exchangelib\queryset.py", line 293, in __iter__
    for val in self._format_items(items=self._query(), return_format=self.return_format):
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\venv\lib\site-packages\exchangelib\queryset.py", line 365, in _item_yielder
    for i in iterable:
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\venv\lib\site-packages\exchangelib\account.py", line 583, in fetch
    additional_fields = validation_folder.validate_fields(fields=only_fields)
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\venv\lib\site-packages\exchangelib\folders.py", line 556, in validate_fields
    raise ValueError("%r is not a valid field on %s" % (field_path.field, self.supported_item_models))
ValueError: TextField(name='text_body', value_cls=<class 'str'>, is_list=False, is_complex=True, default=None) is not a valid field on (<class 'exchangelib.items.CalendarItem'>, <class 'exchangelib.items.Contact'>, <class 'exchangelib.items.DistributionList'>, <class 'exchangelib.items.Message'>, <class 'exchangelib.items.PostItem'>, <class 'exchangelib.items.Task'>, <class 'exchangelib.items.MeetingRequest'>, <class 'exchangelib.items.MeetingResponse'>, <class 'exchangelib.items.MeetingCancellation'>)

ecederstrand commented 6 years ago

Which version of Exchange is this? You can check with account.protocol.version. If it's older than Exchange 2013, then text_body is not supported, which would explain the error. See https://github.com/ecederstrand/exchangelib/blob/master/exchangelib/items.py#L148
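A script that has to run against both older and newer servers could pick its body field based on the server version. This is a sketch under assumptions: `body_fields` is a hypothetical helper, the threshold of 15 reflects Exchange 2013 builds starting at 15.0.0.0, and the major version number is assumed to be obtainable from the version object's build (for example a `major_version` attribute, which should be verified against the installed library).

```python
EXCHANGE_2013_MAJOR = 15  # Assumption: Exchange 2013 builds start at 15.0.0.0


def body_fields(major_version):
    """Pick the body field to request, based on the server's major build number."""
    if major_version >= EXCHANGE_2013_MAJOR:
        return ("text_body",)
    # Older servers: fall back to the regular body and convert it client-side
    return ("body",)


# Example: an Exchange 2010 server reports major version 14
fields_2010 = ("subject", "datetime_received") + body_fields(14)
fields_2013 = ("subject", "datetime_received") + body_fields(15)
```

The resulting tuple can then be unpacked into a call such as `.only(*fields_2010)`.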

NixBiks commented 6 years ago

Ah sorry. My mistake. It is Outlook 2010

ecederstrand commented 6 years ago

Ok. I think we could at least improve the error message to state that the field is supported, just not on the version you are connected to.

ecederstrand commented 6 years ago

After the closing commit, we now print a more helpful error message:

>>> from exchangelib.version import EXCHANGE_2010
>>> from exchangelib.items import Item
>>> from exchangelib.version import Version
>>> Item.validate_field(field='text_body', version=Version(build=EXCHANGE_2010))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nfs/home/ekc/repos/office365-notifier/exchangelib/exchangelib/properties.py", line 215, in validate_field
    % (field.name, version, field.supported_from, field.deprecated_from))
exchangelib.properties.InvalidFieldForVersion: Field 'text_body' is not supported on server version Build=14.0.0.0, API=Exchange2010, Fullname=Microsoft Exchange Server 2010 (supported from: 15.0.0.0, deprecated from: None)