Closed cyberion1985 closed 6 years ago
This is almost certainly a server issue. It is failing to respond to the HTTP request within the timeout you specified.
You can try changing the Credentials object to a ServiceAccount if you want to gloss over these exceptions. A ServiceAccount will enable certain retry policies which may let you continue at a slower rate when your server is misbehaving.
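To illustrate what a retry policy does in general, here is a minimal sketch in plain Python 3. It is not exchangelib's implementation; `call_with_retry`, `max_wait`, `initial_delay` and `flaky` are hypothetical names used only for illustration.

```python
import time


def call_with_retry(func, max_wait=10.0, initial_delay=0.1, sleep=time.sleep):
    """Call func() until it succeeds or the next back-off would exceed max_wait.

    Hypothetical helper for illustration; not part of exchangelib.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        try:
            return func()
        except TimeoutError:
            if waited + delay > max_wait:
                raise  # Give up: the retry budget is exhausted
            sleep(delay)
            waited += delay
            delay *= 2  # Exponential back-off between attempts


# Usage: a fake request that times out twice before succeeding
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("server busy")
    return "ok"


# Skip real sleeping in the demo by passing a no-op sleep function
result = call_with_retry(flaky, sleep=lambda seconds: None)
```

The trade-off is the one described above: the loop keeps going through transient failures, but at a slower rate.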
I tried that but it gave exactly the same error. My concern is that nothing changed on the server and this was all working fine this morning. Now I am basically trying to iterate through these 1000 mails but it never gets there. It is just a simple counter to see how many mails it reads.
Does it read through all emails first when it uses ".all()" and then continue with the code? Or does it fetch the emails one by one as it goes?
Ok. I mistook this for an exception from the requests package. Actually, it's an error from the server telling you that you are being throttled, or that the server is hard at work. See https://docs.microsoft.com/en-us/exchange/client-developer/exchange-web-services/handling-synchronization-related-errors-in-ews-in-exchange and https://docs.microsoft.com/en-us/exchange/client-developer/web-service-reference/responsecode
Try either lowering the connection count (BaseProtocol.SESSION_POOLSIZE) or the page size of your batch operations (QuerySet.page_size).
It seems from the stack trace that your query is unnecessarily expensive. If you just want to count the number of items in a folder, you can use the some_folder.total_count or some_folder.unread_count attributes. If you want to get the item count of a filtered list of items, do some_folder.filter(subject='foo').count().
Iterating over .all() consumes a generator, so items will be fetched from the server and returned according to the defined page size (currently 100 by default).
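The paging behaviour described above can be sketched with a plain Python generator. This is only an illustration of the idea, not exchangelib's code; `iter_items`, `fake_fetch_page` and the 250-item fake server are hypothetical.

```python
def iter_items(fetch_page, page_size=100):
    """Yield items one at a time, requesting a new page only when needed."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return  # An empty page means the server has no more items
        for item in page:
            yield item
        offset += len(page)


# Fake "server" holding 250 items; every request is recorded
SERVER_ITEMS = list(range(250))
requests_log = []


def fake_fetch_page(offset, page_size):
    requests_log.append((offset, page_size))
    return SERVER_ITEMS[offset:offset + page_size]


items = iter_items(fake_fetch_page, page_size=100)
first = next(items)                       # Triggers only the first page request
requests_after_first = len(requests_log)
remaining = list(items)                   # Consuming the rest triggers the other requests
```

In this sketch a smaller page size means more but lighter requests, and nothing is fetched until the loop actually asks for the next item.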
Thank you, I will look into this.
Hi again,
I have deduced that the server is simply too busy to process requests. In the morning, when server system resources are not heavily utilized, it works perfectly and quickly. At peak times, it stops working.
The only reason I need this is for gathering data. Once I have the data I need, I no longer need to work with 1000 emails, only with 1 email. Then it will work, as the generator doesn't get overutilized.
Ok. You may be able to reduce the pressure of your query by limiting the fields you fetch from the server to the ones you actually need. For example, mime_content may be huge if your emails contain attachments.
for item in folder.all().only('subject', 'datetime_received', 'sender'):
    pass  # Do something with item
@ecederstrand that sounds amazing and will help me a lot! Thank you
I have the exact same issue but I want to extract most features in __dict__ from the messages so I can't limit myself too much.
Is there no way to fetch the messages one by one instead of fetching them all in a single request?
Just to give some context: I want to scrape the emails from folders and save them to some format like json.
That’s possible. Have a look at the page_size options in the README.
I still get a timeout even if I set CHUNK_SIZE=1. If I set a large CHUNK_SIZE, it doesn't time out, but then I run into memory problems at some point.
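For the export side of this, one option is to write each item to disk as soon as it is read instead of collecting everything first. The sketch below writes JSON Lines (one JSON object per line) using only the standard library. `write_jsonl` and `FakeItem` are hypothetical stand-ins for illustration, and whether this resolves the memory problem depends on where the memory is actually being held.

```python
import io
import json


def write_jsonl(items, out, fields):
    """Write one JSON object per line so only one record is built at a time."""
    count = 0
    for item in items:
        record = {name: getattr(item, name, None) for name in fields}
        # default=str turns non-JSON types (e.g. datetimes) into strings
        out.write(json.dumps(record, default=str) + "\n")
        count += 1
    return count


class FakeItem:
    """Stand-in for a mail item; real items would come from the folder query."""

    def __init__(self, subject, sender):
        self.subject = subject
        self.sender = sender


fake_items = (FakeItem("mail %d" % i, "user@example.com") for i in range(3))
buffer = io.StringIO()  # A real export would open a file instead
written = write_jsonl(fake_items, buffer, fields=("subject", "sender"))
lines = buffer.getvalue().splitlines()
```

Because the input is consumed lazily, the writer never needs the full list of items in memory.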
I would actually be able to just use .only() but if I include text_body then I get an error, since I iterate over all items and not just messages (although I only want to iterate over messages). Want me to post this elsewhere?
ValueError: TextField(name='text_body', value_cls=<class 'str'>, is_list=False, is_complex=True, default=None) is not a valid field on (<class 'exchangelib.items.CalendarItem'>, <class 'exchangelib.items.Contact'>, <class 'exchangelib.items.DistributionList'>, <class 'exchangelib.items.Message'>, <class 'exchangelib.items.PostItem'>, <class 'exchangelib.items.Task'>, <class 'exchangelib.items.MeetingRequest'>, <class 'exchangelib.items.MeetingResponse'>, <class 'exchangelib.items.MeetingCancellation'>)
My current workaround is to exclude text_body and use BeautifulSoup to parse html into text.
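A rough idea of what such an HTML-to-text step looks like is sketched below. Note that it swaps BeautifulSoup for the standard library's html.parser so the example has no third-party dependency; `TextExtractor` and `html_to_text` are hypothetical names, and real mail HTML will need more care than this.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = ("<html><head><style>p {color: red}</style></head>"
          "<body><p>Hello <b>world</b></p></body></html>")
plain = html_to_text(sample)
```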
If you want to see what's going on with the timeouts, then try adding debug logging. See https://github.com/ecederstrand/exchangelib#troubleshooting That should give you an idea about what's happening.
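As a minimal sketch, debug logging can be switched on with the standard logging module alone. The linked README section has its own recommended setup; the logger name "exchangelib" below is an assumption based on the package name.

```python
import logging

# Send all log records at DEBUG level and above to stderr
logging.basicConfig(level=logging.DEBUG)

# Make sure the library's own loggers are not filtered out.
# Assumption: the library logs under the "exchangelib" logger name.
logging.getLogger("exchangelib").setLevel(logging.DEBUG)
```

Expect the output to be verbose, so it is best enabled only while reproducing the timeout.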
The ValueError is probably a bug. Please post the full stack trace so we can see where it's being raised.
Stack trace for the text_body bug.
Traceback (most recent call last):
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-5f843eb82f37>", line 10, in <module>
    datetime.datetime.now().strftime('%Y%m%d%H%M'), 'prod' if prod else 'test'))
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\ews\__init__.py", line 259, in scrape_mailboxes
    for item in folder.all().only(*keep_only):
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\venv\lib\site-packages\exchangelib\queryset.py", line 293, in __iter__
    for val in self._format_items(items=self._query(), return_format=self.return_format):
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\venv\lib\site-packages\exchangelib\queryset.py", line 365, in _item_yielder
    for i in iterable:
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\venv\lib\site-packages\exchangelib\account.py", line 583, in fetch
    additional_fields = validation_folder.validate_fields(fields=only_fields)
  File "C:\Users\X007680\PycharmProjects\ExchangeWebServices\venv\lib\site-packages\exchangelib\folders.py", line 556, in validate_fields
    raise ValueError("%r is not a valid field on %s" % (field_path.field, self.supported_item_models))
ValueError: TextField(name='text_body', value_cls=<class 'str'>, is_list=False, is_complex=True, default=None) is not a valid field on (<class 'exchangelib.items.CalendarItem'>, <class 'exchangelib.items.Contact'>, <class 'exchangelib.items.DistributionList'>, <class 'exchangelib.items.Message'>, <class 'exchangelib.items.PostItem'>, <class 'exchangelib.items.Task'>, <class 'exchangelib.items.MeetingRequest'>, <class 'exchangelib.items.MeetingResponse'>, <class 'exchangelib.items.MeetingCancellation'>)
Which version of Exchange is this? You can check with account.protocol.version. If it's older than Exchange 2013, then text_body is not supported, which would explain the error. See https://github.com/ecederstrand/exchangelib/blob/master/exchangelib/items.py#L148
Ah sorry. My mistake. It is Outlook 2010
Ok. I think we could at least improve the error message to state that the field is supported, just not on the version you are connected to.
After the closing commit, we now print a more helpful error message:
>>> from exchangelib.version import EXCHANGE_2010
>>> from exchangelib.items import Item
>>> from exchangelib.version import Version
>>> Item.validate_field(field='text_body', version=Version(build=EXCHANGE_2010))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nfs/home/ekc/repos/office365-notifier/exchangelib/exchangelib/properties.py", line 215, in validate_field
    % (field.name, version, field.supported_from, field.deprecated_from))
exchangelib.properties.InvalidFieldForVersion: Field 'text_body' is not supported on server version Build=14.0.0.0, API=Exchange2010, Fullname=Microsoft Exchange Server 2010 (supported from: 15.0.0.0, deprecated from: None)
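The kind of check behind such a message can be sketched as a comparison of build numbers. This is an illustration only, not the library's actual code: `FieldSpec` and `supports` are hypothetical, and the build tuples are taken from the error message above.

```python
class FieldSpec:
    """Hypothetical field description with optional version bounds (build tuples)."""

    def __init__(self, name, supported_from=None, deprecated_from=None):
        self.name = name
        self.supported_from = supported_from
        self.deprecated_from = deprecated_from

    def supports(self, build):
        # Tuples compare element by element, so (14, 0, 0, 0) < (15, 0, 0, 0)
        if self.supported_from is not None and build < self.supported_from:
            return False
        if self.deprecated_from is not None and build >= self.deprecated_from:
            return False
        return True


EXCHANGE_2010_BUILD = (14, 0, 0, 0)
EXCHANGE_2013_BUILD = (15, 0, 0, 0)

text_body = FieldSpec("text_body", supported_from=EXCHANGE_2013_BUILD)

on_2010 = text_body.supports(EXCHANGE_2010_BUILD)  # Server is older than supported_from
on_2013 = text_body.supports(EXCHANGE_2013_BUILD)
```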
I receive the following error. It had been working for a while and now the error has suddenly returned. I sift through about 1000 emails every time, so it works "mostly". I have also increased the TIMEOUT value. I run over the same emails in the same folder again and again with:
for item in folder.all():
Python 2.7 64-bit Windows Server 2016 Standard