Closed bencleary closed 5 years ago
That depends on where the MemoryError comes from. You'll have to post the stack trace to track this down.
Thanks for getting back in touch. Here is the code; I am just running it now to get the stack trace. I am using a while loop because when I try to use .all() or .iterator() I get a "connection forcibly closed" error, and slicing the query into smaller chunks seems to help. (I know the slicing here is rudimentary; this is a test account, not a live one, so missing items are fine.) As I said, calendars and contacts have worked fine; it is only emails where I run into problems.
class MigrateInbox(MigrationConfig):
    def inbox_count(self):
        return self.current_account.inbox.total_count

    def migrated_inbox_count(self):
        self.target_account.inbox.refresh()
        return self.target_account.inbox.total_count

    def migrate_inbox(self):
        count = self.inbox_count()
        print(f"Current Email (Inbox) Count - {count}")
        folder = self.current_account.inbox
        pagesize = 5
        index = 0
        current = 0
        while index < count:
            current += pagesize
            items = folder.all().only('mime_content')[index:current]
            data = self.current_account.export(items)
            self.bulk_migrate(folder=self.target_account.inbox, upload=data)  # just a wrapper for upload
            index += pagesize
            print(f"index -> {index} ---- current -> {current}")
Here is the stack trace from when I ran it this morning:
C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\Scripts\python.exe C:/Users/benja/Development/office_365_tools/ews-test.py
Current Email (Inbox) Count - 265
index -> 5 ---- current -> 5
index -> 10 ---- current -> 10
[... identical lines counting up in steps of 5 ...]
index -> 230 ---- current -> 230
index -> 235 ---- current -> 235
EWS https://outlook.office365.com/EWS/Exchange.asmx, account XXXX HIDDEN XXXX: Exception in _get_elements: Traceback (most recent call last):
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\services.py", line 89, in _get_elements
    response = self._get_response_xml(payload=payload)
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\services.py", line 171, in _get_response_xml
    res = self._get_soap_payload(response=r, **parse_opts)
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\services.py", line 260, in _get_soap_payload
    root = to_xml(response.iter_content())
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\util.py", line 365, in to_xml
    return parse(stream, parser=forgiving_parser)
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\defusedxml\lxml.py", line 134, in parse
    elementtree = _etree.parse(source, parser, base_url=base_url)
  File "src\lxml\etree.pyx", line 3424, in lxml.etree.parse
  File "src\lxml\parser.pxi", line 1857, in lxml.etree._parseDocument
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\util.py", line 340, in getvalue
    res = b''.join(self._bytes_generator)
MemoryError
Traceback (most recent call last):
  File "C:/Users/benja/Development/office_365_tools/ews-test.py", line 10, in <module>
    EmailMigration(old_account=current, new_account=target).migrate_inbox()
  File "C:\Users\benja\Development\office_365_tools\office_365_migration\email_migration.py", line 89, in migrate_inbox
    data = self.current_account.export(items)
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\account.py", line 320, in export
    self._consume_item_service(service_cls=ExportItems, items=items, chunk_size=chunk_size, kwargs=dict())
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\account.py", line 302, in _consume_item_service
    is_empty, items = peek(items)
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\util.py", line 118, in peek
    first = next(iterable)
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\queryset.py", line 298, in __iter__
    for val in self._format_items(items=self._query(), return_format=self.return_format):
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\queryset.py", line 375, in _item_yielder
    for i in iterable:
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\account.py", line 580, in fetch
    shape=ID_ONLY,
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\account.py", line 308, in _consume_item_service
    for i in service_cls(account=self, chunk_size=chunk_size).call(**kwargs):
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\services.py", line 676, in _pool_requests
    elems = r.get()
  File "c:\python37\Lib\multiprocessing\pool.py", line 657, in get
    raise self._value
  File "c:\python37\Lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\services.py", line 656, in <lambda>
    lambda c: self._get_elements(payload=payload_func(c, **kwargs)),
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\services.py", line 89, in _get_elements
    response = self._get_response_xml(payload=payload)
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\services.py", line 171, in _get_response_xml
    res = self._get_soap_payload(response=r, **parse_opts)
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\services.py", line 260, in _get_soap_payload
    root = to_xml(response.iter_content())
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\util.py", line 365, in to_xml
    return parse(stream, parser=forgiving_parser)
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\defusedxml\lxml.py", line 134, in parse
    elementtree = _etree.parse(source, parser, base_url=base_url)
  File "src\lxml\etree.pyx", line 3424, in lxml.etree.parse
  File "src\lxml\parser.pxi", line 1857, in lxml.etree._parseDocument
  File "C:\Users\benja\.virtualenvs\office_365_tools-UdUAg9MW\lib\site-packages\exchangelib\util.py", line 340, in getvalue
    res = b''.join(self._bytes_generator)
MemoryError
Swapping my Python version to 64-bit solved the MemoryError. Doing some memory profiling, I can see that at around the 240 mark memory usage spikes to 3.5 GB, which I guess is near the limit for 32-bit Python; using 64-bit solved that. I am just wondering if there are any other options you could advise for large amounts of email. For instance, say there is a mailbox of about 10 GB: are there any other methods in this library that could help speed up the querying, or a way to change the cache from memory to disk (I know speed would take a hit, but space would be less of a problem)? I know the export method is heavy since it deals in encoded strings, but is there anything else you can suggest?
The export() method doesn't need a full item, just the item IDs of the messages to export. So instead of items = folder.all().only('mime_content')[index:current] you could do just items = folder.all().only('item_id', 'changekey')[index:current]. That would reduce the memory pressure somewhat.
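As an aside, the index/current bookkeeping in the paging loop above can be factored into a small generic helper. This is just a sketch in plain Python with no exchangelib dependency, and `batch_bounds` is a hypothetical name:

```python
def batch_bounds(total, page_size):
    """Yield (start, stop) slice bounds that cover range(total) in page_size steps."""
    for start in range(0, total, page_size):
        yield start, min(start + page_size, total)

# Each pair can then drive a QuerySet slice, e.g.
#   items = folder.all().only('item_id', 'changekey')[start:stop]
print(list(batch_bounds(12, 5)))  # → [(0, 5), (5, 10), (10, 12)]
```

This removes the risk of the index and current counters drifting out of sync with the folder's item count.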
It would be great if you could run a memory profiler over your code to pinpoint what is consuming all the memory. The stack trace is not very helpful because it crashes at the point you run out of memory, which is not necessarily where the bulk of the memory is being consumed.
Is it possible that some of your items contain huge attachments? You could try exporting just one item at a time and then dump to disk:
i = 1
for item in account.inbox.all().only('item_id', 'changekey'):
    data = account.export([item])[0]
    with open('item%s.dat' % i, 'w') as f:
        f.write(data)
    i += 1
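To show the dump-to-disk side of that loop without a live Exchange connection, here is a minimal sketch where `export_strings` is a stand-in for the opaque strings that account.export() returns; the commented-out call at the end assumes exchangelib's Account.upload(), which accepts a list of (folder, data) pairs:

```python
import tempfile
from pathlib import Path

# Stand-ins for the strings account.export() would return for each item
export_strings = ['export-data-one', 'export-data-two', 'export-data-three']

out_dir = Path(tempfile.mkdtemp())
for i, data in enumerate(export_strings, start=1):
    # One .dat file per exported item, mirroring the loop above
    (out_dir / f'item{i}.dat').write_text(data)

# The files can later be read back and re-imported, e.g.:
#   account.upload([(account.inbox, p.read_text()) for p in sorted(out_dir.glob('*.dat'))])
print(sorted(p.name for p in out_dir.iterdir()))  # → ['item1.dat', 'item2.dat', 'item3.dat']
```

Writing each item to its own file keeps peak memory at roughly one exported message, whatever the mailbox size.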
Well, since your last answer I haven't experienced the memory error. Changing to item_id and changekey has really reduced the memory usage, and it rarely goes over 400 MB now. I think you can chalk this one up to user error.
On the plus side, I have successfully migrated 2 existing Office 365 tenants into one using it, with total mailbox data of around 25 GB per tenant, so that has saved me a lot of time and hassle!! Thanks 👍
I will mark this as closed now as it is all working fine.
Glad to get successful reports for this code path with a significant data volume!
Hi,
Firstly, thank you for this library! I have been using it to help me merge Office 365 tenancies. Contacts work great, and so do calendars, but when I start using export and upload for emails it throws a MemoryError around 250 emails in. The account I have been working on only has 270 items in the inbox, so I am wondering if you know of anything I can do to stop that from happening?
I will post my code tomorrow as I am on my phone at the moment, but any advice would be appreciated.
Thanks 👍