ecederstrand / exchangelib

Python client for Microsoft Exchange Web Services (EWS)
BSD 2-Clause "Simplified" License

Memory leak in long-running scripts when using OAuth #1100

Status: Open. sunilpurwar opened this issue 2 years ago.

sunilpurwar commented 2 years ago

Describe the bug: When connecting to EWS from Python using exchangelib 4.7.4 with OAuth on Office 365, memory usage goes up with every request.

To Reproduce

from exchangelib import OAuth2Credentials, Build, Version, Configuration, OAUTH2, Account, IMPERSONATION
# client_id, client_secret and tenant_id are redacted below because they are confidential
CLIENT_ID = '*****************'
CLIENT_SECRET = '***********'
TENANT_ID = '****'
version = Version(build=Build(15, 0, 12, 34))

credentials = OAuth2Credentials(
  client_id=CLIENT_ID, client_secret=CLIENT_SECRET, tenant_id=TENANT_ID 
)

config = Configuration(service_endpoint='https://outlook.office365.com/EWS/Exchange.asmx',
                       credentials=credentials,
                       auth_type=OAUTH2,
                       version=version)

account = Account(
    primary_smtp_address='dummy@sdummy.com',  # replace with a valid address
    credentials=credentials, 
    autodiscover=False,
    config=config,
    access_type=IMPERSONATION
)

mails = list(account.inbox.filter(is_read=False).only(
        'is_read', 'subject', 'body','text_body','datetime_received',
        'sender','to_recipients','cc_recipients','bcc_recipients',
        'attachments','importance'
    ).order_by('datetime_received')[:1])

Additional context: Python 3.8.5, exchangelib 4.7.4. [image: memory usage]

Note: we poll the EWS server every minute to fetch mail. Previously we used exchangelib with Basic authentication, which worked fine, but Basic authentication is being deprecated, so we are moving to OAuth on Office 365 with exchangelib 4.7.4. Since then, memory usage goes up with every request in all our environments (Dev, QA and UAT).

ecederstrand commented 2 years ago

Are you saying that this only happens with OAuth and not Basic authentication? For OAuth we use the requests_oauthlib.OAuth2Session session class instead of the standard requests.Session class.

Can you install a memory debugging tool for Python to see exactly which data structures are consuming all this memory?

sunilpurwar commented 2 years ago

Yes. Previously we used the code and library versions below. Also, could you suggest a good memory debugging tool for Python?

Python 3.8.5, exchangelib 3.1.1

from exchangelib import DELEGATE,Account,Credentials,Configuration
credentials = Credentials(
    username='dummyy@dummy.com',
    password='******'
)
config = Configuration(server='outlook.office365.com', credentials=credentials)
account = Account(
    primary_smtp_address='dummy@gdummyy.com', 
    credentials=credentials, 
    autodiscover=False,
    config=config,
    access_type=DELEGATE
)

mails = list(account.inbox.filter(is_read=False).order_by('datetime_received')[:1])
ecederstrand commented 2 years ago

You can use the built-in tracemalloc module (short tutorial here) or try a more visual approach.
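A minimal sketch of the snapshot-diff approach with tracemalloc, assuming one polling cycle can be wrapped in a zero-argument function (poll_once below is a hypothetical stand-in that deliberately leaks). It warms up once, takes a snapshot, runs more cycles, takes a second snapshot, and lists the source lines whose allocations grew the most:

```python
import tracemalloc


def top_growth(work, iterations=10, limit=5):
    """Run `work` repeatedly and return the allocation sites that grew most.

    `work` is any zero-argument callable, e.g. a function that fetches
    unread mail once (hypothetical; substitute your own polling code).
    """
    tracemalloc.start(25)  # keep up to 25 frames per allocation
    try:
        work()  # warm-up call so one-time caches don't show up as growth
        before = tracemalloc.take_snapshot()
        for _ in range(iterations):
            work()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Group by source line; the result is sorted by size difference, largest first
    stats = after.compare_to(before, "lineno")
    return stats[:limit]


if __name__ == "__main__":
    leaked = []

    def poll_once():
        # Stand-in for a real EWS poll that leaks roughly 100 KB per call
        leaked.append(bytearray(100_000))

    for stat in top_growth(poll_once):
        print(stat)
```

In a real script you would pass the function that builds the Account and fetches unread mail; lines inside exchangelib, requests or requests_oauthlib that keep growing across cycles are the interesting ones.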

sunilpurwar commented 2 years ago

Thanks Erik, we are looking into memory debugging tools. Meanwhile, memory utilization reached 100%, our server crashed, and we got the error below:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\cached_property.py", line 70, in __get__
    return obj_dict[name]
KeyError: 'inbox'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\cached_property.py", line 70, in __get__
    return obj_dict[name]
KeyError: 'root'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\protocol.py", line 221, in get_session
    session = self._session_pool.get(block=False)
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\queue.py", line 167, in get
    raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\urllib3\connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
    raise err
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    sock.connect(sa)
OSError: [WinError 10055] An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\urllib3\connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\urllib3\connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\urllib3\connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x000001DA1408CEB0>: Failed to establish a new connection: [WinError 10055] An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\requests\adapters.py", line 489, in send
    resp = conn.urlopen(
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\urllib3\util\retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): 
Max retries exceeded with url: /4a809fb2-0c7f-4201-95da-06953b7d506f/oauth2/v2.0/token 
(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001DA1408CEB0>: Failed to establish a new connection: [WinError 10055] 
An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\svnworkspace\scheduler\sompoJobs\services\emailProcess.py", line 45, in fetchEmailsFromExchange
    mails = list(self.account.inbox.filter(is_read=False).only('is_read', 'subject', 'body','text_body',
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\cached_property.py", line 74, in __get__
    return obj_dict.setdefault(name, self.func(obj))
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\account.py", line 285, in inbox
    return self.root.get_default_folder(Inbox)
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\cached_property.py", line 74, in __get__
    return obj_dict.setdefault(name, self.func(obj))
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\account.py", line 349, in root
    return Root.get_distinguished(account=self)
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\folders\roots.py", line 115, in get_distinguished
    return cls.resolve(
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\folders\base.py", line 513, in resolve
    folders = list(FolderCollection(account=account, folders=[folder]).resolve())
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\folders\collections.py", line 335, in resolve
    yield from self.__class__(account=self.account, folders=resolveable_folders).get_folders(
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\folders\collections.py", line 403, in get_folders
    yield from GetFolder(account=self.account).call(
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\services\get_folder.py", line 43, in _elems_to_objs
    for folder, elem in zip(self.folders, elems):
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\services\common.py", line 245, in _chunked_get_elements
    yield from self._get_elements(payload=payload_func(chunk, **kwargs))
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\services\common.py", line 265, in _get_elements
    yield from self._response_generator(payload=payload)
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\services\common.py", line 227, in _response_generator
    response = self._get_response_xml(payload=payload)
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\services\common.py", line 343, in _get_response_xml
    r = self._get_response(payload=payload, api_version=api_version)
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\services\common.py", line 296, in _get_response
    session = self.protocol.get_session()
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\protocol.py", line 225, in get_session
    self.increase_poolsize()
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\protocol.py", line 191, in increase_poolsize
    self._session_pool.put(self.create_session(), block=False)
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\protocol.py", line 288, in create_session
    session = self.create_oauth2_session()
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\exchangelib\protocol.py", line 341, in create_oauth2_session
    token = session.fetch_token(
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\requests_oauthlib\oauth2_session.py", line 336, in fetch_token
    r = self.request(
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\requests_oauthlib\oauth2_session.py", line 515, in request
    return super(OAuth2Session, self).request(
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\requests\sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\requests\sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\uatenv\lib\site-packages\requests\adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='login.microsoftonline.com', port=443): 
Max retries exceeded with url: /4a809fb2-0c7f-4201-95da-06953b7d506f/oauth2/v2.0/token 
(Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001DA1408CEB0>: Failed to establish a new connection: 
[WinError 10055] An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full'))
sunilpurwar commented 2 years ago

Hi Erik, I have added the tracemalloc memory debugging tool to our application and taken snapshot dumps (attaching the latest and an older snapshot dump). I have also compared the two snapshots (attaching the result). Please help me resolve this issue. Attachments: compare_snap.log, latest_snapshot.log, old_snap.log, display_top_filename.log, display_top_lineno.log, display_top_traceback.log

ecederstrand commented 2 years ago

In the display_top_lineno.log file, there's mention of django, mssql and a lot of other packages that exchangelib does not depend on. It seems like you're running much more code than just the above snippet? What makes you conclude that exchangelib is at fault here?
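One way to separate exchangelib-related growth from Django, mssql and the rest of an application is to filter both snapshots by file path before comparing them. A sketch using tracemalloc.Filter and Snapshot.filter_traces; the package patterns are assumptions to adjust, and all_frames=True only helps if tracing was started with more than one frame (e.g. tracemalloc.start(25)):

```python
import tracemalloc

# fnmatch-style path patterns for the packages under suspicion
# (an assumption; adjust to your environment)
SUSPECT_PATTERNS = {
    "exchangelib": "*exchangelib*",
    "requests_oauthlib": "*requests_oauthlib*",
    "requests": "*requests*",
    "urllib3": "*urllib3*",
}


def growth_by_package(before, after, patterns=SUSPECT_PATTERNS, all_frames=True):
    """Return {name: bytes grown} between two tracemalloc snapshots.

    With all_frames=True an allocation counts toward a package if any frame
    of its traceback matches, so memory allocated by ssl/socket code on
    behalf of requests is still attributed to requests. Overlapping patterns
    ("*requests*" also matches requests_oauthlib) mean totals can double
    count; treat them as indicators, not an exact budget.
    """
    result = {}
    for name, pattern in patterns.items():
        flt = [tracemalloc.Filter(True, pattern, all_frames=all_frames)]
        diff = after.filter_traces(flt).compare_to(
            before.filter_traces(flt), "filename"
        )
        result[name] = sum(stat.size_diff for stat in diff)
    return result
```

If the exchangelib and requests totals stay flat while the overall process grows, the leak is more likely elsewhere in the application.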

Also, you mention memory increasing for every request, using some sort of polling every minute? The above snippet does not do that. It just fetches email data and exits.
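To check whether memory really grows per request rather than once per process, a small harness can record traced memory after each polling cycle. A sketch, with poll standing in for one hypothetical polling cycle and interval for the one-minute wait:

```python
import time
import tracemalloc


def measure_polling(poll, iterations=5, interval=0.0):
    """Call `poll` repeatedly and record traced memory after each call.

    `poll` is a hypothetical zero-argument function for one polling cycle
    (e.g. build the Account and fetch unread mail); `interval` is the pause
    between cycles in seconds (60 to mirror polling every minute).
    """
    tracemalloc.start()
    samples = []
    try:
        for _ in range(iterations):
            poll()
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)  # bytes currently held by Python allocations
            time.sleep(interval)
    finally:
        tracemalloc.stop()
    return samples
```

A leak shows up as a list that keeps climbing cycle after cycle; caches and connection pools usually produce an initial jump that then flattens.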

sunilpurwar commented 2 years ago

Yes Erik, but similar code runs fine when we use Basic authentication (Python 3.8.5, exchangelib 3.1.1):

from exchangelib import DELEGATE,Account,Credentials,Configuration
credentials = Credentials(
    username='dummyy@dummy.com',
    password='******'
)
config = Configuration(server='outlook.office365.com', credentials=credentials)
account = Account(
    primary_smtp_address='dummy@gdummyy.com', 
    credentials=credentials, 
    autodiscover=False,
    config=config,
    access_type=DELEGATE
)

mails = list(account.inbox.filter(is_read=False).order_by('datetime_received')[:1])

But after making only the changes below to implement OAuth on Office 365, memory usage goes up with every request.

Python 3.8.5, exchangelib 4.7.4

from exchangelib import OAuth2Credentials, Build, Version, Configuration, OAUTH2, Account, IMPERSONATION
# client_id, client_secret and tenant_id are redacted below because they are confidential
CLIENT_ID = '*****************'
CLIENT_SECRET = '***********'
TENANT_ID = '****'
version = Version(build=Build(15, 0, 12, 34))

credentials = OAuth2Credentials(
  client_id=CLIENT_ID, client_secret=CLIENT_SECRET, tenant_id=TENANT_ID 
)

config = Configuration(service_endpoint='https://outlook.office365.com/EWS/Exchange.asmx',
                       credentials=credentials,
                       auth_type=OAUTH2,
                       version=version)

account = Account(
    primary_smtp_address='dummy@sdummy.com',  # replace with a valid address
    credentials=credentials, 
    autodiscover=False,
    config=config,
    access_type=IMPERSONATION
)

mails = list(account.inbox.filter(is_read=False).only(
        'is_read', 'subject', 'body','text_body','datetime_received',
        'sender','to_recipients','cc_recipients','bcc_recipients',
        'attachments','importance'
    ).order_by('datetime_received')[:1])
ecederstrand commented 2 years ago

Ok, so you're changing both the authentication mechanism and the exchangelib version. Please try the same, but with the same exchangelib version both places.

It's not unexpected that memory will increase while the last line is running. exchangelib fetches items in chunks when there are many items. In the above, you're fetching full items, including attachments, so memory usage may be considerable, depending on the number of items and the attachment sizes.

sunilpurwar commented 2 years ago

Hi Erik, two days ago we upgraded exchangelib from 3.1.1 to 4.7.4 on our server while keeping the old code (below), and it works fine. The exchangelib upgrade was the only change on that server, and the code runs fine with Basic authentication (Python 3.8.5, exchangelib 4.7.4):

from exchangelib import DELEGATE,Account,Credentials,Configuration
credentials = Credentials(
    username='dummyy@dummy.com',
    password='******'
)
config = Configuration(server='outlook.office365.com', credentials=credentials)
account = Account(
    primary_smtp_address='dummy@gdummyy.com', 
    credentials=credentials, 
    autodiscover=False,
    config=config,
    access_type=DELEGATE
)

mails = list(account.inbox.filter(is_read=False).only(
        'is_read', 'subject', 'body', 'text_body', 'datetime_received',
        'sender', 'to_recipients', 'cc_recipients', 'bcc_recipients',
        'attachments', 'importance'
    ).order_by('datetime_received')[:1])

So my issue is this: as you know, Office 365 is deprecating Basic authentication, so I need to implement OAuth on Office 365. When we switch our code to OAuth, we run into the memory issue (memory usage goes up with every request), and the only difference is the OAuth changes below. I am also attaching more analysis and a snapshot dump from tracemalloc: traceback.log, snapshot09.zip

Code we changed to implement OAuth on Office 365:

from exchangelib import OAuth2Credentials, Build, Version, Configuration, OAUTH2, Account, IMPERSONATION
# client_id, client_secret and tenant_id are redacted below because they are confidential
CLIENT_ID = '*****************'
CLIENT_SECRET = '***********'
TENANT_ID = '****'
version = Version(build=Build(15, 0, 12, 34))

credentials = OAuth2Credentials(
  client_id=CLIENT_ID, client_secret=CLIENT_SECRET, tenant_id=TENANT_ID 
)

config = Configuration(service_endpoint='https://outlook.office365.com/EWS/Exchange.asmx',
                       credentials=credentials,
                       auth_type=OAUTH2,
                       version=version)

account = Account(
    primary_smtp_address='dummy@sdummy.com',  # replace with a valid address
    credentials=credentials, 
    autodiscover=False,
    config=config,
    access_type=IMPERSONATION
)

mails = list(account.inbox.filter(is_read=False).only(
        'is_read', 'subject', 'body','text_body','datetime_received',
        'sender','to_recipients','cc_recipients','bcc_recipients',
        'attachments','importance'
    ).order_by('datetime_received')[:1])
ecederstrand commented 2 years ago

This could be due to a memory leak in the requests or requests_oauthlib packages. Can you try adding the following to the top of your script?

from exchangelib.protocol import Protocol
Protocol.MAX_SESSION_USAGE_COUNT = 10  # Or some other low number
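The idea behind a per-session usage cap is that a pooled session is discarded and replaced after it has served a fixed number of requests, so anything that session has accumulated can be freed. The sketch below illustrates that idea with a hypothetical RecyclingPool class; it is not exchangelib's implementation, and factory/closer are placeholders (e.g. requests.Session and a function that calls session.close()):

```python
class RecyclingPool:
    """Hand out one reusable resource and replace it after `max_uses` uses.

    Hypothetical illustration of a session-usage cap: `factory` creates a
    fresh resource (e.g. an HTTP session) and `closer` releases an old one.
    """

    def __init__(self, factory, closer, max_uses=10):
        self._factory = factory
        self._closer = closer
        self._max_uses = max_uses
        self._resource = factory()
        self._uses = 0

    def get(self):
        if self._uses >= self._max_uses:
            # Drop the worn-out resource so whatever it accumulated can be freed
            self._closer(self._resource)
            self._resource = self._factory()
            self._uses = 0
        self._uses += 1
        return self._resource
```

A cap like this only helps if the growth is held inside the session objects themselves; memory retained elsewhere (module-level caches, the Account object, the caller's own lists) is unaffected.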
sunilpurwar commented 2 years ago

I implemented the code above as you suggested, but saw no improvement; I am still facing the same issue.

ecederstrand commented 2 years ago

Possible fix posted in https://github.com/ecederstrand/exchangelib/issues/1090#issuecomment-1207214738

sunilpurwar commented 2 years ago

Still facing the same issue after applying the fix from #1090. [image]

andreiaciobanitei commented 2 years ago

Facing the same issue of steadily increasing RAM usage in a long-running script that connects to an Outlook mailbox every 5 seconds using OAuth2Credentials, Configuration and Account from exchangelib:

oauth2_credentials = json.loads(item['oauth2_credentials'])
credentials = OAuth2Credentials(
    client_id=oauth2_credentials['client_id'],
    client_secret=oauth2_credentials['client_secret'],
    tenant_id=oauth2_credentials['tenant_id'],
    identity=Identity(primary_smtp_address=item['login_email'])
)
config = Configuration(
    server=item['inbound_email_server'],
    credentials=credentials,
    auth_type=OAUTH2,
)
account = Account(
    primary_smtp_address=item['login_email'],
    config=config,
    autodiscover=False,
    access_type=DELEGATE,
)
achillis2 commented 1 year ago

Is this issue fixed? The deadline for basic authentication is 12/31/2022. We also need the fix to use OAuth.

ecederstrand commented 1 year ago

Nope, to my knowledge it's not fixed. I also cannot reproduce this in my local setup. You'll need to start debugging in your own processes where the memory usage is piling up.
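A complementary check to tracemalloc is to count live objects by type before and after a batch of polling cycles; a type whose count rises with every cycle (for example sessions, accounts or response objects) points at what is being retained. A sketch using only gc and collections.Counter, with work as a hypothetical polling function:

```python
import gc
from collections import Counter


def type_counts():
    """Count live objects tracked by the garbage collector, by type name.

    Note: gc only tracks container-like objects, so plain str/bytes
    are not counted here.
    """
    gc.collect()  # drop unreachable cycles so only live objects are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growing_types(work, iterations=10, limit=10):
    """Run `work` repeatedly and report which object types keep multiplying."""
    work()  # warm-up: ignore objects created once and then cached
    before = type_counts()
    for _ in range(iterations):
        work()
    after = type_counts()
    after.subtract(before)  # now holds the change in count per type
    # Keep only types whose count went up, biggest increase first
    return [(name, n) for name, n in after.most_common(limit) if n > 0]
```

A count that grows by a fixed amount per iteration (e.g. exactly one new session object per poll) is a much stronger signal than a one-off jump.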

As a workaround, you can have the Python process exit and restart every once in a while.
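A sketch of that workaround, assuming the polling logic lives in its own script that runs a bounded number of cycles and then exits (worker_cmd is hypothetical). Each run gets a fresh process, so whatever it accumulated is returned to the operating system when it exits:

```python
import subprocess
import sys
import time


def supervise(worker_cmd, max_runs=None, pause=0.0):
    """Run worker_cmd in a fresh process each time it exits.

    worker_cmd is a hypothetical command, e.g. a polling script that checks
    the mailbox a fixed number of times and then exits. max_runs=None means
    restart forever. Returns (number of runs, exit code of the last run).
    """
    runs = 0
    last_code = None
    while max_runs is None or runs < max_runs:
        # subprocess.run blocks until the worker exits; its memory goes with it
        last_code = subprocess.run(worker_cmd).returncode
        runs += 1
        if last_code != 0:
            print(f"worker exited with code {last_code}, restarting", file=sys.stderr)
        time.sleep(pause)  # optional back-off between runs
    return runs, last_code
```

For example, supervise([sys.executable, "poll_mailbox.py"]) with a hypothetical poll_mailbox.py that polls, say, 60 times at one-minute intervals and then exits. A system scheduler (cron, or Task Scheduler on Windows) that starts a short-lived script every minute achieves the same effect without a long-lived parent process.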

EazyEgan commented 1 year ago

Also having this exact issue with the following code:

def _connect(self):
    if self.exchange_oauth_tenant_id and self.exchange_oauth_client_id and self.exchange_oauth_client_secret:
        credentials = OAuth2Credentials(
            client_id=self.exchange_oauth_client_id,
            client_secret=self.exchange_oauth_client_secret,
            tenant_id=self.exchange_oauth_tenant_id,
            identity=Identity(primary_smtp_address=self.email_address),
        )
    elif self.exchange_user and self.exchange_password:
        credentials = Credentials(username=self.exchange_user, password=self.exchange_password)
    else:
        raise RuntimeError("No Microsoft Exchange details available")

    configuration = Configuration(server=self.exchange_server, credentials=credentials)
    self.exchange_account = Account(
        primary_smtp_address=self.email_address, autodiscover=False, access_type=DELEGATE, config=configuration
    )
    return self.exchange_account

No memory issues when username and password verification is used, but linear increase in memory usage when OAuth is used. The system routinely checks mailboxes for new emails and operates on several accounts.

Issue resolution:

I think we have been able to solve this issue, at least in our case, with an approach similar to this earlier suggestion:

This could be due to a memory leak in the requests or requests_oauthlib packages. Can you try adding the following to the top of your script?

from exchangelib.protocol import Protocol
Protocol.MAX_SESSION_USAGE_COUNT = 10  # Or some other low number

except that we also close the connection after every mailbox scan:

    @contextmanager
    def connection(self) -> Iterator["ExchangeEmailSystem"]:
        try:
            # Open mailbox connection
            self._connect()
            yield self
        finally:
            # Close mailbox connection
            self.exchange_account.protocol.close()

In the code that scans the mailboxes, connection() is entered with a with statement and the mailbox is checked for new emails. When the check completes (or raises), the finally block runs and the account's protocol is closed, which closes its open connections.

    with account.connection():
        folder_object = [x for x in account.root.walk() if x.name == "Inbox"]
        # etc.
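The same pattern extends to several mailboxes: give each scan its own short-lived connection, always close the protocol in finally, and keep going if one mailbox fails. A sketch, where factories maps a mailbox name to a hypothetical zero-argument function that builds the Account (such as the _connect method above) and scan is your own checking logic; only account.protocol.close() is taken from this thread:

```python
from contextlib import contextmanager


@contextmanager
def open_account(connect):
    """Yield the account built by connect() and always close its protocol."""
    account = connect()
    try:
        yield account
    finally:
        # Close the protocol's connections whether the scan succeeded or raised
        account.protocol.close()


def scan_mailboxes(factories, scan):
    """Scan every mailbox with its own short-lived connection.

    factories maps a mailbox name to a zero-argument function returning an
    Account-like object; scan(account) does the actual work. Returns a dict
    of mailbox name -> exception for the mailboxes that failed.
    """
    errors = {}
    for name, connect in factories.items():
        try:
            with open_account(connect) as account:
                scan(account)
        except Exception as exc:  # one bad mailbox should not stop the rest
            errors[name] = exc
    return errors
```

Collecting errors instead of raising keeps a single misconfigured mailbox from leaving the other connections open or skipped.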

This happens routinely for all mailboxes. There may be some increase in CPU usage, but nothing we've noticed. Below is a graph of our RAM usage since implementing OAuth for Exchange. [Screenshot 2022-12-21 at 11 25 42: RAM usage graph]