Closed KyungmoIBM closed 12 months ago
I also test the code with Python 3.9.14 with ibm-cos-sdk-for-python latest version(2.12.0) and saw same result.
ibm-cos-sdk 2.12.0
ibm-cos-sdk-core 2.12.0
ibm-cos-sdk-s3transfer 2.12.0```
Output
Thread: <_MainThread(MainThread, started 140704422986944)>
File "/Users/kyungmo/workspace/ibm-cos/cos-python/CS3089342-memory-leak.py", line 29, in <module>
stack = "".join(traceback.format_stack(
Thread: <Thread(Thread-1, started daemon 123145534840832)>
File "/Users/kyungmo/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 937, in _bootstrap
self._bootstrap_inner()
File "/Users/kyungmo/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/Users/kyungmo/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/Users/kyungmo/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ibm_botocore/credentials.py", line 2472, in _background_refresher
self._background_thread_wakeup_event.wait(new_remaining)
File "/Users/kyungmo/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 581, in wait
signaled = self._cond.wait(timeout)
File "/Users/kyungmo/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 316, in wait
gotit = waiter.acquire(True, timeout)
Iteration: 1
Thread: <_MainThread(MainThread, started 140704422986944)>
File "/Users/<user>/workspace/ibm-cos/cos-python/CS3089342-memory-leak.py", line 29, in <module>
stack = "".join(traceback.format_stack(
Thread: <Thread(Thread-1, started daemon 123145534840832)>
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 937, in _bootstrap
self._bootstrap_inner()
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ibm_botocore/credentials.py", line 2472, in _background_refresher
self._background_thread_wakeup_event.wait(new_remaining)
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 581, in wait
signaled = self._cond.wait(timeout)
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 316, in wait
gotit = waiter.acquire(True, timeout)
Thread: <Thread(Thread-2, started daemon 123145551630336)>
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 937, in _bootstrap
self._bootstrap_inner()
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/site-packages/ibm_botocore/credentials.py", line 2472, in _background_refresher
self._background_thread_wakeup_event.wait(new_remaining)
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 581, in wait
signaled = self._cond.wait(timeout)
File "/Users/<user>/.pyenv/versions/3.9.14/lib/python3.9/threading.py", line 316, in wait
gotit = waiter.acquire(True, timeout)```
@KyungmoIBM
Can we have a bit more details on what exactly the thread is doing in this case. Breaking down the sample code I can see that after the credentials are been read and client is created, the tread begins to execute.
It is this statement traceback.format_stack(sys._current_frames()[th.ident])
thats throws the error which we see in every iteration. Could you help us here in order to check what exactly is been attempted in here.
Breaking down the code I see the following output after running for two iterations
for th in threading.enumerate():
print(th)
print(th.ident)
print(sys._current_frames)
OUTPUT
Iteration : 0
<_MainThread(MainThread, started 4440643072)>
4440643072
<built-in function _current_frames>
<Thread(Thread-1, started daemon 123145463848960)>
123145463848960
<built-in function _current_frames>
Iteration : 1
<_MainThread(MainThread, started 4440643072)>
4440643072
<built-in function _current_frames>
<Thread(Thread-1, started daemon 123145463848960)>
123145463848960
<built-in function _current_frames>
<Thread(Thread-2, started daemon 123145480638464)>
123145480638464
<built-in function _current_frames>
There are few more findings from our end. The above pointed file (credentials.py) from ibm_botocore is used when generating token. If a generated token expired and new token is generated without waiting the token to expire then background token-refresh-thread will not finish processing it.
For Example :
def wait_till_token_refreshed(token_manager, wait_timeout=60):
token_manager.wakeup_refresh_thread()
for _ in range(ceil(wait_timeout / 2.0)):
if token_manager._background_thread_wakeup_event.isSet():
# Sleep and give a chance to background_thread to fetch the token.
# Once the token is fetched, the background_thread clears the event.
# Don't sleep for too long (>5 secs), because _advisory_refresh_timeout is still close
# to the new token expiry time, the bg thread might wakeup and fetch a new token again.
time.sleep(2)
else:
break
else:
raise Exception("Background token-refresh-thread did not finish processing in %d secs." % wait_timeout)
This is example of a method that can be used after a token is expired.
However in this regard there are few question that we need some clarity on.
Can we have a bit more clarity on the what the ask is here if possible a better example to illustrate the memory leak scenario.
Thanks.
@KyungmoIBM Can we have some update on this. This is open for some time and we had some conversation on the same. Will it be possible to have some clarification on this so that we can move forward with this.
Thanks
We are closing this issue as resolved as there is no further communication. Please create a new issue if you are facing the same problem again.
A customer reported a long running python application that uses ibm_boto3_core that the application is leaking threads / memory. Cusomer traced the issue to ibm_boto3_core. The below code illustrates the leak. Each iteration of the for loop leaks a thread at:
When s3_client goes out of scope, the DefaultTokenManager's background_refresher thread stays alive.
I also notice that ibm_boto3_core registers an an exit handler to stop the thread (atexit.register(self._cleanup)). But, this doesn't help for long running applications.
Example Output