aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org
Other
14.94k stars 1.99k forks source link

Memory leak when doing get() and scope closes #4833

Open gregdan3 opened 4 years ago

gregdan3 commented 4 years ago

🐞 Describe the bug

When downloading content from a website, memory usage increases roughly proportionally to the size of the content downloaded, with some overhead for the library and other objects in memory.

When the scope where that content is downloaded is closed and inaccessible, I expect the garbage collector to de-allocate the objects which were downloaded, but memory usage appears to remain near the same level as peak memory usage right after downloads finish.

💡 To Reproduce

  1. Set up a default ClientSession
  2. Open a connection with any site
  3. Download content from that site
  4. Leave the scope in which that content was downloaded
  5. Examine memory usage during and after download, observing that memory usage has not significantly decreased

💡 Expected behavior Once the downloaded objects are no longer in scope, they should be de-allocated by the garbage collector and memory usage should fall back to a similar level as when the script initialized.

📋 Logs/tracebacks

I have no specific logs/tracebacks (no exception occurred), but I have profiled the relevant functions in two scripts with memory_profiler to demonstrate memory usage.

The first script is created with aiohttp, demonstrating the bug:

aiohttp_profile.txt

Filename: ./aiohttp_example.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    22     26.8 MiB     26.8 MiB           1   @profile
    23                                         async def quicktest():
    24     26.8 MiB      0.0 MiB           1       url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
    25     26.8 MiB      0.0 MiB           1       session = aiohttp.ClientSession()
    26     26.8 MiB      0.0 MiB           8       jobs = [grab_data(url, session) for _ in range(5)]
    27    260.7 MiB    233.9 MiB           2       jobs_result = await asyncio.gather(*jobs)
    28    260.7 MiB      0.0 MiB           1       if not all(jobs_result):  # sanity check
    29                                                 _log.info("Something didn't give back data")
    30    260.7 MiB      0.0 MiB           1       await session.close()

Filename: ./aiohttp_example.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    33     26.7 MiB     26.7 MiB           1   @profile
    34                                         def main():
    35     26.7 MiB      0.0 MiB           1       logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    36     26.8 MiB      0.1 MiB           1       loop = asyncio.get_event_loop()
    37     26.8 MiB      0.0 MiB           1       try:
    38    201.4 MiB    174.6 MiB           1           loop.run_until_complete(quicktest())
    39                                                 # memory usage should be back to normal
    40    201.4 MiB      0.0 MiB           1           _log.info("Completed example")
    41                                             except KeyboardInterrupt:
    42                                                 _log.info("shutting down...")
    43                                                 loop.stop()

The bug is observed here: On line 38, I expect memory usage to reach its peak, and the next line should see memory usage drop to init levels. However, on line 40, memory usage is still at the same level, even though the scope where the memory was allocated is gone and inaccessible.

The second is created using requests in a nearly identical way, but demonstrates the behavior I expect in aiohttp:

requests_profile.txt

Filename: ./requests_example.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    19     23.9 MiB     23.9 MiB           1   @profile
    20                                         def quicktest():
    21     23.9 MiB      0.0 MiB           1       url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
    22     23.9 MiB      0.0 MiB           1       session = requests.Session()
    23    126.2 MiB    102.2 MiB           8       jobs_result = [grab_data(url, session) for _ in range(5)]
    24    126.2 MiB      0.0 MiB           1       if not all(jobs_result):  # sanity check
    25                                                 _log.info("Something didn't give back data")
    26    126.2 MiB      0.0 MiB           1       session.close()

Filename: ./requests_example.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    29     23.9 MiB     23.9 MiB           1   @profile
    30                                         def main():
    31     23.9 MiB      0.0 MiB           1       logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    32     23.9 MiB      0.0 MiB           1       try:
    33     27.4 MiB      3.5 MiB           1           quicktest()
    34                                                 # memory usage should be back to normal
    35     27.4 MiB      0.0 MiB           1           _log.info("Completed example")
    36                                             except KeyboardInterrupt:
    37                                                 _log.info("shutting down...")

Here the memory usage shown by memory_profiler is considerably different, because the memory usage drops as the function quicktest() on line 33 ends, so the scope of main() never sees the same level of memory usage.

📋 Your version of the Python

python --version
Python 3.8.3

📋 Your version of the aiohttp/yarl/multidict distributions

python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires: attrs, chardet, multidict, async-timeout, yarl
Required-by:
$ python -m pip show multidict
(memory_leak_aiohttp) gregdan3@Theros ~/g/c/memory_leak_aiohttp (master)>
python -m pip show multidict
Name: multidict
Version: 4.7.6
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
python -m pip show yarl
Name: yarl
Version: 1.4.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires: multidict, idna
Required-by: aiohttp

📋 Additional context

The relevant scripts and my Pipfile* are here: scripts.zip

#!/usr/bin/env python
import asyncio
import logging
from concurrent.futures import ThreadPoolExecutor

import aiohttp
from memory_profiler import profile

_log = logging.getLogger(__name__)

LOG_FORMAT = (
    "[%(asctime)s] [%(filename)22s:%(lineno)-4s] [%(levelname)8s]    %(message)s"
)

async def grab_data(url, session):
    async with session.get(url) as response:
        data = await response.read()
    return data

@profile
async def quicktest():
    url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
    session = aiohttp.ClientSession()
    jobs = [grab_data(url, session) for _ in range(5)]
    jobs_result = await asyncio.gather(*jobs)
    if not all(jobs_result):  # sanity check
        _log.info("Something didn't give back data")
    await session.close()

@profile
def main():
    logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(quicktest())
        # memory usage should be back to normal
        _log.info("Completed example")
    except KeyboardInterrupt:
        _log.info("shutting down...")
        loop.stop()

if __name__ == "__main__":
    main()
#!/usr/bin/env python
import logging
import requests
from memory_profiler import profile

_log = logging.getLogger(__name__)

LOG_FORMAT = (
    "[%(asctime)s] [%(filename)22s:%(lineno)-4s] [%(levelname)8s]    %(message)s"
)

def grab_data(url, session):
    with session.get(url) as response:
        data = response.content
    return data

@profile
def quicktest():
    url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
    session = requests.Session()
    jobs_result = [grab_data(url, session) for _ in range(5)]
    if not all(jobs_result):  # sanity check
        _log.info("Something didn't give back data")
    session.close()

@profile
def main():
    logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    try:
        quicktest()
        # memory usage should be back to normal
        _log.info("Completed example")
    except KeyboardInterrupt:
        _log.info("shutting down...")

if __name__ == "__main__":
    main()

And my Pipfile:

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
memory_profiler = {git = "ssh://git@github.com/pythonprofilers/memory_profiler.git"}
aiohttp = "*"
requests = "*"

I have not embedded my Pipfile.lock due to its length, but it is in the scripts.zip above and demonstrates the same versions for my dependencies as I showed before.

All of my dependencies are listed above.

I created and tested the scripts above on my personal machine running Archlinux, but originally observed the bug that led me to create these scripts on a machine running CentOS.

uname --all
Linux HOSTNAME 5.6.15-arch1-1 #1 SMP PREEMPT Wed, 27 May 2020 23:42:26 +0000 x86_64 GNU/Linux
Linux HOSTNAME 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
ElijahSink commented 4 years ago

I'm having this exact same issue.

jmfrees commented 4 years ago

Any update on this? Would love to see this issue addressed!

wei-hai commented 4 years ago

experiencing the same issue

Mahesh0253 commented 4 years ago

On Python 3.7.8 it's working fine, but with Python 3.8+ versions getting memory leak problem.

I hope someone solve this problem

derlih commented 3 years ago

Just did a check on my machine on 2a9d5e9e (master):

Filename: ./tests/4833_test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    22     32.4 MiB     32.4 MiB           1   @profile
    23                                         async def quicktest():
    24     32.4 MiB      0.0 MiB           1       url = "http://localhost:8000/A17_FlightPlan.pdf"
    25     32.4 MiB      0.0 MiB           1       session = aiohttp.ClientSession()
    26     32.4 MiB      0.0 MiB           8       jobs = [grab_data(url, session) for _ in range(5)]
    27    131.9 MiB     99.5 MiB           2       jobs_result = await asyncio.gather(*jobs)
    28    131.9 MiB      0.0 MiB           1       if not all(jobs_result):  # sanity check
    29                                                 _log.info("Something didn't give back data")
    30    131.9 MiB      0.0 MiB           1       await session.close()

[2020-11-11 18:24:58,934] [          4833_test.py:40  ] [    INFO]    Completed example
Filename: ./tests/4833_test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    33     32.2 MiB     32.2 MiB           1   @profile
    34                                         def main():
    35     32.4 MiB      0.2 MiB           1       logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    36     32.4 MiB      0.0 MiB           1       loop = asyncio.get_event_loop()
    37     32.4 MiB      0.0 MiB           1       try:
    38     33.2 MiB      0.8 MiB           1           loop.run_until_complete(quicktest())
    39                                                 # memory usage should be back to normal
    40     33.2 MiB      0.0 MiB           1           _log.info("Completed example")
    41                                             except KeyboardInterrupt:
    42                                                 _log.info("shutting down...")
    43                                                 loop.stop()
$ python --version
Python 3.8.5
$ uname -a
Linux 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:    20.04
Codename:   focal
derlih commented 3 years ago

Same PC on aiohttp 3.7.2

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    22     26.7 MiB     26.7 MiB           1   @profile
    23                                         async def quicktest():
    24     26.7 MiB      0.0 MiB           1       url = "http://localhost:8000/A17_FlightPlan.pdf"
    25     26.7 MiB      0.0 MiB           1       session = aiohttp.ClientSession()
    26     26.7 MiB      0.0 MiB           8       jobs = [grab_data(url, session) for _ in range(5)]
    27    126.2 MiB     99.5 MiB           2       jobs_result = await asyncio.gather(*jobs)
    28    126.2 MiB      0.0 MiB           1       if not all(jobs_result):  # sanity check
    29                                                 _log.info("Something didn't give back data")
    30    126.2 MiB      0.0 MiB           2       await session.close()

[2020-11-11 18:32:41,084] [          4833_test.py:40  ] [    INFO]    Completed example
Filename: /home/dmitry/Sources/aiohttp/tests/4833_test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    33     26.7 MiB     26.7 MiB           1   @profile
    34                                         def main():
    35     26.7 MiB      0.0 MiB           1       logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    36     26.7 MiB      0.0 MiB           1       loop = asyncio.get_event_loop()
    37     26.7 MiB      0.0 MiB           1       try:
    38     27.5 MiB      0.8 MiB           1           loop.run_until_complete(quicktest())
    39                                                 # memory usage should be back to normal
    40     27.5 MiB      0.0 MiB           1           _log.info("Completed example")
    41                                             except KeyboardInterrupt:
    42                                                 _log.info("shutting down...")
    43                                                 loop.stop()
$ python --version
Python 3.8.5
$ python -m pip show aiohttp
Name: aiohttp
Version: 3.7.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: /tmp/mem_test/venv/lib/python3.8/site-packages
Requires: multidict, yarl, attrs, typing-extensions, chardet, async-timeout
Required-by: 
$ python -m pip show multidict
Name: multidict
Version: 5.0.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /tmp/mem_test/venv/lib/python3.8/site-packages
Requires: 
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.6.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /tmp/mem_test/venv/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp
asvetlov commented 3 years ago

aiohttp doesn't use weak references to break cycles between moving parts. In practice, it means that the memory is released not immediately but at the next call of the garbage collector. You can enforce the gc by pushing gc.collect() just after loop.run_until_complete(quicktest()).

Does it help?

derlih commented 3 years ago

@asvetlov in my environment it is not needed. In the main function there is no memory leak. May be it was a problem in older Python 3.8 or may be Python was patched somehow in a CentOS.

I will check later this leak in CentOS docker.

asvetlov commented 3 years ago

OK. BTW, for checking on different Python versions I use https://github.com/pyenv/pyenv

pyenv install 3.9.0
pyenv virtualenv 3.9.0 aiohttp-3.9.0
pyenv local aiohttp-3.9.0

Works like a charm.

P.S. Need to initialize PATH and PYENV_ROOT in bashrc/zshrc first as described in pyenv installation section first.

derlih commented 3 years ago

For CentOS I've got the leak :-(

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    22     28.3 MiB     28.3 MiB           1   @profile
    23                                         async def quicktest():
    24     28.3 MiB      0.0 MiB           1       url = "http://localhost:8000/A17_FlightPlan.pdf"
    25     28.3 MiB      0.0 MiB           1       session = aiohttp.ClientSession()
    26     28.3 MiB      0.0 MiB           8       jobs = [grab_data(url, session) for _ in range(5)]
    27    128.0 MiB     99.7 MiB           2       jobs_result = await asyncio.gather(*jobs)
    28    128.0 MiB      0.0 MiB           1       if not all(jobs_result):  # sanity check
    29                                                 _log.info("Something didn't give back data")
    30    128.0 MiB      0.0 MiB           2       await session.close()

[2020-11-13 12:52:31,871] [          4833_test.py:40  ] [    INFO]    Completed example
Filename: ./4833_test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    33     28.1 MiB     28.1 MiB           1   @profile
    34                                         def main():
    35     28.3 MiB      0.1 MiB           1       logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    36     28.3 MiB      0.0 MiB           1       loop = asyncio.get_event_loop()
    37     28.3 MiB      0.0 MiB           1       try:
    38     68.7 MiB     40.4 MiB           1           loop.run_until_complete(quicktest())
    39                                                 # memory usage should be back to normal
    40     68.7 MiB      0.0 MiB           1           _log.info("Completed example")
    41                                             except KeyboardInterrupt:
    42                                                 _log.info("shutting down...")
    43                                                 loop.stop()

Dockerfile:

FROM centos
RUN yum install -y python3
RUN yum groupinstall -y 'Development Tools'
RUN yum install -y python3-devel
RUN pip3 install memory_profiler
RUN yum install -y wget
RUN wget https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf
RUN pip3 install aiohttp==3.7.2
ADD 4833_test.py /
$ python3 --version
Python 3.6.8
asvetlov commented 3 years ago

Oooh. Thanks. At least I have comprehensive instruction for "how to reproduce".

JustinTArthur commented 3 years ago

Having a memory leak in one of my aiohttp apps, I've been following this closely. If memory_profiler is to be trusted, this still happens with aiohttp 3.7.x and cpython 3.9.4.

I've tried the following with no change to the apparent leak reported by memory_profiler:

There are dependencies between a HttpResponseParser, Connection, ClientResponse, ResponseHandler, and Connector objects, some of them are circular. However, as of Python 3.4, the garbage collector should be able to collect them all as long as nothing refers to one of them after running all of their finalizers (like __del__(…)).

gboumis-parrot commented 2 years ago

I experience the same problem with aiohttp 3.8.1 and python 3.7.1

gregdan3 commented 1 year ago

Here's some bells and whistles to make spotting the issue easier:

https://github.com/gregdan3/aiohttp-memleak