Tinche / aiofiles

File support for asyncio
Apache License 2.0

memory leak? #40

Open alexlocher opened 6 years ago

alexlocher commented 6 years ago

Hi Tinche

First of all, thanks for the great work! Asynchronous file support for asyncio is a great thing to have!

While testing a small project, I noticed a large number of threads and high memory consumption in the Python process. I decided to write a small test script which just writes to a file in a loop and tracks the memory:

#!/usr/bin/python3
import asyncio
import aiofiles

import os
import psutil

async def printMemory():
    for iteration in range(0, 20):

        # grab the memory statistics
        p = psutil.Process(os.getpid())
        vms = p.memory_info().vms / (1024.0*1024.0)
        threads = p.num_threads()
        print(f'Iteration {iteration:>2d} - Memory usage (VMS): {vms:>6.1f} Mb; # threads: {threads:>2d}')

        # simple write to a test file
        async with aiofiles.open('test.txt',mode='w') as f:
            await f.write('hello\n')

        # a wait, just for the sake of it
        await asyncio.sleep(1)

loop = asyncio.get_event_loop()

try:
    loop.run_until_complete(printMemory())
finally:
    loop.close()

The output shows some worrisome numbers (run with Python 3.6.5 on Debian 8.10 (jessie)):

Iteration  0 - Memory usage (VMS):   92.5 Mb; # threads:  1
Iteration  1 - Memory usage (VMS):  308.5 Mb; # threads:  4
Iteration  2 - Memory usage (VMS):  524.6 Mb; # threads:  7
Iteration  3 - Memory usage (VMS):  740.6 Mb; # threads: 10
Iteration  4 - Memory usage (VMS):  956.6 Mb; # threads: 13
Iteration  5 - Memory usage (VMS): 1172.6 Mb; # threads: 16
Iteration  6 - Memory usage (VMS): 1388.7 Mb; # threads: 19
Iteration  7 - Memory usage (VMS): 1604.7 Mb; # threads: 22
Iteration  8 - Memory usage (VMS): 1820.8 Mb; # threads: 25
Iteration  9 - Memory usage (VMS): 2036.8 Mb; # threads: 28
Iteration 10 - Memory usage (VMS): 2252.8 Mb; # threads: 31
Iteration 11 - Memory usage (VMS): 2468.8 Mb; # threads: 34
Iteration 12 - Memory usage (VMS): 2684.8 Mb; # threads: 37
Iteration 13 - Memory usage (VMS): 2900.8 Mb; # threads: 40
Iteration 14 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 15 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 16 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 17 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 18 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 19 - Memory usage (VMS): 2972.8 Mb; # threads: 41

Any idea where this could come from?

Tinche commented 6 years ago

This is what I get on 3.6.5, Ubuntu Bionic:

> python test.py
Iteration  0 - Memory usage (VMS):   65.6 Mb; # threads:  1
Iteration  1 - Memory usage (VMS):  281.6 Mb; # threads:  4
Iteration  2 - Memory usage (VMS):  497.6 Mb; # threads:  7
Iteration  3 - Memory usage (VMS):  713.6 Mb; # threads: 10
Iteration  4 - Memory usage (VMS):  929.6 Mb; # threads: 13
Iteration  5 - Memory usage (VMS): 1145.6 Mb; # threads: 16
Iteration  6 - Memory usage (VMS): 1361.7 Mb; # threads: 19
Iteration  7 - Memory usage (VMS): 1577.7 Mb; # threads: 22
Iteration  8 - Memory usage (VMS): 1793.7 Mb; # threads: 25
Iteration  9 - Memory usage (VMS): 2009.7 Mb; # threads: 28
Iteration 10 - Memory usage (VMS): 2225.7 Mb; # threads: 31
Iteration 11 - Memory usage (VMS): 2225.7 Mb; # threads: 31
Iteration 12 - Memory usage (VMS): 2225.7 Mb; # threads: 31
Iteration 13 - Memory usage (VMS): 2225.7 Mb; # threads: 31
Iteration 14 - Memory usage (VMS): 2225.7 Mb; # threads: 31
Iteration 15 - Memory usage (VMS): 2225.7 Mb; # threads: 31
Iteration 16 - Memory usage (VMS): 2225.7 Mb; # threads: 31
Iteration 17 - Memory usage (VMS): 2225.7 Mb; # threads: 31
Iteration 18 - Memory usage (VMS): 2225.7 Mb; # threads: 31
Iteration 19 - Memory usage (VMS): 2225.7 Mb; # threads: 31

But if I change the script to print out the resident set size (RSS) instead:

> python test.py
Iteration  0 - Memory usage (RSS):   17.8 Mb; # threads:  1
Iteration  1 - Memory usage (RSS):   18.0 Mb; # threads:  4
Iteration  2 - Memory usage (RSS):   18.0 Mb; # threads:  7
Iteration  3 - Memory usage (RSS):   18.0 Mb; # threads: 10
Iteration  4 - Memory usage (RSS):   18.0 Mb; # threads: 13
Iteration  5 - Memory usage (RSS):   18.0 Mb; # threads: 16
Iteration  6 - Memory usage (RSS):   18.0 Mb; # threads: 19
Iteration  7 - Memory usage (RSS):   18.3 Mb; # threads: 22
Iteration  8 - Memory usage (RSS):   18.3 Mb; # threads: 25
Iteration  9 - Memory usage (RSS):   18.3 Mb; # threads: 28
Iteration 10 - Memory usage (RSS):   18.3 Mb; # threads: 31
Iteration 11 - Memory usage (RSS):   18.3 Mb; # threads: 31
Iteration 12 - Memory usage (RSS):   18.3 Mb; # threads: 31
Iteration 13 - Memory usage (RSS):   18.3 Mb; # threads: 31
Iteration 14 - Memory usage (RSS):   18.3 Mb; # threads: 31
Iteration 15 - Memory usage (RSS):   18.3 Mb; # threads: 31
Iteration 16 - Memory usage (RSS):   18.3 Mb; # threads: 31
Iteration 17 - Memory usage (RSS):   18.3 Mb; # threads: 31
Iteration 18 - Memory usage (RSS):   18.3 Mb; # threads: 31
Iteration 19 - Memory usage (RSS):   18.3 Mb; # threads: 31

Not that big of a deal?
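
For what it's worth, the VMS numbers above grow by about 216 MB per iteration while three threads are added, so roughly 72 MB per new thread. A plausible reading (an assumption on my part, not something measured in this thread) is that each new thread reserves address space, for example for its stack, without actually touching it, which would explain why RSS stays flat. A quick sketch of the arithmetic, using only figures copied from the first run; the helper name `per_thread_growth` is made up for illustration:

```python
# Figures copied from iterations 0-3 of the first VMS run in this thread.
vms_mb = [92.5, 308.5, 524.6, 740.6]
threads = [1, 4, 7, 10]


def per_thread_growth(vms, thread_counts):
    """Return the VMS growth in MB per newly created thread, for each step.

    Assumes the thread count increases at every step (true for the
    figures above); a step with no new threads would divide by zero.
    """
    growth = []
    for i in range(1, len(vms)):
        new_threads = thread_counts[i] - thread_counts[i - 1]
        growth.append((vms[i] - vms[i - 1]) / new_threads)
    return growth


for step, mb in enumerate(per_thread_growth(vms_mb, threads), start=1):
    print(f"iteration {step}: {mb:.1f} MB of virtual memory per new thread")
```

Once the thread count stops climbing, VMS stops climbing too, which is consistent with the plateau at 41 and 31 threads in the two runs.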

aiofiles just uses the default asyncio executor, which on this Python version creates up to num_cpu*5 threads. You can override it like this:

import asyncio
import aiofiles
from concurrent.futures import ThreadPoolExecutor

import os
import psutil

async def printMemory():
    for iteration in range(0, 20):

        # grab the memory statistics
        p = psutil.Process(os.getpid())
        rss = p.memory_info().rss / (1024.0*1024.0)
        threads = p.num_threads()
        print(f'Iteration {iteration:>2d} - Memory usage (RSS): {rss:>6.1f} Mb; # threads: {threads:>2d}')

        # simple write to a test file
        async with aiofiles.open('test.txt',mode='w') as f:
            await f.write('hello\n')

        # a wait, just for the sake of it
        await asyncio.sleep(1)

loop = asyncio.get_event_loop()
loop.set_default_executor(ThreadPoolExecutor(1))

try:
    loop.run_until_complete(printMemory())
finally:
    loop.close()

Then the results are:

> python test.py
Iteration  0 - Memory usage (RSS):   17.6 Mb; # threads:  1
Iteration  1 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration  2 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration  3 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration  4 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration  5 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration  6 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration  7 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration  8 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration  9 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration 10 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration 11 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration 12 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration 13 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration 14 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration 15 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration 16 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration 17 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration 18 - Memory usage (RSS):   17.9 Mb; # threads:  2
Iteration 19 - Memory usage (RSS):   17.9 Mb; # threads:  2
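
If you want to check the effect of the executor cap without aiofiles or psutil installed, here is a minimal stdlib-only sketch. It is not how aiofiles is implemented; `blocking_write` is a hypothetical stand-in for the blocking file I/O that gets handed to a thread, `count_extra_threads` is a made-up helper name, and it assumes Python 3.7+ for `asyncio.run`:

```python
import asyncio
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def blocking_write(path):
    # Hypothetical stand-in for the blocking I/O that runs in a worker thread.
    with open(path, "w") as f:
        f.write("hello\n")


async def count_extra_threads(max_workers, iterations=10):
    """Run blocking writes through a capped default executor.

    Returns the largest number of threads seen beyond the starting count.
    """
    loop = asyncio.get_running_loop()
    executor = ThreadPoolExecutor(max_workers=max_workers)
    loop.set_default_executor(executor)

    baseline = threading.active_count()
    peak = baseline
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "test.txt")
            for _ in range(iterations):
                # Passing None selects the loop's default executor,
                # which was replaced with the capped pool above.
                await loop.run_in_executor(None, blocking_write, path)
                peak = max(peak, threading.active_count())
    finally:
        executor.shutdown(wait=True)
    return peak - baseline


if __name__ == "__main__":
    for cap in (1, 3):
        extra = asyncio.run(count_extra_threads(cap))
        print(f"max_workers={cap}: {extra} extra thread(s)")
```

With `max_workers=1` this should report one extra thread, in line with the "# threads: 2" above. With a larger cap the exact count depends on the Python version, since newer versions reuse idle workers instead of spawning a new one per submitted task, but it never exceeds the cap.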