Revolution1 / etcd3-py

Pure python client for etcd v3 (Using gRPC-JSON-Gateway)

Swagger cache fills itself up without boundaries #141

Open jchorin opened 4 years ago

jchorin commented 4 years ago

Description

My setup:

I am using an API with aiohttp. For every request received, an AioClient is created by an aiohttp middleware. The client is closed after the request has been handled.
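For illustration, the pattern looks roughly like this (a simplified sketch, not my actual middleware; the "etcd" request key and the handler are placeholders):

from aiohttp import web
from etcd3 import AioClient

@web.middleware
async def etcd_client_middleware(request, handler):
    # One AioClient per incoming request, closed once the handler is done.
    client = AioClient()
    request["etcd"] = client
    try:
        return await handler(request)
    finally:
        await client.close()

async def list_keys(request):
    # Handlers use the per-request client attached by the middleware.
    await request["etcd"].range("/")
    return web.Response(text="ok")

app = web.Application(middlewares=[etcd_client_middleware])
app.router.add_get("/keys", list_keys)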

My issue:

If too many requests are sent to the API, the memory footprint of the API process grows continuously until my machine runs out of memory and resets.

What I Did

Here is a minimal example. It connects to an etcd database running locally, with ~20 keys present under the prefix "/".

import asyncio
from etcd3 import AioClient

async def read_db():
    while True:
        client = AioClient()
        try:
            resp = await client.range("/")
        finally:
            await client.close()

async def all_run(concurrent=10):
    """Run many reads concurrently
    """
    await asyncio.gather(
        *(read_db() for i in range(concurrent)),
        return_exceptions=False,
    )

def main():
    loop = asyncio.get_event_loop()
    try:
        result = loop.run_until_complete(all_run())
    except asyncio.CancelledError:
        pass
    finally:
        loop.close()

main()

When running, this script uses more than 1 GB of memory after only 5 minutes.
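One way to observe the growth from inside the script is to print the resident set size periodically (a sketch assuming psutil is installed; the repro itself does not require it):

import os
import psutil

def rss_mb():
    # Resident set size of the current process, in MiB.
    return psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)

Calling rss_mb() every few hundred iterations of read_db makes the growth visible without an external monitor.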

Workaround

I narrowed down the issue to the caches of SwaggerNode and SwaggerSpec.

By changing the function read_db in the above example as follows:

import asyncio

from etcd3 import AioClient
from etcd3.swagger_helper import SwaggerSpec, SwaggerNode

counter = 0

async def read_db():
    global counter
    while True:
        counter += 1
        client = AioClient()
        try:
            resp = await client.range("/")
        finally:
            await client.close()

        if counter % 20 == 0:
            # Empty the different caches every 20 reads
            SwaggerNode._node_cache = {}
            SwaggerNode.__getattr__.__wrapped__.cache = {}
            SwaggerSpec._ref.__wrapped__.cache = {}
            SwaggerSpec.getPath.__wrapped__.cache = {}
            SwaggerSpec.getSchema.__wrapped__.cache = {}

the memory footprint stays around 120 MB even after 20 minutes.
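To confirm that these caches are what grows, their sizes can be printed right before they are cleared (same attributes as in the snippet above):

if counter % 20 == 0:
    # Number of entries in each swagger cache before clearing them
    print(
        len(SwaggerNode._node_cache),
        len(SwaggerNode.__getattr__.__wrapped__.cache),
        len(SwaggerSpec._ref.__wrapped__.cache),
        len(SwaggerSpec.getPath.__wrapped__.cache),
        len(SwaggerSpec.getSchema.__wrapped__.cache),
    )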


jchorin commented 4 years ago

If I modify the previous example to reuse a single client everywhere, the memory footprint no longer grows out of control.

See the following modifications:

import asyncio
from etcd3 import AioClient

async def read_db(client):
    while True:
        resp = await client.range("/")

async def all_run(concurrent=10):
    """Run many reads concurrently
    """
    client = None
    try:
        client = AioClient()

        await asyncio.gather(
            *(read_db(client) for i in range(concurrent)),
            return_exceptions=False,
        )
    finally:
        if client:
            await client.close()

def main():
    loop = asyncio.get_event_loop()
    try:
        result = loop.run_until_complete(all_run())
    except asyncio.CancelledError:
        pass
    finally:
        loop.close()

main()

However, is it best practice to reuse the same AioClient across several concurrent requests in a web server?
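For context, sharing one client in aiohttp could be wired up through a cleanup context, roughly like this (a sketch; the "etcd" app key is arbitrary):

from aiohttp import web
from etcd3 import AioClient

async def etcd_client_ctx(app):
    # One AioClient for the whole application lifetime.
    app["etcd"] = AioClient()
    yield
    await app["etcd"].close()

app = web.Application()
app.cleanup_ctx.append(etcd_client_ctx)

Handlers would then use request.app["etcd"] instead of creating and closing their own client per request.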

Revolution1 commented 4 years ago

That might be the solution.

But I'll have to dig into it to find the cause.

Actually the "swagger cache" was just a temporary solution. My goal is to auto generate data model class from the swagger spec. Instead of generate model class at runtime. That, the cache won't be a problem