farcepest / MySQLdb1

MySQL database connector for Python (legacy version)
https://sourceforge.net/projects/mysql-python/
666 stars 318 forks source link

memory leak with connection objects #42

Open regilero opened 10 years ago

regilero commented 10 years ago

Hello,

I was searching some memory leaks, so I used the /tests/test_MySQLdb_capabilities.py file on this project, and set the leak_test attribute to True inside. And every tests was leaking.

So I made some other tests, with basic code such as:

    connection = MySQLdb.connect(**connect_kwargs)
    cursor = connection.cursor()
    connection.ping()
    cursor.close()
    connection.close()

Or

    with MySQLdb.connect(**connect_kwargs) as cursor:
        cursor.connection.ping()

And everytime gc was reporting some unreachable garbage:

[<cell at 0x7f9df6ad8868: weakproxy object at 0x2709f18>, <cell at 0x7f9df6ad87c0: function object at 0x2732c08>, <cell at 0x7f9df6ad8fd8: function object at 0x2732b90>, <weakproxy at 0x2709f18 to NoneType at 0x917730>, (<cell at 0x7f9df6ad8868: weakproxy object at 0x2709f18>, <cell at 0x7f9df6ad8fd8: function object at 0x2732b90>), <function funicode_literal at 0x2732b90>, (None,), (<cell at 0x7f9df6ad87c0: function object at 0x2732c08>,), <function fstring_decoder at 0x2732c08>]

I am maybe wrong, I can send you my tests if you want, but I think it comes from https://github.com/farcepest/MySQLdb1/blob/master/MySQLdb/connections.py#L200-214 :

       (...inside connection __init__()...)
        db = proxy(self)
        def _get_string_literal():
            def string_literal(obj, dummy=None):
                return db.string_literal(obj)
            return string_literal

        def _get_unicode_literal():
            def unicode_literal(u, dummy=None):
                return db.literal(u.encode(unicode_literal.charset))
            return unicode_literal

        def _get_string_decoder():
            def string_decoder(s):
                return s.decode(string_decoder.charset)
            return string_decoder

Here you define some closure inside the init, and some closures in the closures, then you use a self proxy weak reference on the connection object. Strange code. But it seems my python 2.7 does not like this double closure using a weak reference on the object, and does not know how to remove it on cleanup.

I tried to alter it to return weak proxy reference on the functions, then it did not work anymore as theses functions were cleared right after the init. So I also added some hacked attributes on the connection object to increment the ref counter. And it worked, on connection removal theses attribute were automatically set to none, and the weak ref on theses functions prevented the unreachable garbage:

       (...inside connection __init__()...)
        db = proxy(self)
        def _get_string_literal():
            def string_literal(obj, dummy=None):
                return db.string_literal(obj)
            # adding fake reference counter on the connection object
            db.fake_refcount1 = string_literal
            # returning a weak reference on this function
            return proxy(string_literal)

        def _get_unicode_literal():
            def unicode_literal(u, dummy=None):
                return db.literal(u.encode(unicode_literal.charset))
            db.fake_refcount2 = unicode_literal
            return proxy(unicode_literal)

        def _get_string_decoder():
            def string_decoder(s):
                return s.decode(string_decoder.charset)
            db.fake_refcount3 = string_decoder
            return proxy(string_decoder)

This is an ugly patch, but at least if fixed the leak.

66Ton99 commented 9 years ago

Thank you, but for my case this solution does not fit.

regilero commented 9 years ago

@66Ton99, you're saying too much things... or not enough. What is your case? And why does this not fit?

methane commented 9 years ago

Unreachable is not leak. Uncollectable is. This strange code avoids connection object participates to circular reference. Since connection object has __del__ method, it is uncollectable when it is in circular reference.

I think current design of converter is very bad. Converters should receive connection as argument.

66Ton99 commented 9 years ago

@regilero my case was 34 000 rows with .fetchall() and multiple calls at the same time which move all processes to SWAP and do not release memory on the end. But if some other process will request memory Python will release it. So I think problem not in MySQLdb