GrahamDumpleton / wrapt

A Python module for decorators, wrappers and monkey patching.
BSD 2-Clause "Simplified" License
2.03k stars 231 forks source link

Use of ObjectProxy with type() #241

Open stolpovsky opened 1 year ago

stolpovsky commented 1 year ago

This is a question (not an issue).

I am using wrapt.ObjectProxy to modify context manager behavior (__enter__, __exit__) of a database connection object (SqlAlchemy Connection). It works as expected with the SqlAlchemy version 1.3.x, but in 1.4.47 it raises an exception in this piece of code (inspection.py, the comments are mine):

    type_ = type(subject) # <--- type() is used, rather than __class__
    for cls in type_.__mro__: # <--- loops over MRO types: (<class 'util.db.ConnectionProxy'>, <class 'ObjectProxy'>, <class 'object'>)
        if cls in _registrars: # <--- not in
            reg = _registrars[cls]
            if reg is True:
                return subject
            ret = reg(subject)
            if ret is not None:
                break
    else:
        reg = ret = None

    if raiseerr and (reg is None or ret is None):
        raise exc.NoInspectionAvailable(  # <--- Throws
            "No inspection system is "
            "available for object of type %s" % type_
        )

Is there a workaround for this? Thank you.

GrahamDumpleton commented 1 year ago

Is your code online on GitHub somewhere so can see the rest of the code you have for your derived ObjectProxy? Also how that is used would be helpful as well. It is too hard from just this code snippet to understand what you are trying to do.

stolpovsky commented 1 year ago

Not on GitHub, but hopefully this will suffice. If you have a MSSQL database handy, change the database name and the URL and this should be runnable.

import pandas as pd
import urllib
import wrapt
import sqlalchemy

class ConnectionProxy(wrapt.ObjectProxy):

    def __enter__(self):
        try:
            print('enter')
        except LookupError:
            pass
        finally:
            super().__enter__()
            return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        try:
            print('exit')
        except LookupError:
            pass
        finally:
            super().__exit__(exc_type, exc_value, exc_tb)

db_server = {
    'mydb': {
        'prod': 'my.db.host.com',
    },
}

db_conn = {dbname:
    {env: 'Driver={{ODBC Driver 17 for SQL Server}};Server={server};Trusted_Connection=yes;Database={dbname};'.format(
        server=server, dbname=dbname)
        for env, server in m.items()}
    for dbname, m in db_server.items()
}

def get_sqlalchemy_engine(dbname, env):
    url = f'mssql+pyodbc:///?odbc_connect={urllib.parse.quote(db_conn[dbname][env])}'
    engine = sqlalchemy.create_engine(url, fast_executemany=True)
    return engine

def get_sqlalchemy_connection(dbname, env):
    engine = get_sqlalchemy_engine(dbname, env)
    con = engine.connect()
    con.dbname = dbname
    con.env = env

    cp = ConnectionProxy(con)

    return cp

with get_sqlalchemy_connection('mydb', 'prod') as con:
    df = pd.DataFrame({'x': [1, 2, ]})
    df.to_sql('mytable', con, dtype={'x': sqlalchemy.INTEGER})

This is the exception thrown (top of exception stack): File "C:\Python39\lib\site-packages\sqlalchemy\inspection.py", line 71, in inspect raise exc.NoInspectionAvailable( sqlalchemy.exc.NoInspectionAvailable: No inspection system is available for object of type <class '__main__.ConnectionProxy'>

GrahamDumpleton commented 1 year ago

You are probably better off monkey patching the connection object instance because sqlalchemy.inspect() derives information from the direct type of an object and doesn't use __class__ to lookup the type of an object, which arguably it should if wants to be compatible with systems that monkey patch stuff. This is the same reason you should use functions like isinstance() rather than comparing type() result to actual types.

Monkey patching __enter__ and __exit__ special methods is complicated though because they are only ever looked up on the type and so you cannot patch them as attributes on the instance.

The following messy code should work though if adapted:

import wrapt

class Connection:
    def __enter__(self):
        print("Connection::__enter__()")
        return self

    def method(self):
        print("Connection::method()")

    def __exit__(self, exc_type, exc_value, exc_tb):
        print("Connection::__exit__()")

print("Test #1")

with Connection() as c:
    c.method()

def get_connection():
    def enter_wrapper(wrapped, instance, args, kwargs):
        try:
            print("enter_wrapper::try")
            return wrapped(*args, **kwargs)
        finally:
            print("enter_wrapper::finally")

    def exit_wrapper(wrapped, instance, args, kwargs):
        try:
            print("exit_wrapper::try")
            return wrapped(*args, **kwargs)
        finally:
            print("exit_wrapper::finally")

    obj = Connection()

    class MonkeyPatchedConnection(type(obj)):
        pass

    wrapt.wrap_function_wrapper(MonkeyPatchedConnection, "__enter__", enter_wrapper)
    wrapt.wrap_function_wrapper(MonkeyPatchedConnection, "__exit__", exit_wrapper)

    obj.__class__ = MonkeyPatchedConnection

    return obj

print("Test #2")

c = get_connection()

with c:
    c.method()

It relies on overriding the __class__ attribute to be a mocked up type for the connection object which provides the context manager special methods.

I don't recollect ever seeing anything ever modify the __class__ attribute before but apparently it can be done. The wrapt module actually blocks trying to update the __class__ attribute via a proxy object, which now I realise there is a use case for updating it, it probably shouldn't block.

Anyway, summary is that SQLAlchemy is not friendly to using proxy objects. It could be argued that it's inspect() method should not use type(subject) but instead use subject.__class__. Although you might suggest such a change to SQLAlchemy, I sort of doubt you would get far since is a rather special corner case and they are likely not to want to upset things even if all regression tests showed a change as still working.

GrahamDumpleton commented 1 year ago

BTW, doing it this way you don't actually need wrapt as could use:

class Connection:
    def __enter__(self):
        print("Connection::__enter__()")
        return self

    def method(self):
        print("Connection::method()")

    def __exit__(self, exc_type, exc_value, exc_tb):
        print("Connection::__exit__()")

print("Test #1")

with Connection() as c:
    c.method()

def get_connection():
    obj = Connection()

    class MonkeyPatchedConnection(type(obj)):
        def __enter__(self):
            print("MonkeyPatchedConnection::__enter__()")
            return super().__enter__()

        def __exit__(self, exc_type, exc_value, exc_tb):
            print("MonkeyPatchedConnection::__exit__()")
            return super().__exit__(exc_type, exc_value, exc_tb)

    obj.__class__ = MonkeyPatchedConnection

    return obj

print("Test #2")

c = get_connection()

with c:
    c.method()
GrahamDumpleton commented 1 year ago

Hmmm, wrapt doesn't specifically prohibit updating __class__ on proxy objects and should pass it through, but my test earlier didn't work. Not sure why now.

GrahamDumpleton commented 1 year ago

Okay, so the pure Python implementation of wrapt allows updating __class__ attribute, but the C implementation doesn't. I need to work out a fix for that if possible. Some things aren't easy when using Python C APIs.

GrahamDumpleton commented 1 year ago

Nope, should be easy. I just don't provide a setter. Anyway, fixing that will still not help in this case.

stolpovsky commented 1 year ago

Thank you for the suggestions. I ended up using the version without wrapt. I am a bit unsure of the implications of setting __class__ but it works. Do you happen to know why with keyword is implemented this way, namely why it looks up __enter__ and __exit__ on the type? I became aware of this when my attempt to monkey-patch instance methods did not work. I still find it surprising - but there is probably a reason for it.

GrahamDumpleton commented 1 year ago

I didn't know it had to be on the type either. I tried monkey patching the instance first as well. :-(

So have no real idea why it is that way.

mentalisttraceur commented 1 year ago

Looking up magic methods like __enter__ and __exit__ on the type is actually a common pattern in Python's C innards. Lots/most/all of the dunder methods are looked up on the type by the CPython implementation. It's just really poorly publicized, and people regularly assume that dunder methods are mutable on the instances since basically everything else is.

The reason is probably performance: remember that in the C implementation, Python classes/types have dedicated fields ("slots") in their C struct for many (all?) of the dunder methods. If you always look up dunder methods on the type, then looking up f.e. __add__ on an int is a simple memory access in C (that might be literally one machine instruction) - if you don't, then almost every operator has the overhead of searching the instance attribute dictionary.

This kind of optimization for common operations would be very consistent with other decisions made in Python: for example, in the past Guido rejected adding operator overloads on and and or, and the only reason he chose to actually say was that it added extra bytecode overhead of those two operators. I'm also confident I've seen at least one other example of Guido expressing similar levels of optimization as the reason for doing things in early Python design, but I can no longer remember where.