FactoryBoy / factory_boy

A test fixtures replacement for Python
https://factoryboy.readthedocs.io/
MIT License
3.48k stars 392 forks source link

Memory leak of SQLAlchemy instances passed as argument to factories #781

Open bbc2 opened 3 years ago

bbc2 commented 3 years ago

Description

Factory_boy keeps SQLAlchemy instances passed as argument to factories in memory, causing a memory leak and changing what requests SQLAlchemy performs.

The changed behavior for SQLAlchemy, in our case, is additional SELECT statements. That's an issue for us because we count SQL queries in tests to ensure the absence of typical SQL bugs such as the n+1-query problem.

That being said, I would expect the memory leak to potentially cause other issues.

To Reproduce

Run the following with factory_boy 3.0.0 or 3.0.1:

import factory
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker

Base = declarative_base()

class Org(Base):
    __tablename__ = "orgs"

    id = Column(Integer, primary_key=True)

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    org_id = Column(Integer, ForeignKey("orgs.id"))
    org = relationship(Org)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

class UserFactory(factory.alchemy.SQLAlchemyModelFactory):
    class Meta:
        model = User
        sqlalchemy_session = session

def main():
    UserFactory(org=Org(id=1))
    session.commit()

    engine.echo = "debug"
    session.query(Org).filter_by(id=1).delete()
    session.commit()

if __name__ == "__main__":
    main()

When running this, you'll see an unexpected SELECT statement performed on the orgs table:

2020-09-16 15:15:20,656 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2020-09-16 15:15:20,656 INFO sqlalchemy.engine.base.Engine SELECT orgs.id AS orgs_id
FROM orgs
WHERE orgs.id = ?
2020-09-16 15:15:20,656 INFO sqlalchemy.engine.base.Engine (1,)
2020-09-16 15:15:20,656 DEBUG sqlalchemy.engine.base.Engine Col ('orgs_id',)
2020-09-16 15:15:20,656 DEBUG sqlalchemy.engine.base.Engine Row (1,)
2020-09-16 15:15:20,657 INFO sqlalchemy.engine.base.Engine DELETE FROM orgs WHERE orgs.id = ?
2020-09-16 15:15:20,657 INFO sqlalchemy.engine.base.Engine (1,)
2020-09-16 15:15:20,657 INFO sqlalchemy.engine.base.Engine COMMIT

The expected output is the following:

2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine DELETE FROM orgs WHERE orgs.id = ?
2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine (1,)
2020-09-16 15:18:11,054 INFO sqlalchemy.engine.base.Engine COMMIT

To see the memory leak, replace main by:

def main():
    import gc
    import weakref

    org=Org(id=1)
    org_weak = weakref.ref(org)
    UserFactory(org=org)
    session.commit()
    del org
    gc.collect()
    print(org_weak)

You'll see the following output:

<weakref at 0x7fd11dba6b30; to 'Org' at 0x7fd11dd7cb80>

That shows that Org is still in memory although it should have been collected.

Expected output:

<weakref at 0x7f34150d2c20; dead>

Notes

I bisected the problem to PR #658. More precisely, it seems to be the assignment cls._original_params = params in the _generate method that causes the memory leak. Therefore, the problem was introduced in version 3.0.0.

I suspect the problem is also present in the Django model factory.

rbarrois commented 3 years ago

Hi! Thanks a lot for the bug report; this is indeed a nasty bug.

The memory leak should indeed be present with Django as well, but won't have as big an impact there. I'll look into it.

bbc2 commented 2 years ago

I tested this again but with more recent versions of SQLAlchemy and found the following:

As a result, the impact of this issue is much lower for us, since we only use factory_boy in tests, but I suppose the extra memory reference could still be an issue for other users.