galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org
Other
1.34k stars 974 forks source link

SQLAlchemy 2.0 compatibility #12541

Open jdavcs opened 2 years ago

jdavcs commented 2 years ago

This builds on #10892 (SQLAlchemy 1.4 compatibility) and details the specific steps required for preparing Galaxy's codebase for migrating to SQLAlchemy's upcoming 2.0 release. The steps are documented here.

  1. First Prerequisite, step one:

    • [x] A working 1.3 Application (ref)
  2. First Prerequisite, step two:

    • [x] A Working 1.4 Application (ref):
      • [x] Review the significant changes in 1.3 > 1.4 outlined here. (the first - URL object is now immutable - has been addressed)
      • [x] Review and try to remove the root causes for currently emitted SAWarnings (there may be just a handful causes that lead to numerous warnings), unless those causes are non critical and would require an inordinate amount of effort to resolve
  3. Migration to 2.0 Step One:

    • [x] Python 3 only (Python 3.6 minimum) (ref).
  4. Migration to 2.0 Step Two Prerequisites (Galaxy-specific)

    • [x] Switch to Alembic (#10369) before resolving all warnings.
  5. Migration to 2.0 Step Two - Use RemovedIn20Warnings (ref)(#13171) (To enable warnings, set GALAXY_CONFIG_SQLALCHEMY_WARN_20=1)

    • [x] Setup a warnings filter to handle RemovedIn20Warnings
    • [x] Enable the SQLALCHEMY_WARN_20=1 environment variable for runtime:
      • [x] in local dev
      • [x] on main
    • [x] Enable the SQLALCHEMY_WARN_20=1 environment variable for all tests:
      • [x] local dev (subsets of tests can be run manually with the flag set)
      • [x] all relevant testing workflows
  6. Migration to 2.0 Step Three - Resolve all RemovedIn20Warnings (ref) (Helpful documentation on how to filter warnings: sqlalchemy, pytest, python)

    • [x] Resolve the underlying cause for each warning that suggests a future runtime error: add test exposing the issue (if possible within reason), fix it. (NOTE: Do not silence the warning before fixing the underlying cause.). See these examples of changes that may potentially have a significant impact on a running Galaxy:
  7. Migration to 2.0 Step Four - Use the future flag on Engine (ref)

    • [x] Pass the future=True flag to the create_engine() function (there are multiple places where we call that function: check entire code base: lib/, scripts/, test/)
    • [x] Consider using connection.commit() where applicable
    • [x] Resolve any new errors or deprecation warnings
  8. Migration to 2.0 Step Four - Use the future flag on Session (ref)

    • [x] Pass the future=True flag to Session wherever we are creating an instance (direct instantiation of Session or calling the sessionmaker factory)
    • [x] Resolve any new errors or deprecation warnings
  9. Once all of the above is complete and no RemovedIn20Warning are emitted:

    • [x] Review the documentation that details specific changes that are to be made to the code base with respect to all major API modifications (What's new in 2.0)
  10. Enable SA 2.0:

    • [x] Update pyproject.toml, relevant packages (setup.cfg files)
    • [x] Fix dependency conflicts:
      • [x] fastapi-utils (#17185)
      • [x] graphene-sqlalchemy
    • [x] Fix model attribute typing errors
    • [x] Fix any other errors
  11. Cleanup:

  12. Next steps:

    • [ ] Refactor data access code to take advantage of SQLAlchemy 2.0 improvements

UPDATE: Keeping track of fixing RemovedIn20Warnings:

Number of distinct warnings per github workflow (each may be triggered multiple times by different tests): (last updated/corrected: Dec 11, 2023)

tests 10/26 11/15 11/29 12/04 12/11
api 90 85 85 17 0
dbindexes 6 1 1 0 0
framework 2 1 1 1 0
integration 326 165 142 37 0
integration sel 66 33 31 12 0
selenium 200 100 80 23 0
toolshed 2 1 1 1 0
unit 2 1 1 1 0
TOTAL 694 420 342 92 0

UPDATE: Keeping track of fixing test failures on the SQLAlchemy-2.0 PR (#17180):

(last updated/corrected: Feb 23, 2024)

tests 12/19 12/20 12/21 1/12 1/17 1/18 1/24 1/25 2/12 2/23
integration 10 10 10 10 10 10 10 10 0 0
integration sel 1 0 0 0 0 0 0 0 0 0
toolshed 75 75 75 75 75 75 75 75 75 0
unit w/postgres 32 0 0 0 0 0 0 0 0 0
mypy 1178 911 779 702 538 392 243 173 0 0
TOTAL 1296 996 864 787 623 477 328 258 75 0
jdavcs commented 2 years ago

Running test/unit/data/model/test_mapping.py with the SQLALCHEMY_WARN_20=1 flag triggers 1561 RemovedIn20Warning warnings. It is likely not as problematic as it looks: the set of root causes that account for these warnings is, I'm sure, much smaller and, therefore, manageable. However, I think it makes sense to address these by enabling the flag manually for each test run: otherwise the warnings would pollute normal output.

jdavcs commented 2 years ago

So, we are switching to Alembic before this can be completed. I tried fixing the bound metadata issue (one of the "things" that are removed in 2.0), and that appears to be not possible with our current migration provider (e.g. a bound engine is assumed when SA Migrate executes Column.create(). So, Alembic comes first.

jdavcs commented 2 years ago

In 2.0 there will be no backref/back-populates cascading of child objects into the session (ref). In particular: "In 2.0, the default behavior will be that “cascade_backrefs” is False, and additionally there will be no “True” behavior as this is not generally a desirable behavior." This, potentially could wreak havoc on a running Galaxy. To address this, I will try adding an explicit cascade_backref=False to all relationships with back_populates and address any issues on a case-by-case basis.

Once we get to Step Four and enable the future=True flag on the Session, I'll remove these attributes. I think this is the safest way to do it (and manageable too).