closeio / sync-engine

GNU Affero General Public License v3.0
25 stars 9 forks source link

Newer versions of MySQL driver dropped support for gevent #677

Open nsaje opened 7 months ago

nsaje commented 7 months ago

The mysqlclient driver that we're using has removed gevent support in version 1.4.

The last version of the driver that supports it (1.3.4) supports Python up to 3.9 so this is blocking us from supporting newer Python versions and missing out on any potential improvements.

nsaje commented 7 months ago

@squeaky-pl could we just switch to using the pure-Python driver PyMySQL ? It's by the same maintainer as mysqlclient and he recommends using PyMySQL instead of mysqlclient if you use gevent.

squeaky-pl commented 7 months ago

This is one of possible solutions.

We would need to measure performance impact. sync-engine's sync pods have a pretty tight loop where it polls all the IMAP uids in the database for a given account against the remote IMAP uids. For large email inboxes it can be easily hundreds of thousands of uids fetched and decoded (this time in Python instead of C) in one SQL query.

We still have semi-broken data with strings that are not valid in Python 3 that were inserted into the database with Python 2 with surrogates, MySQL doesn't validate strings against invalid surrogate pairs and Python 3 does. we are currently correcting this on the fly.

See: https://github.com/closeio/sync-engine/blob/c064bf8f55bc190298b7f43504442c8f535089a4/inbox/sqlalchemy_ext/util.py#L281-L320 and

We would need to see if this approach can be replicated on the new driver, or resync all the data that was synced with Python 2 and reinsert it with surrogate pairs corrected (fun project apart).

There are also some fun and weirds hacks here that access internal mysql C driver modules. See https://github.com/closeio/sync-engine/blob/c064bf8f55bc190298b7f43504442c8f535089a4/inbox/util/concurrency.py#L6-L87. Not sure what this is even for but maybe we could just remove it.

squeaky-pl commented 7 months ago

If we find ourselves under time pressure we could also move to a distro with Python 3.9. Sync-engine works on Python 3.9 unmodified and it's even in the CI now. That would buy as one more year if using non LTS distro is an option. Ubuntu 21.04 shipped with 3.9.

squeaky-pl commented 7 months ago

@tsx was also talking to me about getting rid of Gevent first, which might be a smaller project and would align with the direction we are taking with other projects.

squeaky-pl commented 7 months ago

Also putting this on @wojcikstefan radar.

tsx commented 7 months ago

We still have semi-broken data with strings that are not valid in Python 3 that were inserted into the database with Python 2 with surrogates, MySQL doesn't validate strings against invalid surrogate pairs and Python 3 does. we are currently correcting this on the fly.

Sounds like this can be fixed with a standalone migration? We could come up with a clever enough select ... where field like '%surrogate%' and date_created < $python3_migration and then fix and rewrite those rows directly.

resync all the data that was synced with Python 2 and reinsert it with surrogate pairs corrected

I know it'll be more complex than a couple queries, but could be way easier than full resyncs.