cmu-delphi / delphi-epidata

An open API for epidemiological data.
https://cmu-delphi.github.io/delphi-epidata/
MIT License
100 stars 62 forks source link

Update database drivers #1355

Open melange396 opened 7 months ago

melange396 commented 7 months ago

We pin the version on 3 database-specific (MySQL) packages. They are all getting long in the tooth. Updating them could give us better performance, but might also cause problems or require non-trivial changes:

melange396 commented 6 months ago

I went looking for places these db packages are actually referenced/used...

pymysql seems vestigial and doesnt appear to be used anywhere, and thus should probably be removed.

In all of our python ( https://github.com/search?q=repo%3Acmu-delphi%2Fdelphi-epidata+%2Fmysql%2F+language%3Apython&type=code ), we really only use mysql-connector/mysql-connector-python aka mysql.connector. This amounts to just the acquisition and test code.

In non-python ( https://github.com/search?q=repo%3Acmu-delphi%2Fdelphi-epidata+%2Fmysql%2F+-language%3Apython&type=code ), the only places we see mysqlclient aka mysqldb are in requirements files and as a 'dialect' in SQLAlchemy DBURIs.

It seems silly to use two different adapters/drivers across this single repository. I was going to suggest settling on using only mysql-connector because it would not require changing code and could be done with a very small diff, but the SQLAlchemy documentation has some scary-sounding commentary on it AND it was mysql-connector that was in play when i noticed the behavior that caused me to file this issue in the first place (see https://github.com/cmu-delphi/delphi-epidata/pull/1356). That makes me think it may be safer in the long run to have all of our stuff use mysqlclient instead. The programming interface for mysqlclient (at a glance) looks fairly similar to what we are already using, so it might be enough to just find and replace import mysql.connector --> from MySQLdb import _mysql as mysqldb and then mysql.connector --> mysqldb but realistically, all of the usages of the module in our repo should be individually checked for compatibility.

So, at this point, we can either:

melange396 commented 6 months ago

After all of the SQLAlchemy talk, it occurs to me that its probably worth it to bump its version too. The version we are currently using (1.4.40) is about 1.5y old. There is a new major release, but also very recent maintenance releases to the minor version we are already using. Our uses of the package are fairly simplistic, so it is probably low risk to attempt the major.