conda-incubator / conda-store

Data science environments, for collaboration. ✨
https://conda.store
BSD 3-Clause "New" or "Revised" License
145 stars 50 forks source link

Discussion: Should conda-store move away from SQLAlchemy? #613

Closed dcmcand closed 1 year ago

dcmcand commented 1 year ago

Context

There have been a number of issues identified using SQL Alchemy. I would like to open the discussion about the advantages and disadvantages of moving away from using SQL Alchemy. This would likely involve abstracting the database functions behind an interface to allow separate implementations for MySQL, PostgreSQL and SQLite.

@costrouc could you link to some of the related issues that you were mentioning.

Value and/or benefit

This could potentially increase performance and solve some of the issues that have identified, but it could potentially increase the complexity and maintenance overhead.

Anything else?

No response

fangchenli commented 1 year ago

Some insight from pandas: Officially, pandas only supports SQLAlchemy. But it has a simple "fall back" sql class for DBAPI 2 connection. Originally it was designed for interaction with the sqlite3 module in the standard library. But surprisingly, many people have been using it with other DBAPI 2 compatible drivers for many years. So depends on how conda-store uses sql database, writing a generic interface to sql db might be easier than people think.

Another approach is to use ADBC drivers. pandas have one PR in progress: https://github.com/pandas-dev/pandas/pull/53869. I don't know too much about ADBC, but it looks promising.

nkaretnikov commented 1 year ago

There have been a number of issues identified using SQL Alchemy.

This is the important bit. To me, it's not yet clear why we need to replace SQLAlchemy and what we're trying to solve. Chris did mention a few things, but those were hypothetical. It's not yet clear whether it's SQLAlchemy or something else.

One thing that was also mentioned is being able to write our own queries for perf critical sections of the code. But we can already do that.

So far, I'm not aware of any real issues with SQLAlchemy, but I haven't looked at some problems related to builds yet, where this might end up appearing.

What I'm definitely not a fan of is doing a rewrite just because we can. It requires time, we might introduce additional bugs while doing so, etc.

dcmcand commented 1 year ago

I think we don't currently have a compelling enough reeason for this change based on this discussion. We can re-examine in the future if we need to.