crate / crate-clients-tools

Clients, tools, and integrations for CrateDB.
https://crate.io/docs/clients/
Apache License 2.0
3 stars 1 forks source link

Problem running MLFlow with CrateDB #47

Closed amotl closed 10 months ago

amotl commented 1 year ago

Hi there,

@andnig recently evaluated MLflow with CrateDB. Thanks!

MLflow can be used to track ML experiments - see here for more details. Conveniently, it allows to use a PostgreSQL database backend. In theory, we would be able to use CrateDB as MLflow backend as well, removing the need to have yet another database in our MLOps environments.

However, when running the MLflow server, the following exception is raised:

2023/07/31 09:02:39 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(psycopg2.errors.InternalError_) line 1:1: mismatched input 'ROLLBACK' expecting

Reproduce

  1. Install mlflow and psycopg2 packages. pip install mlflow 'psycopg2[binary]'

  2. Run mlflow server --backend-store-uri='postgresql://<crate-user>:<crate-pwd>@<crate-instance>.azure.cratedb.net:5432' --host 0.0.0.0 --port 8885

With kind regards, Andreas.

amotl commented 1 year ago

Thanks for the report, @andnig.

@ckurze recently mentioned that the legacy CrateDB JDBC driver actually provides this very feature of making ROLLBACK a no-op. I think we should bring it to the Python world, for the very same reasons:

Also, tools like Dataiku need this driver to implement transaction commands like ROLLBACK as a no-op.

-- https://crate.io/docs/jdbc/en/latest/#overview

@andnig: May I ask if using the native CrateDB SQLAlchemy driver and dialect would also work, or if not, why? I am referring to pip install 'crate[sqlalchemy]', and then using mlflow server --backend-store-uri='crate://', if that would work at all?

amotl commented 1 year ago

@andnig responded with:

@amotl No there is no specific reason for not to use sqlalchemy. Happy to try this one. Would you mind providing your input on when you'd suggest to use sqlalchemy and when to use your native client? Can you guide me, how to set this no-op? DB connection strings are one of my weak spots 😄

amotl commented 1 year ago

When installing with pip install mlflow 'crate[sqlalchemy]', and running mlflow server --backend-store-uri='crate://crate@localhost', MLflow fails differently.

mlflow.tracking.registry.UnsupportedModelRegistryStoreURIException: Model registry functionality is unavailable; got unsupported URI 'crate://crate@localhost:5432' for model registry data storage. Supported URI schemes are: ['', 'file', 'postgresql', 'mysql', 'sqlite', 'mssql']. See https://www.mlflow.org/docs/latest/tracking.html#storage for how to run an MLflow server against one of the supported backend storage locations.

amotl commented 1 year ago

After a quick patch to MLflow, it will accept the crate:// database connection string URL scheme.

--- .venv/lib/python3.9/site-packages/mlflow/store/db/db_types.py.dist  2023-09-05 17:50:58.000000000 +0200
+++ .venv/lib/python3.9/site-packages/mlflow/store/db/db_types.py   2023-09-05 17:51:01.000000000 +0200
@@ -6,5 +6,6 @@
 MYSQL = "mysql"
 SQLITE = "sqlite"
 MSSQL = "mssql"
+CRATEDB = "crate"

-DATABASE_ENGINES = [POSTGRES, MYSQL, SQLITE, MSSQL]
+DATABASE_ENGINES = [POSTGRES, MYSQL, SQLITE, MSSQL, CRATEDB]

That reveals a subsequent exception when running DDL statements: mismatched input 'PRIMARY KEY' expecting 'CHECK'.

2023/09/05 17:50:08 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2023/09/05 17:50:08 ERROR mlflow.cli: Error initializing backend store
2023/09/05 17:50:08 ERROR mlflow.cli: (crate.client.exceptions.ProgrammingError) SQLParseException[line 7:27: mismatched input 'PRIMARY KEY' expecting 'CHECK']
[SQL:
CREATE TABLE experiments (
    experiment_id INT NOT NULL,
    name STRING NOT NULL,
    artifact_location STRING,
    lifecycle_stage STRING,
    CONSTRAINT experiment_pk PRIMARY KEY (experiment_id),
    CONSTRAINT experiments_lifecycle_stage CHECK (lifecycle_stage IN ('active', 'deleted')),
    UNIQUE (name)
)

]
amotl commented 1 year ago

Hi again,

I have been able to work around those shortcomings by adding a few visitor methods to CrateDDLCompiler [^1], in order to mask relevant details. While database tables have been created successfully, with partly valid schemas, the next step is that MLflow will use Alembic to upgrade the database schema using the SQL migration scripts located at db_migrations/versions.

This will need a corresponding database-specific adapter implementation for Alembic to run migration operations, which includes database-specific behavioral variances, specific to CrateDB. Using a minimal variant based on the implementation for PostgreSQL works around having a real one.

class CrateDbImpl(PostgresqlImpl):
    __dialect__ = "crate"
    transactional_ddl = False

Unfortunately, the process fails on this occasion:

ALTER TABLE metrics ADD COLUMN step LONG DEFAULT '0' NOT NULL

because

adding a column with a DEFAULT clause is not supported with ALTER TABLE statements.

-- https://crate.io/docs/crate/reference/en/5.4/sql/statements/alter-table.html#add-column

With kind regards, Andreas.

[^1]: visit_check_constraint, visit_column_check_constraint, visit_primary_key_constraint, visit_foreign_key_constraint, visit_unique_constraint

amotl commented 1 year ago

Down this rabbit hole, there are too many features typically offered by OLTP databases, which would be needed to properly support this subsystem of MLflow. So, this is most probably a dead end for now.

amotl commented 1 year ago

Hi again,

we took a different approach by patching a bit of MLflow's main code base, and got it working reasonably.

Thank you so much for providing corresponding starter resources, @andnig. Without them, this would not have been possible.

Now, I am humbly asking all interested people to exercise this stack, in order to discover eventual flaws and shortcomings. Thanks!

With kind regards, Andreas.

amotl commented 1 year ago

We took a different approach by patching a bit of MLflow's main code base, and got it working reasonably.

After leveling up a bit more, we converged the patches into monkeypatching style, and put them into an installable package, in order to make the system consumable out of the box, using pip install mlflow-cratedb. Enjoy.

seut commented 10 months ago

This is done after talking with @amotl.