dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Feature request: Allow specifying the database `schema` of Dagster storage config #25312

Open vergenzt opened 4 weeks ago

vergenzt commented 4 weeks ago

Re: https://docs.dagster.io/deployment/dagster-instance#dagster-storage

What's the use case?

My company has an existing ad hoc job execution framework written in Python, which stores its execution logs in a Postgres database.

Given that Dagster manages its own schema and table names, I'd prefer to give the Dagster storage tables a namespace separate from our main database schema. However, since I'm hoping we can migrate from the old execution framework to the new one, I'd love to have both live in the same Postgres database instance, so that I can run queries against the old (ad hoc framework) and new (Dagster) storage tables at the same time.

It'd be nice if schema or db_schema could be an allowed parameter in storage config!

Ideas of implementation

  1. Add schema or db_schema property to the pg_database config at python_modules/dagster/dagster/_core/storage/config.py#L52-L69.

    Maybe also recognize a ?schema=... or ?db_schema=... param in postgres_url? (It would need to be stripped out before connecting, since according to the PostgreSQL 17 documentation, section 32.1.1.2 "Connection URIs" under "Database Connection Control Functions", unrecognized query parameters in a connection URL are not supported.)

  2. All calls to SQLAlchemy's create_engine should be updated to pass a schema_translate_map that maps None (the unspecified schema used throughout the model references) to the config value that was passed in.

    This is an approachable number of places to change (and they could probably be consolidated into a create_engine helper method). See search: https://github.com/search?q=repo%3Adagster-io%2Fdagster+create_engine%28+NOT+path%3Aexamples&type=code
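To make the two ideas above concrete, here is a minimal sketch using only the standard library. The function names split_schema_param and engine_kwargs are hypothetical and not part of Dagster, and the schema query parameter is the proposed one, not something Dagster recognizes today. The only SQLAlchemy behavior assumed is that create_engine accepts an execution_options dictionary and that schema_translate_map is one of those execution options.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_schema_param(url, param="schema"):
    """Remove the (hypothetical) schema query parameter from a Postgres URL.

    Returns (clean_url, schema); schema is None if the parameter is absent.
    The parameter has to be removed because libpq rejects query parameters
    it does not recognize.
    """
    parts = urlsplit(url)
    schema = None
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == param:
            schema = value
        else:
            kept.append((key, value))
    clean_url = urlunsplit(parts._replace(query=urlencode(kept)))
    return clean_url, schema


def engine_kwargs(schema):
    """Build keyword arguments for create_engine (hypothetical helper).

    Maps None, the schema of tables declared without one, to the
    configured schema via the schema_translate_map execution option.
    """
    if not schema:
        return {}
    return {"execution_options": {"schema_translate_map": {None: schema}}}


clean_url, schema = split_schema_param(
    "postgresql://user:pw@localhost:5432/mydb?schema=dagster&sslmode=require"
)
kwargs = engine_kwargs(schema)
# A consolidated helper could then call:
#     sqlalchemy.create_engine(clean_url, **kwargs)
```

Keeping the URL handling and the engine arguments in one helper would mean each existing create_engine call site only needs to switch to that helper, rather than learning about the schema option individually.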

Additional information

I would love to help and/or implement this! I wanted to check if you're open to it first though and/or have other concerns.

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

vergenzt commented 3 weeks ago

This has been mentioned before at https://github.com/dagster-io/dagster/discussions/18207#discussioncomment-10004148

vergenzt commented 3 weeks ago

I started a WIP draft PR for this at #25366!

dduong1603 commented 2 weeks ago

You might want to try appending options=-csearch_path=<schema name> to the URI.
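A minimal sketch of that workaround, assuming the URL is consumed by a libpq-based driver. The helper name with_search_path is hypothetical. The "=" inside the option value is percent-encoded, following the form the PostgreSQL connection URI documentation uses for the options parameter. This only changes the session's search_path, so the schema itself is assumed to exist already.

```python
from urllib.parse import quote


def with_search_path(url, schema):
    """Append a libpq options parameter that sets search_path (hypothetical helper)."""
    # Percent-encode the whole option value so the inner "=" is not
    # mistaken for a key/value separator.
    option = quote(f"-csearch_path={schema}", safe="")
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}options={option}"


uri = with_search_path("postgresql://user:pw@localhost:5432/mydb", "dagster")
```

With this approach no Dagster change is needed: unqualified table names simply resolve against the schema named in search_path for every connection opened from that URL.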