astronomer / astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
https://astro-sdk-python.rtfd.io/
Apache License 2.0
350 stars 43 forks source link

Users should be able to set `tmp_schema` for each connection #304

Open pgzmnk opened 2 years ago

pgzmnk commented 2 years ago

Please describe the feature you'd like to see Currently tmp_schema must exist on the database to which a SQL tasks will write output to. Feedback sessions with Astro Build surfaced that requiring a specific schema with write access to exist might be an unreasonable ask for a db admin.

There's a feature to set the temp schema with the AIRFLOW__ASTRO__SQL_SCHEMA environment variable. However, forcing the same temp schema name for all conns is untenable. For example, users might want conn1 to use the tmp_schema and conn2 to use a user defined schema named default_schema.

Describe the solution you'd like A possible solution could be to let users specify the temp schema for each conn in a new field in 'extras' of the conn URI.

This means that SQL Tasks write temp tables to the schema defined by the e.g. tmp_schema attribute in the connection's 'extras' section, if that attribute exists. Otherwise write to the default temp schema.

Additional context This issue was encountered within the Astro Build UI.

Acceptance Criteria

tatiana commented 2 years ago

That's excellent feedback, @pgzmnk!

A PR in progress (#140) addresses the same issue but uses a different approach. It allows users to define default values per DAG instead of relying on Astro 0.7.x temporary schema environment variable. What are your thoughts about this alternative approach?

I like your proposal because it is focused on the database, as opposed to DAGs - it is more flexible and would allow a single DAG powered by Astro to interact with >1 database, which can have different temporary schemas.

pgzmnk commented 2 years ago

This makes sense. Let's see what user feedback we get based on your proposed implementation.

Thank you for being on top of this.

tatiana commented 2 years ago

@pgzmnk, One of the challenges in implementing the feature you describe, is that Postgres expects the connection extra property only to contain pre-defined Postgres connection fields:

So when we give it a connection that has an extra containing another field (e.g. tmp_schema), the hook assumes it should be used to establish the connection to Postgres (e.g., as a TTS parameter) and fails to connect.

We'll look into alternatives to make this work!

tatiana commented 2 years ago

For those who may be interested, the issues that happened with Postgres when trying to use this approach are documented in the branch temp-schema-from-conn: https://github.com/astronomer/astro-sdk/commit/8b0dfecb41a036bd849c185ec49377c7938cb5db

kaxil commented 2 years ago

@feluelle @sunank200 Any updates on this issue.

This was brought up again internally at https://astronomer.slack.com/archives/C02B8SPT93K/p1663278078864369