BritishGeologicalSurvey / etlhelper

ETL Helper is a Python ETL library to simplify data transfer into and out of databases.
https://britishgeologicalsurvey.github.io/etlhelper/
GNU Lesser General Public License v3.0
100 stars 25 forks source link

Allow setting default schema #189

Open rbroth opened 7 months ago

rbroth commented 7 months ago

In oracle, when you write sql code targeting a db object without specifying it's schema, oracle assumes that is the same as the username e.g. if user alice connects to the db and executes SELECT * FROM mytable, oracle will interpret it as SELECT * FROM alice.mytable. If Alice wants her script to work with both dev_hr and prod_hr, she either has to prefix all of her object names with a variable (leading to clunky code), or run

with ORACLEDB.connect(password_variable='DB_PASSWORD') as conn:
    etl.execute("ALTER SESSION SET CURRENT_SCHEMA=dev_hr", conn)`

every time she opens up a db connection.

It is possible to set the default schema for a session, so that you can specify what schema gets set before an unqualified db object name (see e.g. https://python-oracledb.readthedocs.io/en/latest/api_manual/connection.html#Connection.current_schema).

It would be useful if we added an attribute for DBParams that would allow setting the default schema, e.g.:

ORACLEDB = etl.DbParams(
    dbtype='ORACLE',
    host=os.getenv('DB_HOST'),
    port=1521,
    dbname=os.getenv('DB_NAME'),
    user='alice',
    current_schema='dev_hr'
)
volcan01010 commented 7 months ago

Hi @rbroth,

That would be a nice enhancement. I've have to do similar things in PostgreSQL before. We'd need to check the implementation for there, and what we need in SQLite (when you have multiple databases connected that act like schemas) and in MS SQL before we could add it.

In the meantime, the OracleDB docs suggest that the current_schema attribute is read/write, so it should be possible to shorten your code by:

with ORACLEDB.connect(password_variable='DB_PASSWORD') as conn:
    conn.current_schema = "dev_hr"
    ...

You can save setting it each time by defining a function to connect and update the schema in one step:

from contextlib import contextmanager

@contextmanager
def connect_to_schema(db_params, password_variable, schema):
    try:
         conn = db_params.connect(password_variable=password_variable)
         conn.current_schema = schema
         yield conn
    finally:
         conn.close()

# Example usage
with connect_to_schema(ORACLE_DB, "DB_PASSWORD", "dev_hr") as conn:
    result = etl.fetchall("SELECT * FROM my_table", conn)