michaeljsnyder opened 8 months ago
@michaeljsnyder you'd have to write a custom loader function and pass it to the fixture instead of the SQL file path. Basically, copy this function https://github.com/ClearcodeHQ/pytest-postgresql/blob/pypy/pytest_postgresql/sql.py#L7 but with the SQL filename hardcoded in the function instead of passed through. The tests have an example: https://github.com/ClearcodeHQ/pytest-postgresql/blob/main/tests/loader.py#L9 and this function is used e.g. here: https://github.com/ClearcodeHQ/pytest-postgresql/blob/main/tests/docker/test_noproc_docker.py#L17
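A minimal sketch of that approach, assuming psycopg 3 and a hypothetical hardcoded path `schemas/init.sql` (substitute your own file):

```python
from typing import Any


def load_my_schema(**kwargs: Any) -> None:
    """Copy of the pytest_postgresql SQL loader, with the filename hardcoded."""
    import psycopg  # imported lazily; requires psycopg 3

    # host, port, user, dbname, and password arrive via kwargs from the
    # fixture that invokes this loader.
    db_connection = psycopg.connect(**kwargs)
    with open("schemas/init.sql") as sql_file:  # hardcoded, hypothetical path
        with db_connection.cursor() as cur:
            cur.execute(sql_file.read())
    db_connection.commit()
    db_connection.close()
```

The loader is then handed to the fixture factory via `load=[load_my_schema]` instead of a file path.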
Out of curiosity, why are you creating an additional database in there?
Ahh, thanks very much @fizyk -- I saw the note in the docs about the load reference being a function but wasn't entirely sure how to use that approach. Your examples above cleared it up for me and everything is working as expected on my end now. Much appreciated.
In case it's helpful for others, this is what I ended up with:
```python
from pathlib import Path
from typing import Any
import os

import importlib_resources
import psycopg
from pytest_postgresql import factories

# Catalog references to SQL files. Replace \ with / for Windows path compatibility.
sql_path = str(importlib_resources.files("my_app.config.database_schemas"))
sql_files = [
    str(Path(os.path.join(sql_path, schema))).replace("\\", "/")
    for schema in os.listdir(sql_path)
    if schema.endswith(".sql")
]


def load_database_with_autocommit(**kwargs: Any) -> None:
    """
    Re-implement the native pytest_postgresql SQL loader to enforce
    autocommit=True. The host, port, user, dbname, and password are
    supplied automatically via kwargs. The set of SQL files to load is
    pre-cataloged outside this function.
    """
    # autocommit=True lets statements like CREATE DATABASE run outside a
    # transaction block; the context manager closes the connection when done.
    with psycopg.connect(autocommit=True, **kwargs) as db_connection:
        for sql_filename in sql_files:
            with open(sql_filename, "r") as sql_file:
                with db_connection.cursor() as cur:
                    cur.execute(sql_file.read())


# Initialize the test databases.
postgresql_my_noproc = factories.postgresql_noproc(
    load=[load_database_with_autocommit]
)
postgresql_my = factories.postgresql("postgresql_my_noproc")
```
> Out of curiosity, why are you creating an additional database in there?
The app that I'm writing tests for owns a handful of .sql files that are used by orchestration both to create a couple of databases and to initialize their structure in fresh-deploy scenarios. To start, I wanted to see if I could leave the .sql files unchanged (and thus avoid affecting orchestration and the database team) and essentially use the fixtures here to manage my database setup and connections for unit/integration testing.
I think your question is highlighting that I'm not benefitting from the teardown capabilities of the fixtures (was that the misalignment you were seeing?), because the DatabaseJanitor is not responsible for creating the databases that my tests are using. Is there a recipe for the DatabaseJanitor deferring to the loaded .sql for the creation, but being provided a database name later to drop/tear down upon test-suite completion?
@michaeljsnyder the immediate solution that comes to my mind would be an autouse session fixture that does the cleaning at teardown. With this, I have an additional question that might help with my thoughts on the matter going a bit wild:
Are the additional databases you create a part of the SQL file you hand over, or do you need those to test the SQL statements? I'm considering whether the proc fixture could benefit from having multiple dbnames configured on a single one.
> Are the additional databases you create a part of the SQL file you hand over, or do you need those to test the SQL statements? I'm considering whether the proc fixture could benefit from having multiple dbnames configured on a single one.
@fizyk this is a good point; yes, the databases that get created are subsequently needed to test the app. I essentially have a set of .sql files which create a few databases like data_a, data_b, and so on, and the app has components that are hard-coded to read and write to these data_a, data_b databases depending on the functionality being invoked. Part of what I'm working through is how best to parameterize and externalize some of these database names and their creation, so that they are available to all three of orchestration, the .sql scripts themselves, and the app, in a way that the creation is defined once to reduce brittle coupling.
For example, one partial approach could involve running a regex over the .sql files to find all instances of CREATE DATABASE, picking out the names of the databases being created, and then supplying those in some way to the fixtures here so that teardown can be simplified. This is a bit hacky, but gets at the spirit of the capability, I think.
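That regex idea can be sketched with nothing but the standard library; the pattern below handles plain and double-quoted identifiers, but not every form PostgreSQL accepts:

```python
import re


def find_created_databases(sql_text: str) -> list[str]:
    # Match CREATE DATABASE <name> or CREATE DATABASE "<name>",
    # case-insensitively; bare \w+ identifiers only.
    pattern = re.compile(r'CREATE\s+DATABASE\s+"?(\w+)"?', re.IGNORECASE)
    return pattern.findall(sql_text)


sql = 'CREATE DATABASE data_a;\ncreate database "data_b";\nSELECT 1;'
print(find_created_databases(sql))  # ['data_a', 'data_b']
```

The resulting names could then be fed to the teardown fixture discussed above.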
A cleaner approach probably involves restructuring my .sql files such that the database creation is separate from the schema, so that the fixtures here fit into the flow a bit better (e.g., omit running the creation scripts in favor of DatabaseJanitor creation, but run the schema ones per-database).
Thanks for the great package -- I might be missing something, but I'm hitting an issue with one of my current use cases. I have an .sql file that I'm trying to use to populate a running database as part of an integration test, so I'm using postgresql_noproc. The .sql file requires autocommit to be turned on, but I don't see a way to enforce this with the way DatabaseJanitor is currently structured.
My simplified Python code:
Inside pytest.ini I have my database information specified:
And my test.sql file has, along with a lot of other content:
Running this results in:
psycopg.errors.ActiveSqlTransaction: CREATE DATABASE cannot run inside a transaction block, which makes sense.
Ideally I'd be able to specify the additional kwargs I want to pass to psycopg.Connection(), such as autocommit=True (it defaults to False), via the postgresql_options key in pytest.ini (or via some other ini parameter, if there's a more suitable one), but these configurations are not passed to the connection constructor.
DatabaseJanitor calls _loader() here, which then ultimately calls psycopg.connect() here, but only the host, port, user, dbname, and password arguments are supplied, so there's no way to provide additional configuration to psycopg.
Is there another way to supply a psycopg.Connection instance to the DatabaseJanitor such that I can configure it before the .sql file loading occurs, or can we adjust the loading functionality to pass through more arguments from pytest.ini?