Use a distinct session instance in tests

marksparkza commented 5 months ago

Currently, assertions about database state as well as factory object generation use the global session instance - the same used by the API. This could potentially mask some kinds of persistence errors, since test functions are not querying the DB independently of the code that they are testing.

We should created a standalone test session instance - perhaps via a fixture - to replace all references to odp.db.Session in test code.

marksparkza commented 5 months ago

The approach being taken is to use separate session instances for test setup (factories, etc) and test evaluation (assertions about expected DB state). These are in turn distinct from the global session used by ODP code in the course of running the tests. The factory and assertion session instances are initialized in the same way as the ODP session, using ODP DB config.

This resolves a kind of circular logic in database state verification where, for example, we might do a select to assert that some change to an object has been committed to the database, but in fact what we're seeing is the expected object that the test setup put into the session and modified locally. This problem is easily demonstrated in the test_db.py module, which essentially sanity-checks the ORM classes and corresponding test factories. Consider the following test function:

def test_create_archive():
    archive = ArchiveFactory()
    result = Session.execute(select(Archive)).scalar_one()
    assert (result.id, result.url) == (archive.id, archive.url)

If we set a breakpoint at the assert statement, we find that the expression archive is result evaluates to True! Both factory and test query are using the ODP Session instance. The assertion is comparing an object to itself, which trivially succeeds and is not a convincing assessment of the veracity of the DB write and read operations.

The revised factory classes instead use a FactorySession, while the query above uses a TestSession. Then, archive is result evaluates to False, the assertion succeeds, and we are convinced that the DB write and read were done independently and successfully.

marksparkza commented 5 months ago

The changes have exposed another logical flaw in some of the API tests.

DB state assertion functions (assert_db_state, etc) typically take a factory-generated batch (possibly modified) as a parameter and then query the DB to check that the result set is equivalent to the batch. In the vocabulary API tests for creating, updating and deleting vocabulary terms, we compare the terms collection on a factory-generated vocabulary object with rows in the vocabulary_term table. The problem is that those terms are only loaded (now via the factory session) when first referenced. If this happens after the API call being tested (i.e. during DB state verification), then we are loading the actual state instead of the expected state.

To illustrate the problem, let's consider the following test function, which verifies that a term cannot be added to a static vocabulary:

def test_create_term_static_vocab(api, vocabulary_batch):
    scopes = [ODPScope.VOCABULARY_PROJECT]
    client = api(scopes)

    vocab = vocabulary_batch[2]
    vocab.static = True
    FactorySession.add(vocab)
    FactorySession.commit()

    r = client.post(f'/vocabulary/{vocab.id}/term', json=dict(
        id=fake.word(),
        data={'title': 'Some Project'},
    ))

    assert_unprocessable(r, 'Static vocabulary cannot be modified')
    assert_db_state(vocabulary_batch)
    assert_db_term_state(vocab.id, *vocab.terms)
    assert_no_audit_log()

Now let's comment out the lines in the API route handler that prevent a static vocabulary from being extended:

    # if vocabulary.static:
    #     raise HTTPException(HTTP_422_UNPROCESSABLE_ENTITY, 'Static vocabulary cannot be modified')

We would now expect the assert_db_term_state call to fail, since the API is adding a term to the vocab, while the vocab.terms collection should contain only the terms that were generated with the vocabulary batch. [For the purpose of this demonstration, we can ignore the JSON response and audit log assertions.]

    # assert_unprocessable(r, 'Static vocabulary cannot be modified')
    assert_db_state(vocabulary_batch)
    assert_db_term_state(vocab.id, *vocab.terms)
    # assert_no_audit_log()

But what happens is that the vocab.terms collection is populated from the DB only when it is referenced, and thus it includes the term that was erroneously added by the 'broken' API. So the assertion function passes, and we have no idea that the API is broken.

To fix the test, we must reference vocab.terms before the API call, to ensure that it is populated only with the terms generated by the factory. Following is the corrected test function:

def test_create_term_static_vocab(api, vocabulary_batch):
    scopes = [ODPScope.VOCABULARY_PROJECT]
    client = api(scopes)

    vocab = vocabulary_batch[2]
    vocab.static = True
    FactorySession.add(vocab)
    FactorySession.commit()
    vocab.terms  # force load of existing (expected) terms

    r = client.post(f'/vocabulary/{vocab.id}/term', json=dict(
        id=fake.word(),
        data={'title': 'Some Project'},
    ))

    assert_unprocessable(r, 'Static vocabulary cannot be modified')
    assert_db_state(vocabulary_batch)
    assert_db_term_state(vocab.id, *vocab.terms)  # terms have already been loaded
    assert_no_audit_log()

Note that in this case, vocab.terms must be loaded after the factory session commit, because otherwise the commit would expire the vocab object and force a reload of the terms when referenced at the point of assertion.

SAEON / odp-server

Use a distinct session instance in tests #27