Currently, indexing operations access Elasticsearch but never the SQL database. However, if we add database-only corpora, that obviously needs to change.
That change causes a problem for unit tests. With pytest-django, you can't access the SQL database in a session-scoped test fixture, but our unit tests rely on session-scoped fixtures to create Elasticsearch indices for test corpora.
One reason why indexing fixtures are session-scoped is that otherwise, the test time really builds up. (On my machine, I think it goes from 2 minutes to 4-5 minutes.) This is partly because the operations take time, and partly because ES is near-real-time, so you need a "buffer period" after populating an index.
We should find a solution for indexing fixtures that allows them to access the database.
Proposed solution:
Configure the test project / test corpora so every index name has the prefix `test-` (probably a good idea anyway).
Create and populate each index in a function-scoped fixture, but only if the index doesn't exist yet. This fixture does not delete the index in its teardown.
Add a session-scoped fixture that does nothing in its setup, but deletes all `test-*` indices in its teardown.
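A rough sketch of how these three steps could fit together, not a finished implementation. The names `ensure_index`, `delete_test_indices`, `cleanup_test_indices` and the `populate` callback are made up for illustration. The client is assumed to expose `indices.exists`, `indices.create`, `indices.refresh` and `indices.delete` like the Elasticsearch Python client does; `FakeClient` is only an in-memory stand-in so the logic can be run without a server.

```python
from fnmatch import fnmatch

TEST_INDEX_PREFIX = "test-"  # prefix from the proposal above


def ensure_index(client, name, populate):
    """Create and populate `name` unless it already exists.

    Returns True if the index was created, False if it was reused.
    `populate` is a hypothetical callback that indexes the corpus and
    is free to read from the SQL database.
    """
    if not name.startswith(TEST_INDEX_PREFIX):
        raise ValueError(f"refusing to manage non-test index {name!r}")
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name)
    populate(client, name)
    # An explicit refresh makes the new documents searchable; it may
    # reduce the need for a "buffer period" after populating.
    client.indices.refresh(index=name)
    return True


def delete_test_indices(client):
    """Delete every index whose name starts with the test prefix."""
    client.indices.delete(index=TEST_INDEX_PREFIX + "*")


def cleanup_test_indices(client):
    """Body of the session-level fixture: no setup, cleanup in teardown.

    In conftest.py this generator would be decorated with
    @pytest.fixture(scope="session", autouse=True).
    """
    yield
    delete_test_indices(client)


class FakeIndices:
    """In-memory stand-in for `client.indices` (illustration only)."""

    def __init__(self):
        self.names = set()

    def exists(self, index):
        return index in self.names

    def create(self, index):
        self.names.add(index)

    def refresh(self, index):
        pass  # nothing to do in memory

    def delete(self, index):
        # Supports the same wildcard pattern used in delete_test_indices.
        self.names = {n for n in self.names if not fnmatch(n, index)}


class FakeClient:
    def __init__(self):
        self.indices = FakeIndices()
```

In a real setup, the function-level fixture would request pytest-django's `db` fixture and call `ensure_index` with the actual client, so only the first test that needs a corpus pays the indexing cost. One thing to check: depending on the cluster setting `action.destructive_requires_name`, Elasticsearch may reject wildcard deletes, in which case the teardown would have to list the matching indices and delete them by name.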
Alternatives:
It looks like Django's own testing framework might have more options for setting up the (SQL) database on a session level? In any case, that switch would not be worth it for this issue.
I think the reason pytest-django only allows database access in function scope is to avoid weird issues like partially overlapping requirements for tests (e.g. {A}, {A,B}, {B}). But in this case, the data used in the fixture does not need to transfer to the test. A fixture could have its own isolated database to store and access a corpus during its setup.