NickCrews / mismo

The SQL/Ibis powered sklearn of record linkage
https://nickcrews.github.io/mismo/
GNU Lesser General Public License v3.0

Testing: Refactor some of the tests to facilitate test-driven development and modularity #17

Closed: OlivierBinette closed this 6 months ago

OlivierBinette commented 6 months ago

The tests currently use ibis.memtable to construct the tables they take as input. This relies on the backend provided by ibis.options.default_backend.

I think it would be good to have more control over the backend for tests. We don't want individual test files to worry about this, so we could have a conftest.py file that provides ibis connection objects and table constructors.

For instance, in conftest.py, we could have the following:

from pytest import fixture


# This can optionally be parameterized to test against different backends.
# We can also choose how to scope the connection.
@fixture
def ibis_connection():
    ...


@fixture
def table_factory(ibis_connection):
    # Build a table from raw data on the shared connection.
    def func(data):
        return ibis_connection.memtable(data)
    return func


@fixture
def column_factory(table_factory):
    # Build a one-column table and return that column.
    def func(column_data):
        table = table_factory({"column": column_data})
        return table["column"]
    return func

Individual test files can then use the table_factory and column_factory fixtures to define the test data they need. We get full control over the backend, the Ibis connection, and the scoping of these objects, without any individual test file having to worry about it.

NickCrews commented 6 months ago

This looks great and I would totally accept a PR, or I can do it myself.

Even if we don't test against multiple backends, the column_factory and table_factory functions would be great for removing boilerplate.

NickCrews commented 6 months ago

@OlivierBinette if you want, take a look at that commit and let me know if I missed something there.