DeepLcom / sql-mock

A Python library to test your SQL models using mocked input data
MIT License
33 stars 5 forks source link

Idea: not having to define input mocks for downstream tables when using assert_cte_equal #46

Open fgarzadeleon opened 7 months ago

fgarzadeleon commented 7 months ago

When I run assert_cte_equal I have to define all downstream tables even if they are not needed. I think this is just an issue in the checks not an actual runtime issue, so I propose a change in validate_all_input_mocks_for_query_provided so it also ignores downstream CTEs when running assert_cte_equal.

This would be a step forward in treating each CTE as a unit and being able to isolate every CTE and define the bare minimum to test it.

Somtom commented 7 months ago

Good point. One way around it is to use defaults in the table mock definition:

@table_meta(
    query_path="./examples/test_query.sql",
    default_inputs=[
        UserTable([]),
        SubscriptionTable([]),
    ],  # We can provide defaults for the class if needed. Then we don't always need to provide data for all input tables.
)
class MultipleSubscriptionUsersTable(ClickHouseTableMock):
    user_id = col.Int(default=1)

As in the example above, they can also be empty by default. Maybe this can solve your problem in the meanwhile.

I think the issue is that we currently call the validation when you use the from_mocks method. I see pros and cons about that:

Pros

Cons

fgarzadeleon commented 7 months ago

Thanks Thomas.

Having thought about it there are some possible solutions:

Proposal 1: When using the from_mocks() method we can add a cte= input so that we declare from the start that we will only work until that cte. This would need a modification of the validation.

Then we don't have to define anything else and we have declared upfront that we will only be doing validations until the declared cte.

input_table_mock = UserTable.from_dicts(user_table_data)

res = MultipleSubscriptionUsersTable.from_mocks(input_data=[input_table_mock], cte='cte_users')

This does leave the question what to does assert_equal and assert_cte_equal. I think there is also opportunity of condensing both these functions into one assert_equal() with also the options cte= value to input.

Proposal 2: Another option is embracing my previous suggestion https://github.com/DeepLcom/sql-mock/issues/45 and not having an assert function at all and just do generate() and if you want to do a cte then generate(cte=).

You would need to be more flexible in the validation when doing from_mocks(). I think you would build some type of dictionary to say which cte can be generated and which not.

You can provide an error at the generate(cte=) call where the error thrown out can be clear that the error was hit at the generate stage.

Checks happen once but error thrown out is in the context of what you are trying to run.

Proposal 3: We explicitely say that if you want to isolate and test a cte with minimal inputs you need to define it in the models.py. If we only want to test the cte cte_users.

Then I define it something like this in the models.py:

@table_meta(query=QUERY, cte=`cte_users`)
class CTEUsersMock(ClickHouseTableMock):
        user_id = col.Int(default=1)

I like this one in the send that it forces you to be more explicit with your cte schema testing definition but means you will have to write more models in models.py.

Let me know what you think.