Closed: jonashaag closed this issue 1 year ago.
It's great to know that the -k argument works with the supplied id of a parametrization.
I'm in favor of adding some extra fields to the Requirement and Constraint classes, like a name and a description. I think we may need this for two reasons: easier selection with -k, and better readability.
An easy solution would be to provide a metadata attribute on the Requirement object that allows the user to pass a custom identifier / explanatory string. Example:
req = BetweenRequirement.from_tables(...)
req.add_description("test_identifier")
req.add_uniques_equality_constraint(columns1, columns2)
Another use case might be to group tests together, like this:
req = BetweenRequirement.from_tables(...)
req.add_description("Customer Validation") # what happens to whitespaces?
req.add_uniques_equality_constraint(columns1, columns2)
req2 = WithinRequirement.from_table(...)
req2.add_description("Customer Validation")
req2.add_uniqueness_constraint(columns)
and then pytest -k "'Customer Validation'"
The latter could maybe also be achieved by adding a mark attribute to the Requirement or Constraint and then adding that mark programmatically to the collected pytest items via a pytest hook (e.g. pytest_collection_modifyitems). I haven't tried this yet.
Then, the user can just run:
# still don't know about case or whitespace
pytest -m customer_validation
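For illustration, here is a rough, untested sketch of that hook idea as it could live in a conftest.py. It assumes a hypothetical description attribute on the constraint (which datajudge does not have today) and that the test function is parametrized over a parameter named constraint:

import pytest

def pytest_collection_modifyitems(config, items):
    for item in items:
        callspec = getattr(item, "callspec", None)
        if callspec is None:
            continue
        # assumes the datajudge test is parametrized over a "constraint" parameter
        constraint = callspec.params.get("constraint")
        label = getattr(constraint, "description", None)  # hypothetical attribute
        if label:
            # e.g. "Customer Validation" -> "customer_validation"
            mark_name = label.lower().replace(" ", "_")
            item.add_marker(getattr(pytest.mark, mark_name))

The marks would still need to be registered (via the markers ini option or pytest_configure) to avoid unknown-mark warnings.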
This might be just constraint-specific stuff. Example: For the Uniqueness Constraint, I would love to see in the output which columns were tested.
better readability, especially with a large test/requirement suite
I'm not exactly sure I understand what you mean. Especially with a very large test suite - with say >1000 tests - how do custom names help with readability?
The latter could maybe also be achieved by adding a mark attribute
Exactly! This is done at QuantCo on several projects. This is also elaborated on in this article[0] of our documentation.
This might be just constraint-specific stuff. Example: For the Uniqueness Constraint, I would love to see in the output which columns were tested.
I would think that this kind of information could be conveyed through the assertion text. In the particular example you give, I'm surprised that the columns aren't already part of the assertion text, see this piece of code [1]. If you have found a counterexample/bug, please feel free to open a separate issue!
[0] https://datajudge.readthedocs.io/en/latest/testing.html
[1] https://github.com/Quantco/datajudge/blob/main/src/datajudge/constraints/miscs.py#L92-L95
Especially with a very large test suite - with say >1000 tests - how do custom names help with readability?
Custom names might help because I might be using the same constraint for different cases.
An example: I have 5 dumps with ~30 tables each, and there are 10 distinct tests for each table pair from old -> new dump (for the BetweenRequirements). I might be using Uniqueness to validate something and then also validate primary keys for all of my tables while I'm at it. This will cause my list to have 300 (= 5 × 30 × 2) test entries that all start with Uniqueness::{dump_name}...
If I could have something like a metadata field that is printed along, half of those might be called UniqueColumn and the other half maybe PrimaryKeyValidation, etc.
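As a rough illustration of how such a field could show up in the test names, here is a sketch assuming the manual parametrization pattern (a user-defined datajudge_engine fixture and a list all_constraints of the collected constraints); the name attribute is the proposed, currently non-existent field:

import pytest

all_constraints = []  # the user's Constraint objects, gathered from their requirements

def constraint_id(constraint):
    # `name` is the proposed field and does not exist yet; fall back to the
    # constraint class name plus datajudge's own description.
    label = getattr(constraint, "name", None) or type(constraint).__name__
    return f"{label}::{constraint.get_description()}"

@pytest.mark.parametrize("constraint", all_constraints, ids=constraint_id)
def test_constraint(constraint, datajudge_engine):
    result = constraint.test(datajudge_engine)
    assert result.outcome, result.failure_message

With something like this, the 300 Uniqueness entries above would split into UniqueColumn::... and PrimaryKeyValidation::... ids and could be selected via pytest -k UniqueColumn.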
I think the main points here are addressed by the above ([0] etc.).

already part of the assertion text
Good point! I think we were rather focused on having such information in the test name, but that might be an edge case for datajudge.
Currently it's not obvious how to select a single constraint when using pytest -k. The pytest test names are taken from get_description(), which may be something like ...table1|table2. Problems with that: pytest -k doesn't allow specifying things like table1|table2 because of the |.
Idea: Provide a way to add your own name to each of the requirements/constraints?
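For concreteness, one purely hypothetical shape this could take on the user side; the name keyword argument does not exist in datajudge, while the rest is meant to follow the existing WithinRequirement API:

from datajudge import WithinRequirement

req = WithinRequirement.from_table("my_db", "public", "customers")
# `name=` is the proposed, not-yet-existing argument; it would become (part of) the pytest id.
req.add_uniqueness_constraint(["customer_id"], name="customer_id_unique")
# which could then be selected via: pytest -k customer_id_unique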