Open AlexJonesNLP opened 4 days ago

Hello, I was wondering about setting custom thresholds for data tests, and I was a bit surprised that there didn't seem to be a straightforward way to do it for many of the tests. Some threshold values appear to be hardcoded, e.g. here: https://github.com/evidentlyai/evidently/blob/d3e21fb657118f82e5e388223347e57e313c800e/src/evidently/tests/data_integrity_tests.py#L126C75-L126C87

I'm aware that there's a way to make custom drift methods (https://docs.evidentlyai.com/user-guide/customization/options-for-statistical-tests), but it seems very bizarre to me that these thresholds are hardcoded in certain tests rather than simply being optional parameters. Am I missing something here? Why aren't these thresholds always passed as optional parameters?
Hi @AlexJonesNLP, the hardcoded values are only the "sensible defaults". If you do not pass your own test condition, this default applies, and it is derived from the reference data.

You can set custom test conditions using parameters like gt (greater than), lt (less than), etc.

Check the docs here: https://docs.evidentlyai.com/user-guide/tests-and-reports/run-tests#id-3.-set-test-conditions

Here is a usage example using parameters such as eq (equal) and lte (less than or equal to):
from evidently.test_suite import TestSuite
from evidently.tests import (
    TestColumnDrift,
    TestNumberOfConstantColumns,
    TestNumberOfDuplicatedColumns,
    TestNumberOfEmptyColumns,
    TestNumberOfEmptyRows,
    TestShareOfDriftedColumns,
    TestShareOfMissingValues,
)
from evidently.tests.base_test import generate_column_tests

tests = TestSuite(
    tests=[
        TestShareOfMissingValues(lte=0.05),
        TestNumberOfConstantColumns(eq=0),
        TestNumberOfEmptyRows(eq=0),
        TestNumberOfEmptyColumns(eq=0),
        TestNumberOfDuplicatedColumns(eq=0),
        # Generate a drift test for every column: PSI with a 0.3 threshold;
        # is_critical=False turns failures into warnings.
        generate_column_tests(
            TestColumnDrift,
            columns="all",
            parameters={"stattest_threshold": 0.3, "stattest": "psi", "is_critical": False},
        ),
        TestShareOfDriftedColumns(lte=0.3),
    ]
)
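You then run the suite on your data as usual. Here is a minimal sketch (the CSV file names below are just placeholders): tests with an explicit condition use it, and any test left without one falls back to the default derived from the reference data.

import pandas as pd

# Placeholder datasets; both are assumed to share the same schema.
reference_data = pd.read_csv("reference.csv")
current_data = pd.read_csv("current.csv")

# Tests with explicit conditions (lte/eq above) use them; tests without
# a condition fall back to defaults derived from the reference data.
tests.run(reference_data=reference_data, current_data=current_data)

tests.show()               # render results in a notebook
summary = tests.as_dict()  # or inspect the results programmatically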
@elenasamuylova Ahh, I see, thank you. Looking at these lines I get it now: https://github.com/evidentlyai/evidently/blob/15bfe24d674dc02e4f9bc7f99fc69bb8e7b1d507/src/evidently/tests/base_test.py#L407-L413

I'm still not quite sure why this implementation was chosen over plain optional parameters, though?
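For anyone else following the thread, the linked lines roughly boil down to the pattern below (a simplified sketch, not the actual Evidently source): every test accepts the same comparison keywords and wraps them in a single condition object, instead of each test declaring its own ad-hoc threshold parameter.

from dataclasses import dataclass
from typing import Optional


@dataclass
class ValueCondition:
    """Simplified stand-in for Evidently's test condition object:
    every comparison is an optional field, and a test checks its
    computed value against whichever comparisons were set."""
    eq: Optional[float] = None
    gt: Optional[float] = None
    gte: Optional[float] = None
    lt: Optional[float] = None
    lte: Optional[float] = None

    def check(self, value: float) -> bool:
        checks = [
            (self.eq, lambda v, t: v == t),
            (self.gt, lambda v, t: v > t),
            (self.gte, lambda v, t: v >= t),
            (self.lt, lambda v, t: v < t),
            (self.lte, lambda v, t: v <= t),
        ]
        return all(op(value, target) for target, op in checks if target is not None)


# A test only falls back to a default (e.g. one derived from reference)
# when no explicit condition like this was supplied.
condition = ValueCondition(lte=0.05)
print(condition.check(0.03))  # True
print(condition.check(0.10))  # False

One apparent upside of this design over per-test optional parameters is that all tests share a single condition vocabulary (eq, gt, gte, lt, lte, etc.), and the "use the custom condition if given, otherwise fall back to the default" logic lives in one place rather than being repeated in every test.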