I've just set up the library and noticed this thing:
Here is the data example:
The tests:
And the sample of the results:
As you can see, the first `constraint_message` says that 60% of the data did not meet the requirement, although 60% of it actually did. In the second row, it says that 0% did not meet the requirement, which would mean that 100% passed, though the opposite is true: none of the values in the `ga_visits` column is unique.
Description of changes:
I propose changing the formula that calculates the ratio in `constraint_message` so that it reports the ratio of mismatched values.
If we use `val ratio = mismatchCount.toDouble / primaryCount`, the results for my case would be 4/10 = 0.4 and 10/10 = 1, i.e. 40% and 100% of the values "didn't meet the constraint requirement".
Another approach is to keep the current ratio and drop the word "not" from the message; however, I'm not sure whether that fits the intended logic.
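To make the two options concrete, here is a minimal Scala sketch using the counts from my example (4 of 10 values failing in the first row, 10 of 10 in the second). The current behaviour is inferred from the output above (it looks like the message reports the share of values that passed), and every name here (`ConstraintCounts`, `matchedRatio`, `mismatchedRatio`, `proposedMessage`, `alternativeMessage`) is hypothetical, not the library's actual code. The message wording is also illustrative.

```scala
// Hypothetical sketch: none of these names come from the library itself.
object RatioSketch {
  final case class ConstraintCounts(mismatchCount: Long, primaryCount: Long) {
    require(primaryCount > 0, "primaryCount must be positive")
  }

  // Inferred current behaviour: the ratio is the share of values that PASSED,
  // yet it is printed next to "did not meet".
  def matchedRatio(c: ConstraintCounts): Double =
    (c.primaryCount - c.mismatchCount).toDouble / c.primaryCount

  // Proposed formula: the share of values that FAILED.
  def mismatchedRatio(c: ConstraintCounts): Double =
    c.mismatchCount.toDouble / c.primaryCount

  // Formats a ratio such as 0.4 as "40%".
  def percent(r: Double): String = f"${r * 100}%.0f%%"

  // Option 1: keep the "did not meet" wording, report the mismatched ratio.
  def proposedMessage(c: ConstraintCounts): String =
    s"${percent(mismatchedRatio(c))} of the values did not meet the constraint requirement"

  // Option 2: keep the matched ratio, drop "not" from the wording.
  def alternativeMessage(c: ConstraintCounts): String =
    s"${percent(matchedRatio(c))} of the values met the constraint requirement"

  def main(args: Array[String]): Unit = {
    val cases = Seq(ConstraintCounts(4, 10), ConstraintCounts(10, 10))
    cases.foreach { c =>
      println(proposedMessage(c))
      println(alternativeMessage(c))
    }
  }
}
```

Either option makes the number and the wording agree; for the first row the messages become "40% ... did not meet" or "60% ... met", and for the `ga_visits` row "100% ... did not meet" or "0% ... met".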
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.