engarde-dev / engarde

A library for defensive data analysis.
MIT License

within_range generates a long output, possibly it could be much shorter #25

Open ianozsvald opened 9 years ago

ianozsvald commented 9 years ago

within_range reports, for every row in the column it checks, whether that value falls outside the range (True) or not (False). For a longer DataFrame (e.g. the 891 rows of the Titanic dataset) with a single violating row, you get a long list of False values that hides the one True row. Could the report instead summarise only the rows that violate the constraint?
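To see the problem at that scale, here is a minimal sketch with hypothetical data: 891 values (the Titanic row count) with one planted out-of-range value. It builds the same kind of per-row mask by hand rather than calling engarde. With pandas' default display settings (`display.max_rows` and `display.min_rows`), the repr of a Series this long is truncated to its first and last few rows, so the single True may not be printed at all. Boolean indexing with the mask keeps just the violating row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 891 values (the Titanic row count), all inside
# [0, 100] except one planted violation at row 400.
values = pd.Series(np.linspace(1, 99, 891), name="age")
values.iloc[400] = 150.0

lower, upper = 0, 100
# True marks a value outside the range, as in within_range's output.
mask = (values < lower) | (values > upper)

# Under pandas' default display options the repr of an 891-row Series
# shows only its first and last five rows, so row 400 is not printed.
full_report = repr(mask)

# Boolean indexing keeps only the violating rows.
violations = mask[mask]
print(violations)
```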

Current:

import numpy as np
import pandas as pd
import engarde.checks as ck

df = pd.DataFrame(np.random.randn(4, 2))  # random data: any negative value is outside (0, 10)
ck.within_range(df, {0: (0, 10)})
AssertionError: ('Outside range', 0    False
1    False
2    False
3     True
Name: 0, dtype: bool)
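Because the example draws random data with `np.random.randn`, which rows get flagged (and whether the check fails at all) changes from run to run. The sketch below reproduces the reported shape of the error deterministically. `within_range_full` is a hypothetical stand-in written for illustration, not engarde's actual source; it assumes inclusive bounds and, as in the output above, marks out-of-range values with True.

```python
import pandas as pd


def within_range_full(df, items):
    """Hypothetical re-creation of the behaviour reported in this issue:
    on failure, raise with the full per-row boolean mask."""
    for col, (lower, upper) in items.items():
        # True wherever the value falls outside [lower, upper].
        outside = (df[col] < lower) | (df[col] > upper)
        if outside.any():
            raise AssertionError("Outside range", outside)
    return df


# Fixed data instead of np.random.randn: only row 3 is outside (0, 10).
df = pd.DataFrame({0: [0.5, 1.2, 3.3, -0.7], 1: [0.1, 0.2, 0.3, 0.4]})
try:
    within_range_full(df, {0: (0, 10)})
except AssertionError as err:
    message, mask = err.args  # mask has one entry per row
```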

Suggested:

AssertionError: ('Outside range', 
3     True
Name: 0, dtype: bool)

Could the .sum() of the boolean result also be included, to report the number of violations in case that number is very large?
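Both suggestions can be combined in one check. The following is a minimal sketch, assuming inclusive bounds; `within_range_summary` and its `max_shown` parameter are hypothetical names, not part of engarde's API. On failure it reports the number of violating rows (the `.sum()` of the boolean mask) and lists only those rows, capped at `max_shown` so the message stays short even when violations are numerous.

```python
import pandas as pd


def within_range_summary(df, items, max_shown=10):
    """Hypothetical variant of within_range (not part of engarde's API).

    On failure it raises AssertionError with a count of violating rows and
    only those rows (at most ``max_shown`` of them), instead of the full
    per-row boolean mask.
    """
    for col, (lower, upper) in items.items():
        # True wherever the value falls outside [lower, upper].
        outside = (df[col] < lower) | (df[col] > upper)
        n_bad = int(outside.sum())  # .sum() of a bool Series counts the Trues
        if n_bad:
            raise AssertionError(
                f"Outside range: {n_bad} of {len(df)} rows in column {col!r}",
                outside[outside].head(max_shown),  # violating rows only
            )
    return df


# One violation: only row 3 is reported.
small = pd.DataFrame({0: [0.5, 1.2, 3.3, -0.7]})
try:
    within_range_summary(small, {0: (0, 10)})
except AssertionError as err:
    small_msg, small_rows = err.args

# Many violations: the count stays informative while the listing is capped.
large = pd.DataFrame({"fare": range(-445, 446)})  # 891 rows, 445 negative
try:
    within_range_summary(large, {"fare": (0, 1000)})
except AssertionError as err:
    large_msg, large_rows = err.args
```

Capping the listing with `head` keeps the error readable when thousands of rows fail, while the count still conveys the scale of the problem.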

ianozsvald commented 9 years ago

(Side note: I'm not using this in my video series; I'm noting it only as a possible future tweak.)