Right now, `normality_test` does not allow for any column selection and fails if any invalid column is present in the input DataFrame. The following modifications could improve the tool:
[x] Add parameter to subset columns
[x] Accept NumPy arrays as valid input (and add the logic to handle them)
[x] Add a check that the data used is numeric
[x] Handle (drop?) missing data
[x] Remove "It is assumed that the input data is normally distributed" from the docstring as it is false/misleading
[x] Add a check for the maximum sample size (5000); not sure if this is necessary or whether SciPy already checks it
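
As a rough illustration of how the input-handling items above could fit together, here is a minimal sketch. It is not the tool's actual implementation: the helper name `prepare_normality_input`, its parameters, the `MAX_SAMPLES` constant, and the choice to raise errors (rather than warn) are all assumptions made for the example. Only the 5000 limit comes from the list above.

```python
from typing import Optional, Sequence, Union

import numpy as np
import pandas as pd

# Assumed limit, taken from the checklist item above.
MAX_SAMPLES = 5000


def prepare_normality_input(
    data: Union[pd.DataFrame, np.ndarray],
    columns: Optional[Sequence] = None,
    max_samples: int = MAX_SAMPLES,
) -> pd.DataFrame:
    """Hypothetical helper: validate and clean input before a normality test."""
    # Accept NumPy arrays by wrapping them in a DataFrame (columns become 0..n-1).
    if isinstance(data, np.ndarray):
        if data.ndim == 1:
            data = data.reshape(-1, 1)
        if data.ndim != 2:
            raise ValueError("Array input must be 1D or 2D.")
        data = pd.DataFrame(data)
    elif not isinstance(data, pd.DataFrame):
        raise TypeError("Input must be a pandas DataFrame or a NumPy array.")

    # Optional column subsetting.
    if columns is not None:
        missing = [c for c in columns if c not in data.columns]
        if missing:
            raise KeyError(f"Columns not found in input: {missing}")
        data = data[list(columns)]

    # Every remaining column must be numeric.
    non_numeric = [
        c for c, dtype in data.dtypes.items()
        if not pd.api.types.is_numeric_dtype(dtype)
    ]
    if non_numeric:
        raise TypeError(f"Non-numeric columns in input: {non_numeric}")

    # Drop rows containing missing values (listwise deletion).
    data = data.dropna()
    if data.empty:
        raise ValueError("No data left after dropping missing values.")

    # Enforce the maximum sample size.
    if len(data) > max_samples:
        raise ValueError(
            f"Sample size {len(data)} exceeds the maximum of {max_samples}."
        )
    return data
```

With this sketch, `prepare_normality_input(df, columns=["a", "b"])` would ignore any other (possibly non-numeric) columns in `df` instead of failing on them. Note that `dropna()` here removes a whole row if any selected column is missing a value; dropping missing values per column may be preferable if the columns are tested independently.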