data-cleaning / errorlocate

Find and replace erroneous fields in data using validation rules
http://data-cleaning.github.io/errorlocate/

Missing data columns were not handled correctly (<= 0.3) #22

Closed edwindj closed 4 years ago

edwindj commented 4 years ago

If we supply a dataset that is missing a column/variable mentioned in the rule set, errorlocate either silently ignores the variable or returns strange errors.

library(errorlocate)
rules <- validator(x > 0, y > 0)
# loading data without y!
data <- data.frame(x = 1)
le <- locate_errors(data, rules)
le$errors
##          x
## [1,] FALSE
rules <- validator(x > y, y > 0, z > 0)
data <- data.frame(x = 0, y = 1)
le <- locate_errors(data, rules)
le$errors
##         x  y
## [1,] TRUE NA
edwindj commented 4 years ago

Fixed in version 0.3.2: a missing column is now added to the dataset automatically, filled with NA, and a warning is generated.

library(errorlocate)
rules <- validator(x > 0, y > 0)
data <- data.frame(x = 1)
le <- locate_errors(data, rules)
## Warning: Adding missing columns 'y'=NA to data.frame.
le$errors
##          x  y
## [1,] FALSE NA
rules <- validator(x > y, y > 0, z > 0)
data <- data.frame(x = 0, y = 1)
le <- locate_errors(data, rules)
## Warning: Adding missing columns 'z'=NA to data.frame.
le$errors
##         x     y  z
## [1,] TRUE FALSE NA
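
For readers who want the same padding behaviour outside the package, here is a minimal base R sketch of the idea. `add_missing_columns()` is a hypothetical helper written for illustration, not part of errorlocate, and it takes the variable names as a plain character vector. In practice those names could probably be obtained with `variables(rules)` from the validate package, but that is an assumption about how you would wire it up.

```r
# Hypothetical helper (not part of errorlocate): add an NA column for every
# name in `vars` that is absent from `data`, and warn about the additions.
add_missing_columns <- function(data, vars) {
  missing_vars <- setdiff(vars, names(data))
  if (length(missing_vars) > 0) {
    warning(
      "Adding missing columns ",
      paste0("'", missing_vars, "'", collapse = ", "),
      " = NA to data.frame.",
      call. = FALSE
    )
    for (v in missing_vars) {
      # rep() keeps this valid for data frames with zero rows as well
      data[[v]] <- rep(NA, nrow(data))
    }
  }
  data
}

# The rules in the second example mention x, y and z; the data lacks z.
data <- data.frame(x = 0, y = 1)
padded <- suppressWarnings(add_missing_columns(data, c("x", "y", "z")))
names(padded)
## [1] "x" "y" "z"
```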
edwindj commented 4 years ago

Chose a different solution: a missing data column now gives an error (>= 0.3.3).
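
If you would rather catch the problem before calling `locate_errors()`, a strict up-front check is easy to sketch in base R. `check_rule_columns()` below is a hypothetical helper, not part of errorlocate, and its error message is illustrative rather than the one the package emits. It again takes the rule variables as a character vector, which is assumed to come from something like `variables(rules)` in the validate package.

```r
# Hypothetical helper (not part of errorlocate): stop early when `data`
# lacks any of the variables named in `vars`.
check_rule_columns <- function(data, vars) {
  missing_vars <- setdiff(vars, names(data))
  if (length(missing_vars) > 0) {
    stop(
      "Missing columns in data: ",
      paste(missing_vars, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(data)
}

# Data without y, checked against rules that mention x and y.
data <- data.frame(x = 1)
msg <- tryCatch(
  {
    check_rule_columns(data, c("x", "y"))
    "all rule variables present"
  },
  error = function(e) conditionMessage(e)
)
msg
## [1] "Missing columns in data: y"
```

Failing fast here keeps the missing column from being mistaken for a clean record, which is the concern raised at the top of this issue.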