data-cleaning / errorlocate

Find and replace erroneous fields in data using validation rules
http://data-cleaning.github.io/errorlocate/

crash in replace_errors #8

Closed by markvanderloo 8 years ago

markvanderloo commented 8 years ago

> v <- validator(turnover + other.rev==total.rev
+                , turnover > 0
+                , other.rev>0
+                , total.rev>0)
> 
> d1 <- replace_errors(retailers, v)
Error in envRefInferField(x, what, getClass(class(x)), selfEnv) : 
  ‘adapt’ is not a valid field or method name for reference class “errorlocation”

markvanderloo commented 8 years ago

Here's a simpler example that (I think) reproduces this.

> v <- validator(turnover + other.rev==total.rev
+                , turnover > 0
+                , other.rev>0)
> 
> el <- locate_errors(retailers[1:2,4:6],v)
> head(el$._values)
      <NA> <NA> total.rev
[1,]    NA   NA     FALSE
[2,] FALSE   NA        NA

markvanderloo commented 8 years ago

OK, I think I've traced it to this: when values are missing in the original data, the error locations (and, somehow, the column names) come out missing as well.

> v <- validator(turnover + other.rev==total.rev)
> 
> # 3rd row has no missing values:
> el <- locate_errors(retailers[3,4:6],v)
> head(el$._values)
     turnover other.rev total.rev
[1,]    FALSE     FALSE      TRUE
> 
> # 1st row has missing values:
> el <- locate_errors(retailers[1,4:6],v)
> head(el$._values)
     <NA> <NA> <NA>
[1,]   NA   NA   NA
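
A quick way to see which records are affected, sketched in base R only: flag the rows that have a missing value in any variable used by the rule. The data frame below is a made-up stand-in for retailers[1:3, 4:6] (the values are illustrative, not copied from the dataset), and the names d, has_missing, complete_rows and incomplete_rows exist only for this example.

```r
# Illustrative stand-in for retailers[1:3, 4:6]; the values are assumptions.
d <- data.frame(
  turnover  = c(NA, 1607, 6886),
  other.rev = c(NA, NA, -33),
  total.rev = c(1130, 1607, 6919)
)

# TRUE for every row with at least one NA among the rule's variables
has_missing <- !complete.cases(d)

# Split the records so complete and incomplete rows can be checked separately
complete_rows   <- d[!has_missing, , drop = FALSE]
incomplete_rows <- d[has_missing, , drop = FALSE]

print(has_missing)
```

On the real data the same check would be `!complete.cases(retailers[, 4:6])`; rows flagged TRUE are the candidates for the NA locations shown above.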

edwindj commented 8 years ago

Nice find! OK, I'll look into it tomorrow morning.

edwindj commented 8 years ago

Working on it: the example now works for numerical data. After lunch I'll extend the fix to categorical data as well.

edwindj commented 8 years ago

The crash is fixed, and I've also put the fix into drat. I'll open a new issue for NA values in categorical data.