data-cleaning / validate

Professional data validation for the R environment

or not working properly #175

Closed adewhite81 closed 1 year ago

adewhite81 commented 1 year ago

Hello all,

I have been studying the package for a short time, and I need to validate rules that use logical operators. In my simple script I tested the AND and OR operators. The first works correctly, while the second seems to have some problems, or maybe I'm doing something wrong.

Why is the error property of the summary TRUE when I use the OR operator? Please see the example below.

Here is my snippet of code:

df2 <- data.frame(A=c(NA,2,NA,4,5), B=c(10,NA,30,40,50), C=c(NA,200,300,4,500))

rules <- validator( 
is.na(A) & is.na(C), #error false
is.na(B)| is.na(C),#error false
  is.na(A) | is.na(C),#error false
  (**A>=B | c<B ),#error true
  (A>=B || c<B ),#error true
  !(!(A>=B) &&  !(c<=B) ) #error true
  , !(!(A>=B) & !(c<=B) ))#error true**

out <- confront(df2, rules)
ve<-summary(out)
print(out)


markvanderloo commented 1 year ago

Hi there,

you can use

errors(out)

to see the error messages. Looking at your code, I see the use of several column names in your rules that do not exist. Specifically: in V4 the variable **A does not exist in your dataset. In V5 to V7 the variable c does not exist in your dataset (remember that R is case-sensitive, so c and C are different variable names).

HTH, Mark
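
The case-sensitivity point can be reproduced outside the package. The sketch below is base R only and uses `with()` as a stand-in for evaluating a rule against the data; that stand-in is an assumption for illustration, not a description of how validate evaluates rules internally.

```r
df2 <- data.frame(A = c(NA, 2, NA, 4, 5),
                  B = c(10, NA, 30, 40, 50),
                  C = c(NA, 200, 300, 4, 500))

# Lowercase `c` is not a column of df2, so R finds the base function c()
# instead, and comparing a function with a number raises an error.
wrong <- tryCatch(with(df2, c < B),
                  error = function(e) conditionMessage(e))

# Uppercase `C` is the column, so the comparison is evaluated row by row.
right <- with(df2, C < B)

print(wrong)  # the error message caused by the misspelled name
print(right)
```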

adewhite81 commented 1 year ago

Hi Mark, thanks. I corrected the typos, but validating the expressions that contain OR still gives me an error. I need to validate logical expressions with the OR operator correctly; what am I doing wrong?

This is the output for the last four validations:

  name items passes fails nNA error warning expression
4   V4     5      1     1   3 FALSE   FALSE (A - B >= -1e-08 | C < B)
5   V5     0      0     0   0  TRUE   FALSE (A - B >= -1e-08 || C < B)
6   V6     0      0     0   0  TRUE   FALSE !(A - B < -1e-08 && C - B > 1e-08)
7   V7     5      1     1   3 FALSE   FALSE A - B >= -1e-08 | C - B <= 1e-08

Why are the expressions different from the input expressions? Thanks.

library("validate")
df2 <- data.frame(A=c(NA,2,NA,4,5), B=c(10,NA,30,40,50), C=c(NA,200,300,4,500))

rules <- validator(
  is.na(A) & is.na(C),      #error false
  is.na(B) | is.na(C),      #error false
  is.na(A) | is.na(C),      #error false
  (A>=B | C<B ),            #error true
  (A>=B || C<B ),           #error true
  !(!(A>=B) && !(C<=B) ),   #error true
  !(!(A>=B) & !(C<=B) ))    #error true

out <- confront(df2, rules)
ve <- summary(out)
print(out)
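
One way to read the rewritten expressions in the summary, offered as an assumption because the thread does not answer the question: they look like the same comparisons with a small tolerance (1e-08) applied and the double negation simplified. The base R sketch below only checks that the typed and printed forms of V7 agree on this data set; it says nothing about how validate produces them.

```r
df2 <- data.frame(A = c(NA, 2, NA, 4, 5),
                  B = c(10, NA, 30, 40, 50),
                  C = c(NA, 200, 300, 4, 500))

eps <- 1e-08  # tolerance value copied from the printed expressions

# Rule V7 as it was typed in the validator() call.
as_typed <- with(df2, !(!(A >= B) & !(C <= B)))

# Rule V7 as it appears in the expression column of the summary.
as_printed <- with(df2, A - B >= -eps | C - B <= eps)

print(as_typed)
print(identical(as_typed, as_printed))
```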


markvanderloo commented 1 year ago

You are using invalid syntax. Avoid && and ||; use & and | instead. Also, there is some unnecessary bracketing there. The following seems to work.

> rules <- validator(
    is.na(A) & is.na(C), 
    is.na(B)| is.na(C),
    is.na(A) | is.na(C),
    A>=B | C<B ,
    A>=B | C<B ,
    !(!(A>=B) &  !(C<=B) ) 
   , !(!(A>=B) & !(C<=B) )
 )
> df2 <- data.frame(A=c(NA,2,NA,4,5), B=c(10,NA,30,40,50), C=c(NA,200,300,4,500))
> out <- confront(df2, rules)
> out
Object of class 'validation'
Call:
    confront(dat = df2, x = rules)

Rules confronted: 7
   With fails   : 7
   With missings: 4
   Threw warning: 0
   Threw error  : 0
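
The difference between the two families of operators can be seen in base R alone. This is a minimal sketch that uses `with()` in place of a validation rule (an assumption for illustration); it shows that `|` yields one result per row while `||` does not.

```r
df2 <- data.frame(A = c(NA, 2, NA, 4, 5),
                  B = c(10, NA, 30, 40, 50),
                  C = c(NA, 200, 300, 4, 500))

# `|` is elementwise: one result per row, NA where the outcome
# cannot be decided because of missing values.
per_row <- with(df2, A >= B | C < B)
print(per_row)

# `||` expects single values. With whole columns it raises an error in
# R 4.3.0 and later, warns in R 4.2.x, and silently uses only the first
# elements in older versions, so it never gives one result per row.
scalar <- tryCatch(with(df2, A >= B || C < B),
                   error   = function(e) "error",
                   warning = function(w) "warning")
print(scalar)
```

Here `per_row` holds one TRUE, one FALSE and three NA values, which lines up with the 1 pass, 1 fail and 3 missing values reported for V4 earlier in the thread.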