akhikolla / RcppDeepState

RcppDeepState, a simple way to fuzz test code in Rcpp packages
https://akhikolla.github.io./

generate missing values? (sometimes?) #37

Closed tdhock closed 3 years ago

tdhock commented 4 years ago

Hi @akhikolla, I see from the log that your RcppDeepState_NumericVector random generation function produces quite a lot of nan values, but does it ever produce NA (missing) values? If not, can you please make it do that, at least sometimes? nan is different from NA. One of your logs has the following (with nan):

x values: -inf -4.30039e-277 nan -inf 3.91522e-233 inf inf -inf nan nan inf inf -inf -inf nan nan -inf -inf 4.04417e-245 -inf inf -inf nan nan nan -inf -inf nan nan inf -inf 1.5147e-111 -inf -inf inf inf nan -3.53791e-266 nan nan inf -3.56549e+204 nan -4.78404e+198 inf nan nan -6.90744e-23 -inf -5.38333e-174 nan nan inf -2.37436e+189 inf nan nan nan 3.09255e-187 nan -inf -inf -5.36873e+129 nan -inf inf -inf nan -4.80468e+291 -inf nan -inf -2.73599e+74 inf 5.22581e+45 nan nan -inf nan 0
y values: -inf nan nan -inf nan nan 1.54621e+305 nan -inf inf inf nan -inf -1.83297e+20 nan nan -3.12058e-208 inf nan nan inf -3.08838e-57 -9.50095e+291 1.91111e-282 -inf -inf inf inf nan inf inf inf nan inf -inf nan nan inf nan nan -inf nan nan -inf -3.49774e-124 inf nan nan nan inf nan nan 3.43469e+86 -inf nan -inf nan 1.09457e+26 inf -inf inf inf -1.07381e-204 1.21151e-120 -inf nan 3.7042e+112 -6.39816e+138 -2.69118e+237 nan nan nan -inf nan inf nan nan -2.92753e+127 inf nan -inf -inf inf -4.31656e-305 nan inf nan -1.22806e-155 0
z values: nan -inf -inf nan nan -1.25067e-210 2.40212e-95 6.74738e+155 inf nan inf nan inf inf -inf inf inf nan -2.9391e+141 -6.084e-207 -7.66514e-224 inf 3.76915e+120 1.23003e+98 -inf -4.48158e-05 -2.80807e+95 inf nan nan nan nan inf -3.80109e-230 nan -5.9527e-35 nan inf nan 4.73542e-126 3.71801e-152 -inf inf -inf inf nan -inf -3.80151e+231 -2.54146e+99 nan 6.20522e+35 -1.29686e-111 nan inf nan inf nan nan inf inf inf -inf nan 1.95989e+231 nan inf -inf nan nan 4.92815e+305 inf 0

In R you can have a missing value in an integer vector

> str(c(1L, NA))
 int [1:2] 1 NA

and you can have both NaN and NA in a numeric vector

> str(c(1L, NaN, NA))
 num [1:3] 1 NaN NA

NA represents missing values. NaN is different; it represents values that are not a number:

> 0/0
[1] NaN

akhikolla commented 4 years ago

For now, each NumericVector element is chosen with OneOf(DeepState_Double(), R_NegInf, R_PosInf, R_NaN, NA_REAL).

NA_REAL is the NA value for a NumericVector, and NA_INTEGER is the NA value for an IntegerVector.

Will this work or should I change it back to NA?
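
A minimal standalone sketch of why a plain C++ stream shows NA_REAL as nan. It does not use R or Rcpp headers; instead it assumes R's internal encoding, where NA_REAL is a NaN whose low 32-bit word is 1954 and NA_INTEGER is the smallest int. The names kNaInteger, make_na_real, is_na_real and stream_text are hypothetical and exist only for this illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <sstream>
#include <string>

// Assumption: mirrors R's encoding of NA_INTEGER (the smallest 32-bit int).
const int kNaInteger = std::numeric_limits<int>::min();

// Assumption: mirrors R's encoding of NA_REAL, a NaN whose low word is 1954.
double make_na_real() {
  const std::uint64_t bits = 0x7FF0000000000000ULL | 1954ULL;
  double x;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// True only for a NaN carrying the NA payload, false for an ordinary NaN.
bool is_na_real(double x) {
  if (!std::isnan(x)) return false;
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return (bits & 0xFFFFFFFFULL) == 1954ULL;
}

// The text a C++ stream produces for a value; it knows nothing about NA.
std::string stream_text(double x) {
  std::ostringstream out;
  out << x;
  return out.str();
}
```

Under these assumptions, std::isnan() is true for both values, so only a payload check such as is_na_real() can tell them apart. In code that runs inside R, the R C API functions R_IsNA() and R_IsNaN() perform this distinction.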

tdhock commented 4 years ago

OK, that is reasonable, but are you sure NA is being generated? Why don't I see NA in the logs?

akhikolla commented 4 years ago

Yes, I am using NA (as NA_REAL), but the system displays it as nan. The values are assigned as follows:

  for(int i = 0 ; i < rand_size - 1 ; i++){
    OneOf(
      [&] {
        rand_numvec[i] = DeepState_Double();
      },
      [&] {
        rand_numvec[i] = NA_REAL;
      });
  }

INFO: Starting fuzzing
WARNING: No seed provided; using 1599153021
WARNING: No test specified, defaulting to first test defined (deepstate_test_datatype)
mat values: nan nan 3.26547e+24 -5.0514e+216 3.7301e-204 -8.85217e-185 nan 1.79698e+108 7.15827e-13 3.79989e-180 -9.1979e-241 1.98351e-286 nan -1.25466e+125 nan -4.60431e-108 -3.54561e+270 nan nan 9.53519e+235 4.37272e+277 3.5377e-37 6.26452e+151 nan nan nan nan nan nan nan 5.17218e+238 4.86533e-188 -1.25845e+213 -4.96465e-33 -1.55221e+233 -1.99375e-56 0
mat values: 7.9463e+93 nan -7.71749e-119 8.04897e+292 5.22466e-120 1.42732e+193 -4.58536e+48 nan nan nan nan 44286.3 nan nan nan nan 1.00179e-240 -7.65463e-196 nan -0.0253467 -7.22982e-291 5.54029e-236 nan nan 2.31441e-239 nan nan -2.74043e-238 nan 2.73338e+209 -2.26352e+123 6.52772e+250 nan nan 2.16358e-108 2.11258e-234 nan -9.66297e+179 nan 3.46887e+286 nan nan nan 7.54884e+217 -2.27653e+145 1.55162e-156 nan -4.60031e+195 -1.07764e+279 -1.05276e+140 nan nan nan nan nan nan nan -3.31222e+155 -1.10529e-229 nan nan 1.03609e-212 nan nan 8.63723e-301 nan nan nan -4.18917e+254 nan nan -4.59289e+282 8.87044e+97 4.88607e-222 nan nan 2.50565e-121 nan nan 9.44119e+182 1.7692e+52 nan nan 0

But when I assign rand_numvec[i] = NA; instead of NA_REAL, I get the following error:

In file included from newtest.cpp:4:
/home/akhila/R/x86_64-pc-linux-gnu-library/3.6/RcppDeepState/include/RcppDeepState.h:38:26: error: 
      assigning to 'typename storage_type<14>::type' (aka 'double') from
      incompatible type 'Rcpp::Na_Proxy'
        rand_numvec[i] = NA;  
                         ^~
/home/akhila/R/x86_64-pc-linux-gnu-library/3.6/RcppDeepState/include/DeepState.hpp:335:33: note: 
      expanded from macro 'OneOf'
#define OneOf(...) NoSwarmOneOf(__VA_ARGS__)

tdhock commented 4 years ago

OK, then that is an issue with the Rcpp numeric print method. Can you post to rcpp-devel and ask why both NaN and NA are printed as nan?

akhikolla commented 4 years ago

On 4 September 2020 at 11:24, Akhila Chowdary Kolla wrote:

Hello Everyone,
I am trying to use NA_REAL, NA_INTEGER, and R_NaN in my C++ code (which doesn't
use R main). When I compile and run my code, NA_REAL gives the value
nan, NA_INTEGER gives the value -2147483648, and R_NaN gives 0.
I used the following code (from the Rcpp FAQ):

Rcpp::IntegerVector Missing_I() {
  Rcpp::IntegerVector v(1);
  v[0] = NA_INTEGER; // NA
  return v;
}

Rcpp::NumericVector Missing_N() {
  Rcpp::NumericVector v(4);
  v[0] = R_NegInf; // -Inf
  v[1] = NA_REAL;  // NA
  v[2] = R_PosInf; // Inf
  v[3] = R_NaN;    // NaN
  return v;
}

When I compile the functions using sourceCpp() I get the output as expected:
> sourceCpp("~/R/RcppDeepState/inst/extdata/filesave.cpp")
> Missing_I()
[1] NA
> Missing_N()
[1] -Inf NA Inf NaN
But when I compile the code using the TestHarness it gives me the following
output:
missing_n values: -inf nan inf 0
missing_i values: -2147483648
I saved the above functions(Missing_I, Missing_N) in a header file and made
a call to those functions from the testharness:
TEST(deepstate_test,datatype){
  RInside();
  Rcpp::IntegerVector missing_i = Missing_I();
  std::cout <<"missing_i values: "<< missing_i << std::endl;
  Rcpp::NumericVector missing_n = Missing_N();
  std::cout <<"missing_n values: "<< missing_n << std::endl;
}
How can I get the results as expected? Any help is appreciated.

Am I understanding you correctly that you would like the R behaviour but in a non-R context (such as a Catch or Google gtest harness)? You can't, at least not easily, as R actively adds some extensions. I.e., IEEE 754 defines this for doubles, but "nobody" besides R does it for int. So you would have to add your own print function.

Dirk
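
One way such a custom print function might look, sketched without R headers so that it can run in any test harness. It relies on the assumption that NA_REAL is a NaN whose low 32-bit word is 1954 and that NA_INTEGER is the smallest int, as in R's internal encoding. The names is_na_real, format_real, format_int and format_reals are hypothetical; they are not part of Rcpp or RcppDeepState.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Assumption: NA_REAL is a NaN whose low 32-bit word is 1954, as in R.
bool is_na_real(double x) {
  if (!std::isnan(x)) return false;
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return (bits & 0xFFFFFFFFULL) == 1954ULL;
}

// Format one double with R-style labels: NA, NaN, Inf, -Inf, or the number.
std::string format_real(double x) {
  if (is_na_real(x)) return "NA";
  if (std::isnan(x)) return "NaN";
  if (std::isinf(x)) return x > 0 ? "Inf" : "-Inf";
  std::ostringstream out;
  out << x;
  return out.str();
}

// Format one int; assumption: NA_INTEGER is the smallest int, as in R.
std::string format_int(int x) {
  if (x == std::numeric_limits<int>::min()) return "NA";
  return std::to_string(x);
}

// Join the formatted elements with single spaces.
std::string format_reals(const std::vector<double>& v) {
  std::string out;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (i > 0) out += ' ';
    out += format_real(v[i]);
  }
  return out;
}
```

In a harness, the contents of a NumericVector could be copied into a std::vector<double> and passed to format_reals() instead of streaming the vector to std::cout. The NA check must come before the NaN check, because an NA value is itself a NaN.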

tdhock commented 4 years ago

or

  1. use the R print function
  2. just don't print anything and instead use saveRDS (I think this would be best)

tdhock commented 4 years ago

Hi, any progress here?

akhikolla commented 4 years ago

Yes, using qread we are able to get the NA values in the input vectors.

> qread("~/extdata/packages/BNSL/inst/testfiles/mi/inputs/x.qs")
 [1]  4.633806e-217             NA   1.146679e-65 -3.332893e-113   5.656449e+96
 [6]  4.994804e-177 -1.217988e+178   8.305613e+08 -3.823290e-123  1.710376e+298
[11]             NA  -1.171657e-18 -2.926947e+146  -8.499259e+95  9.727741e+231
[16]  3.879002e-188 -1.472762e-113  1.196445e+301  -2.631010e+65 -9.230731e-287
[21]             NA             NA -5.075749e-118 -1.809222e+108  4.127988e-293
[26]  -2.498131e-37  4.280220e-114  3.774000e-143             NA   0.000000e+00
> qread("~/extdata/packages/BNSL/inst/testfiles/mi/inputs/y.qs")
 [1]  -6.183692e-97   1.132390e-80             NA -1.682128e+236 -5.262422e+159
 [6] -8.009751e-201  3.575827e-260  9.591488e+108  2.310050e-215  -8.455223e-82
[11]  2.378366e-145  3.639802e+244   1.094994e+17  9.138619e+105             NA
[16] -1.602776e+214 -2.122541e-150             NA -2.264595e+229  -1.928687e-31
[21]   3.807959e-44  3.011055e-244             NA   1.455465e+11 -1.919328e-247
[26]   2.199930e-10   2.424484e+79 -6.111270e-225  -6.370330e-97 -4.834867e-221
[31] -1.431325e-174  1.519535e+235   4.554491e-19  7.196765e+227   0.000000e+00

Should I resolve the issue?

tdhock commented 4 years ago

Can you sometimes generate vectors with no missing values, and sometimes generate vectors with some missing values in random places? How did you implement that?

akhikolla commented 4 years ago

The NumericVector code looks like this. I first fill the indexes with DeepState_Double(). Then I pick 5 random indexes within the vector's range and assign each one OneOf(DeepState_Double(), R_NaN, R_PosInf, R_NegInf, NA_REAL), so up to 5 special values can appear. If OneOf picks DeepState_Double() for all 5 of those indexes, the final vector has no missing values.

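Rcpp::NumericVector RcppDeepState_NumericVector(){
  rand_size = DeepState_IntInRange(0,100);
  // Candidates for the special positions: a DeepState double or a special value.
  double missing_values[] = {DeepState_Double(),R_NaN,R_PosInf,R_NegInf,NA_REAL};
  Rcpp::NumericVector rand_numvec(rand_size);
  // The last element is skipped here and stays 0 unless overwritten below.
  for(int i = 0 ; i < rand_size - 1 ; i++){
    rand_numvec[i] = DeepState_Double();
  }
  // Overwrite 5 randomly chosen positions (the same index may repeat).
  for(int i = 0 ; i < 5 ; i++){
    rand_numvec[DeepState_IntInRange(0,rand_size-1)] = OneOf(missing_values);
  }
  return rand_numvec;
}

tdhock commented 4 years ago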

Instead, maybe OneOf(vector with no missing values, vector with some missing values)?

akhikolla commented 3 years ago

I have updated the RcppDeepState_ random generation functions to generate vectors that have either no missing values at all or a few missing values.
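
For illustration only, the "either no missing values or a few" strategy could be sketched as below. This is not the actual RcppDeepState code: std::mt19937 stands in for DeepState's input choices, ordinary values are drawn from an arbitrary finite range, and the function name random_numeric_vector, its parameters, and the 50/50 split are all assumptions.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a DeepState-driven generator.
// `specials` would hold values such as R_NaN, R_PosInf, R_NegInf, NA_REAL.
std::vector<double> random_numeric_vector(std::mt19937& rng,
                                          const std::vector<double>& specials,
                                          int max_size = 100) {
  std::uniform_int_distribution<int> size_dist(0, max_size);
  std::uniform_real_distribution<double> value_dist(-1e6, 1e6);
  std::bernoulli_distribution with_missing(0.5);

  // Fill every element, including the last one, with an ordinary value.
  std::vector<double> v(static_cast<std::size_t>(size_dist(rng)));
  for (double& x : v) x = value_dist(rng);

  // First alternative: return a vector with no special values at all.
  if (v.empty() || specials.empty() || !with_missing(rng)) return v;

  // Second alternative: overwrite 1 to 5 randomly chosen positions.
  std::uniform_int_distribution<std::size_t> index_dist(0, v.size() - 1);
  std::uniform_int_distribution<std::size_t> pick(0, specials.size() - 1);
  std::uniform_int_distribution<int> count_dist(1, 5);
  const int count = count_dist(rng);
  for (int i = 0; i < count; ++i) v[index_dist(rng)] = specials[pick(rng)];
  return v;
}
```

Deciding once per vector whether to inject special values guarantees that fully clean vectors are produced regularly, rather than only when every per-element choice happens to pick an ordinary double. Checking for an empty vector first also avoids asking for an index in an empty range.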