FabrizioSandri / RcppDeepState

RcppDeepState, a simple way to fuzz test code in Rcpp packages
https://fabriziosandri.github.io/gsoc-2022-blog/

Fuzzing functions with Rcpp parameters #4

Closed FabrizioSandri closed 2 years ago

FabrizioSandri commented 2 years ago

@tdhock I found that fuzz testing C++ functions that take an Rcpp datatype as a parameter is not possible. In general, Rcpp constructs cannot be used directly from a standalone C++ program. I discussed this with a contributor to the Rcpp package, and his response was as follows:

"All" that Rcpp does is to provide R with callable code via the .Call() interface which is meant to extend a running R session. Nowhere in the R (or Rcpp) documentation is it hinted that you can run code separately. Which is why we run all tests etc from R.

Rcpp datatypes can only be used in an R environment; they cannot be used outside of R. Executing a function that uses Rcpp types from a plain C++ program will almost certainly result in a segmentation fault, as demonstrated in issue #1221.

Proof of concept

Let's use a simple C++ function called bootstrap that initializes an Rcpp::NumericVector and returns its first element, 10.

#include <iostream>
#include <Rcpp.h>

// [[Rcpp::export]]
int bootstrap() {
    // Allocate a sample NumericVector
    Rcpp::NumericVector sample {10,20,30,40,50};

    return sample[0];
}

int main(int argc, char* argv[]){
    bootstrap();
    return 0;
}

If you compile this program with all the required headers, no error is reported, and everything seems to work.

g++ -lR -I"/usr/include/R/" -I/usr/local/include  -I"/home/fabri/R/x86_64-pc-linux-gnu-library/4.2/Rcpp/include" -L/usr/lib64/R/lib -o bootstrap bootstrap.cpp

However, if you try to run the compiled program, a segmentation fault occurs:

$ ./bootstrap
[1]    23848 segmentation fault (core dumped)  ./bootstrap

Possible solutions

I considered how to address this problem in the most effective manner and finally came up with some potential answers.

First solution

The first solution is to create a mock Rcpp header file that maps some of the Rcpp data types (e.g. NumericVector, IntegerVector) to their C++ standard library counterparts. When creating the harness, we then include the mock header file instead of the original Rcpp header file. As a result, the DeepState fuzz test runs against C++ standard library types instead of the Rcpp data types.

For example, we can do this by defining the following typedefs in a mock Rcpp.h file:

#ifndef __RCPP_H__
#define __RCPP_H__
#include <vector>

namespace Rcpp{
    typedef std::vector<bool> LogicalVector;
    typedef std::vector<int> IntegerVector;
    typedef std::vector<double> NumericVector;
}

#endif

So when the compiler encounters an IntegerVector, it treats it as a std::vector<int>.

This is accomplished by replacing every #include <Rcpp.h> in the original library with #include "Rcpp.h", so that the mock header is picked up and the parameters become standard library types. For the example above, compilation succeeds and no segmentation fault is thrown.
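As a minimal sketch of the mock approach, the block below inlines the typedefs (standing in for the mock Rcpp.h) and compiles the bootstrap example against them. The helper first_element is a hypothetical name introduced only for illustration. Note that this only works for code restricted to operations std::vector also provides, such as operator[] and size(); anything R-specific, such as attributes, would not compile against the mock.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the mock Rcpp.h: in a real setup these typedefs would live in
// the mock header that the package sources include instead of <Rcpp.h>.
namespace Rcpp {
    typedef std::vector<int> IntegerVector;
    typedef std::vector<double> NumericVector;
}

// Hypothetical package function, written as if against the Rcpp types.
// It only relies on operations that std::vector also offers.
int first_element(const Rcpp::NumericVector& sample) {
    assert(!sample.empty());  // guard against reading past an empty vector
    return static_cast<int>(sample[0]);
}

// Same logic as the bootstrap example, now backed by std::vector<double>,
// so no R session is needed to allocate the vector.
int bootstrap() {
    Rcpp::NumericVector sample {10, 20, 30, 40, 50};
    return first_element(sample);
}
```

Calling bootstrap() from a plain main function then runs entirely on standard library types, without linking against R.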

Second solution

Avoid fuzz testing functions that take Rcpp parameters. This seems to be the easiest solution, but it means dropping support for packages whose functions take Rcpp types as input: only standard C++ functions that do not involve Rcpp parameters could be fuzz tested. However, Table 1 of the analysis performed by Akhila shows that a huge number of packages use this type of parameter. Link to the paper.
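A rough sketch of what such a filter could look like is shown below. Both helper names are hypothetical and not part of RcppDeepState, and the check assumes that parameter types are available as strings written with the Rcpp:: qualifier.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true when a parameter type name refers to an Rcpp type.
// Assumes the type is spelled with its namespace, e.g. "Rcpp::NumericVector".
bool is_rcpp_type(const std::string& type_name) {
    return type_name.find("Rcpp::") != std::string::npos;
}

// Hypothetical filter: a function is kept for fuzz testing only if none of
// its parameter types is an Rcpp type.
bool is_fuzzable(const std::vector<std::string>& param_types) {
    for (const std::string& type_name : param_types) {
        if (is_rcpp_type(type_name)) {
            return false;
        }
    }
    return true;
}
```

With this rule, a function taking (int, double) would be kept, while one taking (int, const Rcpp::NumericVector&) would be skipped.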

FabrizioSandri commented 2 years ago

Finally, I realized what I had been missing. After digging through the RcppDeepState package, I discovered how it handles this problem: the compiled application has to include RInside, which embeds an R session in the C++ program.

As a result, the program changes as follows:

#include <iostream>
#include <Rcpp.h>
#include <RInside.h>

// [[Rcpp::export]]
int bootstrap() {
    // Start an embedded R session so that Rcpp data types can be used
    RInside R;
    // Allocate a sample NumericVector
    Rcpp::NumericVector sample {10,20,30,40,50};

    return sample[0];
}

int main(int argc, char* argv[]){
    bootstrap();
    return 0;
}

The compilation procedure must now include RInside.

g++ -g -lR -lRInside -I"/usr/include/R/" -I/usr/local/include -I/usr/lib/R/library/RInside/include -I"/home/fabri/R/x86_64-pc-linux-gnu-library/4.2/Rcpp/include" -L/usr/lib/R/library/RInside/lib -Wl,-rpath=/usr/lib/R/library/RInside/lib -L/usr/lib64/R/lib -o bootstrap bootstrap.cpp
tdhock commented 2 years ago

yes great