RGLab / cytolib

c++ library for representing and interacting the gated cytometry data structure
GNU Affero General Public License v3.0
12 stars 11 forks source link

Determine whether to keep MemCytoFrame class #28

Closed mikejiang closed 4 years ago

mikejiang commented 4 years ago

While @jacobpwagner is investigating H5FD_CORE (in memory H5 for faster IO, which potentially can replace the existing MemCytoFrame to simplify/unify the code base for CytoFrame), he discovered there is no significant performance advantage by keeping data in memory based on the benchmarking at R level

tmp <- tempfile()
fr <- matrix(rnorm(16*3e6), ncol = 16, dimnames = list(NULL, letters[1:16]))
fr <- flowFrame(fr)
write.FCS(fr, tmp)
utils:::format.object_size(file.size(tmp), "auto")
[1] "183.1 Mb"
cf1 <- load_cytoframe_from_fcs(tmp, is_h5 = F)
cf2 <- load_cytoframe_from_fcs(tmp, is_h5 = T)
> microbenchmark( a<- exprs(cf1[,1]), b <- exprs(cf2[,1]), times = 4)
Unit: milliseconds
                 expr      min       lq     mean   median       uq      max neval
 a <- exprs(cf1[, 1]) 15.40628 16.32062 20.41917 20.63623 24.51772 24.99796     4
 b <- exprs(cf2[, 1]) 12.84491 15.25784 16.78892 17.90347 18.32001 18.50385     4
> microbenchmark( a<- exprs(cf1), b <- exprs(cf2), times = 4)
Unit: milliseconds
            expr      min       lq     mean   median       uq      max neval
 a <- exprs(cf1) 361.6466 362.8688 415.8592 383.3787 468.8496 535.0328     4
 b <- exprs(cf2) 386.0428 388.0514 472.7467 472.1052 557.4420 560.7336     4

It shows very minor speedup for in-memory version of cytoframe when reading the whole dataset. For the partial IO, on-disk version is even better.

We will have to further profile code to see where exactly slows down in-memory cytoframe, otherwise we have no reason to keep it.

mikejiang commented 4 years ago

Apparently the overhead is not from R code

$by.total
             total.time total.pct self.time self.pct
".Call"            0.36       100      0.36      100
"cf_getData"       0.36       100      0.00        0
"exprs"            0.36       100      0.00        0

Needs to trace to Rcpp level

mikejiang commented 4 years ago

Here is some profiling at Rcpp level

NumericVector test_getData(Rcpp::XPtr<CytoFrameView> fr){
  // start the timer
  Timer timer;
  timer.step("start");        // record the starting point
  int ncol = fr->n_cols();

  EVENT_DATA_VEC dat = fr->get_data();
  timer.step("fr->get_data");      // record the  step

  NumericMatrix mat = wrap(dat);
  timer.step("wrap(dat)");      // record the  step

  //other stuff ...
  NumericVector res(timer);   // 
    return res;
}
> cf1 <- load_cytoframe_from_fcs(tmp, is_h5 = F)
> cf2 <- load_cytoframe_from_fcs(tmp, is_h5 = T)
> diff(a)/1e6
fr->get_data    wrap(dat) 
    200.6412     156.7754 
> 
> a <- test_getData(cf2@pointer)
> diff(a)/1e6
fr->get_data    wrap(dat) 
    236.7097     155.7806 

Although it doesn't answer why get_data call still differs so little between in-memory and on-disk version of cytoframe, it does reveal the huge overhead (about 40%) in converting arma::Mat to Rcpp::NumericMatrix. There has to be some way to avoid data copying during this process.

mikejiang commented 4 years ago

There is currently no known methods to transfer the data ownership(i.e. underlying array pointer) between arma::Mat and Rcpp::NumericMatrix. The only way to avoid data copying during the data transferring from C to R is to change API CytoFrame::get_data() to return an array pointer and let the caller to take the ownership of the data (and free the memory).

mikejiang commented 4 years ago

Actually Rcpp:Timer is in nanoseconds by default, which means we were really looking at 0.2s overhead from MemCytoFrame::get_data() calls, and about 0.03s difference between mem and h5. So maybe this difference is negligible so that we can conclude that the separate logic for MemCytoFrame is unnecessary?

mikejiang commented 4 years ago

So basically fetching data from MemCytoFrame to R goes through two-copy process:

  1. copy data from MemCytoFrame through fr->get_data() call (say receive it as arma::Mat dat1)
  2. copy dat1 from armadillo to R(i.e. NumericMatrix dat2) Even though, the two copies are redundant for this case, we can't avoid step2 since armadillo doesn't allow you to steal the underlying data. Potentially we can avoid step1 by grabbing the reference (i.e fr->get_data_ref()), but it must assume the underlying cytoframe is MemCytoFrame since it won't be valid for H5CytoFrame.

Also it is worth to mention that H5CytoFrame also involves two-copy process

  1. from disk to armadillo
  2. from armadillo to R And step1 doesn't require the extra copying in memory when the mat is passed around because the mat is transient and thus considered as Rvalue reference and triggers c++11 's move semantics.

To conclude, MemCytoFrame may not benefit the generic IOs at R level, but it would be useful for c++ application when it is carefully implemented by using the get_data_ref(). Here is a quick comparison

        auto fr1 = MemCytoFrame("../flowWorkspace/wsTestSuite/profile_get_data.fcs", config);
    fr1.read_fcs();
    double start = gettime();
    auto dat = fr1.get_data();
    double runtime = (gettime() - start);
    cout << "get a copy: " << runtime << endl;

    start = gettime();
    auto &dat_ref = fr1.get_data_ref();
    runtime = (gettime() - start);
    cout << "get a reference: " << runtime << endl;

results

get a copy: 188.352
get a reference: 0.005
mikejiang commented 4 years ago

Checking if std::vector is more efficient than arma:Mat for deep-copying

Timer timer;
  //prepare data
  int ncol = fr->n_cols();
  int nrow = fr->n_rows();
  int ntotal = nrow*ncol;
  arma::Mat<double> mat = fr->get_data();
  double * ptr = mat.memptr();
  vector<double> vec(ptr, ptr + ntotal);
  //copy arma::Mat
  timer.step("start");       
  arma::Mat<double> d1 = mat;
  timer.step("copy(arma)");    
//copy vector
  vector<double> d2 = vec;
  timer.step("copy(vec)");     
  //arma::Mat to NumericMatrix
    NumericMatrix res1 = wrap(mat);
  timer.step("arma->Rcpp");     
    //vector to NumericMatrix
  NumericMatrix res2(nrow, ncol, vec.begin());
  timer.step("vec-->Rcpp");      
> a <- test_getData(cf1@pointer)
> diff(a)/1e6
copy(arma)  copy(vec) arma->Rcpp vec-->Rcpp 
  188.7736   186.6087   198.9434   206.8545

The answer is no

mikejiang commented 4 years ago

We were thinking about allocate mem in R and pass it back to cytolib to fill the content so that the copying from cytolib to R can be avoided. i.e

  timer.step("start");        // record the starting point
  //pre-allocate mem in R (can't use NumericMatrix(n,m) constructor since it is slow)
  SEXP dbls = PROTECT(Rf_allocVector(REALSXP, ntotal));
  //create arma::Mat to point to the same mem
  double * ptr = REAL(dbls);
  arma::Mat<double> mat(ptr, nrow, ncol, false);
  timer.step("Rf_allocVector");      // record the first step

  //fill the mat from cytoframe
  fr->get_data(mat);
  timer.step("fr->get_data(mat)"); 
  UNPROTECT(1);

  //convert sexp to rcpp numericmatrix
  NumericVector mat1(dbls);
  mat1.attr("dims") = Dimension(nrow, ncol);
> a <- test_getData(cf1@pointer)
> diff(a)/1e6
   Rf_allocVector fr->get_data(mat) 
         0.022846        187.282139 

And it does eliminate the overhead from Rcpp::wrap(mat) call.

However, this only works for unindexed cytoframeView where the matrix size is consistent between the view and underlying cytoframe object. Otherwise, especially for the row-indexed view, because h5 lib is slow on hyberslab selection, H5CytoFrame needs to always read in all rows and then does row-selection afterwards. Therefore this workflow won't work because R only sees the view and can't allocate the right-size of memory for h5 reading.

mikejiang commented 4 years ago

Looking at the source of Rf_allocVector https://github.com/wch/r-source/blob/trunk/src/main/memory.c#L2606 Probably it is not a good idea to force the memory pointer into R without going through its own allocator.