Apparently the overhead is not from R code:
```
$by.total
             total.time total.pct self.time self.pct
".Call"            0.36       100      0.36      100
"cf_getData"       0.36       100      0.00        0
"exprs"            0.36       100      0.00        0
```
This needs to be traced down at the Rcpp level.

Here is some profiling at the Rcpp level:
```cpp
// [[Rcpp::export]]
NumericVector test_getData(Rcpp::XPtr<CytoFrameView> fr){
  // start the timer
  Timer timer;
  timer.step("start");        // record the starting point
  int ncol = fr->n_cols();
  EVENT_DATA_VEC dat = fr->get_data();
  timer.step("fr->get_data"); // record the step
  NumericMatrix mat = wrap(dat);
  timer.step("wrap(dat)");    // record the step
  //other stuff ...
  NumericVector res(timer);   // collect the recorded steps
  return res;
}
```
```r
> cf1 <- load_cytoframe_from_fcs(tmp, is_h5 = F)
> cf2 <- load_cytoframe_from_fcs(tmp, is_h5 = T)
> a <- test_getData(cf1@pointer)
> diff(a)/1e6
fr->get_data    wrap(dat)
    200.6412     156.7754
>
> a <- test_getData(cf2@pointer)
> diff(a)/1e6
fr->get_data    wrap(dat)
    236.7097     155.7806
```
Although it doesn't answer why the `get_data` call differs so little between the in-memory and on-disk versions of `cytoframe`, it does reveal the huge overhead (about 40%) of converting `arma::Mat` to `Rcpp::NumericMatrix`. There has to be some way to avoid data copying during this process.
There is currently no known method to transfer data ownership (i.e. the underlying array pointer) between `arma::Mat` and `Rcpp::NumericMatrix`. The only way to avoid copying the data when transferring it from C++ to R is to change the `CytoFrame::get_data()` API to return an array pointer and let the caller take ownership of the data (and free the memory).
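A minimal sketch of what such an API change could look like (purely hypothetical; `get_data_owned` and the class shape are made up for illustration):

```cpp
#include <memory>

// Hypothetical sketch of an ownership-transferring accessor: instead of
// returning arma::Mat by value (which copies), the frame hands back the raw
// column-major buffer and the caller becomes responsible for freeing it.
class CytoFrameSketch {
public:
    virtual unsigned n_rows() const = 0;
    virtual unsigned n_cols() const = 0;
    // caller takes ownership of n_rows() * n_cols() doubles
    virtual std::unique_ptr<double[]> get_data_owned() = 0;
    virtual ~CytoFrameSketch() = default;
};
```

Even then, on the R side the buffer would still have to be copied into an R-allocated vector (or allocated through R in the first place, as explored below), since R objects must live on R's own heap.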
Actually `Rcpp::Timer` records in nanoseconds by default (so `diff(a)/1e6` above is in milliseconds), which means we were really looking at ~0.2s of overhead from the `MemCytoFrame::get_data()` call, and only ~0.03s of difference between `mem` and `h5`. So maybe this difference is negligible, and we can conclude that the separate logic for `MemCytoFrame` is unnecessary?
So basically fetching data from `MemCytoFrame` to R goes through a two-copy process:

1. `MemCytoFrame` to `armadillo` through the `fr->get_data()` call (say we receive it as `arma::Mat dat1`)
2. `dat1` from `armadillo` to R (i.e. `NumericMatrix dat2`)
Even though the two copies are redundant in this case, we can't avoid step 2, since `armadillo` doesn't allow you to steal the underlying data.
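To make that concrete, a small sketch (not cytolib code) of why step 2 is stuck with a copy: `arma::mat` exposes its buffer via `memptr()` but offers no way to release ownership of it, so the best one can do is a flat copy into an R-allocated buffer:

```cpp
#include <armadillo>
#include <cstring>

// arma::mat owns its buffer and frees it in its destructor; memptr() grants
// access but not ownership. Both armadillo and R store matrices
// column-major, so a flat memcpy into an R-allocated buffer preserves the
// layout -- but it is still a full copy of every element.
void copy_to_r_buffer(const arma::mat& m, double* r_buf /* e.g. REAL(ans) */) {
    std::memcpy(r_buf, m.memptr(), m.n_elem * sizeof(double));
}
```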
Potentially we can avoid step 1 by grabbing a reference (i.e. `fr->get_data_ref()`), but that has to assume the underlying `cytoframe` is a `MemCytoFrame`, since it won't be valid for `H5CytoFrame`. It is also worth mentioning that `H5CytoFrame` involves a two-copy process as well:

1. `h5` to `armadillo`
2. `armadillo` to R
And step 1 there doesn't require extra in-memory copying when the `mat` is passed around, because the `mat` is transient and thus treated as an rvalue reference, which triggers C++11's move semantics (see the standalone sketch after the comparison below). To conclude, `MemCytoFrame` may not benefit the generic IO at the R level, but it could be useful for C++ applications when carefully implemented with `get_data_ref()`. Here is a quick comparison:
```cpp
auto fr1 = MemCytoFrame("../flowWorkspace/wsTestSuite/profile_get_data.fcs", config);
fr1.read_fcs();

// deep copy of the event matrix
double start = gettime();
auto dat = fr1.get_data();
double runtime = (gettime() - start);
cout << "get a copy: " << runtime << endl;

// reference to the internal matrix, no copy
start = gettime();
auto &dat_ref = fr1.get_data_ref();
runtime = (gettime() - start);
cout << "get a reference: " << runtime << endl;
```
Results:

```
get a copy: 188.352
get a reference: 0.005
```
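To illustrate the earlier move-semantics point, a minimal standalone sketch (plain armadillo, independent of cytolib):

```cpp
#include <armadillo>

// Returning a local arma::mat does not deep-copy the elements: the return
// value is either elided (NRVO) or move-constructed, so only the internal
// buffer pointer changes hands.
arma::mat make_data() {
    arma::mat m(100000, 10, arma::fill::randu);
    return m; // moved (or elided), not copied element by element
}

int main() {
    arma::mat dat = make_data(); // step 1 analogue: no extra element copy
    return 0;
}
```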
Checking whether `std::vector` is more efficient than `arma::Mat` for deep-copying:
```cpp
Timer timer;
//prepare data
int ncol = fr->n_cols();
int nrow = fr->n_rows();
int ntotal = nrow*ncol;
arma::Mat<double> mat = fr->get_data();
double * ptr = mat.memptr();
vector<double> vec(ptr, ptr + ntotal);
//copy arma::Mat
timer.step("start");
arma::Mat<double> d1 = mat;
timer.step("copy(arma)");
//copy vector
vector<double> d2 = vec;
timer.step("copy(vec)");
//arma::Mat to NumericMatrix
NumericMatrix res1 = wrap(mat);
timer.step("arma->Rcpp");
//vector to NumericMatrix
NumericMatrix res2(nrow, ncol, vec.begin());
timer.step("vec-->Rcpp");
```
```r
> a <- test_getData(cf1@pointer)
> diff(a)/1e6
copy(arma)  copy(vec) arma->Rcpp vec-->Rcpp
  188.7736   186.6087   198.9434   206.8545
```
The answer is no.
We were thinking about allocating the memory in R and passing it to cytolib to fill, so that the copy from cytolib to R can be avoided, i.e.:
timer.step("start"); // record the starting point
//pre-allocate mem in R (can't use NumericMatrix(n,m) constructor since it is slow)
SEXP dbls = PROTECT(Rf_allocVector(REALSXP, ntotal));
//create arma::Mat to point to the same mem
double * ptr = REAL(dbls);
arma::Mat<double> mat(ptr, nrow, ncol, false);
timer.step("Rf_allocVector"); // record the first step
//fill the mat from cytoframe
fr->get_data(mat);
timer.step("fr->get_data(mat)");
UNPROTECT(1);
//convert sexp to rcpp numericmatrix
NumericVector mat1(dbls);
mat1.attr("dims") = Dimension(nrow, ncol);
```r
> a <- test_getData(cf1@pointer)
> diff(a)/1e6
Rf_allocVector fr->get_data(mat)
      0.022846        187.282139
```
And it does eliminate the overhead of the `Rcpp::wrap(mat)` call.
However, this only works for an unindexed `cytoframeView`, where the matrix size is consistent between the view and the underlying `cytoframe` object. Otherwise, especially for a row-indexed view, because the `h5` library is slow at hyperslab selection, `H5CytoFrame` always needs to read in all rows and then do the row selection afterwards. So this workflow won't work there, because R only sees the view and can't allocate the right size of memory for the `h5` read.
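For context, an illustrative sketch (not cytolib's actual code) of the slow path using the HDF5 C API:

```cpp
#include <hdf5.h>
#include <vector>

// Reading an arbitrary set of rows via a union of one-row hyperslabs.
// Each H5Sselect_hyperslab call adds a tiny non-contiguous selection,
// which is what makes this path slow in practice -- hence H5CytoFrame
// prefers to read all rows and subset in memory afterwards.
std::vector<double> read_rows(hid_t dset, hsize_t ncol,
                              const std::vector<hsize_t>& rows) {
    hid_t fspace = H5Dget_space(dset);
    for (size_t i = 0; i < rows.size(); ++i) {
        hsize_t start[2] = {rows[i], 0};
        hsize_t count[2] = {1, ncol};
        H5Sselect_hyperslab(fspace, i == 0 ? H5S_SELECT_SET : H5S_SELECT_OR,
                            start, NULL, count, NULL);
    }
    std::vector<double> out(rows.size() * ncol);
    hsize_t mdims[2] = {(hsize_t)rows.size(), ncol};
    hid_t mspace = H5Screate_simple(2, mdims, NULL);
    H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, out.data());
    H5Sclose(mspace);
    H5Sclose(fspace);
    return out;
}
```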
Looking at the source of `Rf_allocVector` (https://github.com/wch/r-source/blob/trunk/src/main/memory.c#L2606), it is probably not a good idea to force an external memory pointer into R without going through R's own allocator.
While @jacobpwagner was investigating `H5FD_CORE` (in-memory HDF5 for faster IO, which could potentially replace the existing `MemCytoFrame` to simplify/unify the code base for `CytoFrame`), he discovered that there is no significant performance advantage to keeping the data in memory, based on benchmarking at the R level. It shows only a very minor speedup for the in-memory version of `cytoframe` when reading the whole dataset; for partial IO, the on-disk version is even better. We will have to profile the code further to see where exactly the in-memory `cytoframe` slows down; otherwise we have no reason to keep it.
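For reference, a minimal sketch of opening a file with the core (in-memory) driver via the HDF5 C API (illustrative only, not the code under investigation):

```cpp
#include <hdf5.h>

// Open an HDF5 file with the H5FD_CORE driver: the whole file is loaded
// into memory and all subsequent reads are served from RAM.
hid_t open_in_memory(const char* path) {
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    // 1 MiB allocation increment; backing_store = false means changes are
    // not written back to disk on close
    H5Pset_fapl_core(fapl, 1 << 20, false);
    hid_t file = H5Fopen(path, H5F_ACC_RDONLY, fapl);
    H5Pclose(fapl);
    return file;
}
```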