gmum / gmum.r

GMUM machine learning group R package
http://r.gmum.net

Research - R interface #58

Closed kudkudak closed 10 years ago

kudkudak commented 10 years ago

Check the options for R <-> C++ integration. There are essentially two:

* standalone functions (C-like)

* exposed C++ class

One reference: http://dirk.eddelbuettel.com/code/rcpp/Rcpp-modules.pdf

Personally, I think using an exposed C++ class might be less work.
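To make the two options concrete, here is a minimal sketch in plain C++ (deliberately without Rcpp, so it compiles on its own). The `Clusterer` class and the `clusterer_*` functions are hypothetical names invented for illustration and are not part of gmum.r. In the first style every operation needs its own exported function; in the second the class is exposed once and its methods come along with it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model class, used by both styles below.
class Clusterer {
public:
    explicit Clusterer(std::size_t k) : k_(k), runs_(0) {}
    void run() { ++runs_; }  // stand-in for the real algorithm
    std::size_t clusters() const { return k_; }
    std::size_t runs() const { return runs_; }
private:
    std::size_t k_;
    std::size_t runs_;
};

// Option 1: standalone, C-like functions around an opaque handle.
// Each operation is a separate function that must be exported.
typedef void* ClustererHandle;

ClustererHandle clusterer_create(std::size_t k) { return new Clusterer(k); }
void clusterer_run(ClustererHandle h) { static_cast<Clusterer*>(h)->run(); }
std::size_t clusterer_runs(ClustererHandle h) {
    return static_cast<Clusterer*>(h)->runs();
}
void clusterer_destroy(ClustererHandle h) { delete static_cast<Clusterer*>(h); }

// Option 2: use the class directly; no per-method wrapper is needed.
std::size_t run_twice_with_class() {
    Clusterer c(3);
    c.run();
    c.run();
    return c.runs();
}
```

The handle-based style also makes the caller responsible for calling `clusterer_destroy`, which is one reason the class-based route may end up being less work.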

kudkudak commented 10 years ago

What is your current approach to the R interface? I would be grateful for your opinion, as we are having the same debate in SVM right now :) @crazySocket

crazySocket commented 10 years ago

There are two things to know before I answer. 1) It is easy to create a function in C++ and port it to R. 2) You can port a whole class to R (constructor and methods).

Now, the approach is to prepare a factory function CEC() that creates instances of the CEC class. The function will require all arguments to be passed as keywords, as in the example CEC(dataset=data, killThreshold=0.001, clusters=c(c(type="spherical",radius=2), c(type="covMat", covMat=m))). The CEC class will be ported to R; its constructor, however, will not. The idea is that you can use the object to plot data, or to change the clustering and rerun the algorithm (e.g. cec$draw()). The constructor must not be ported because the limit on the number of arguments is 6 and keywords are unavailable.
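The factory idea can be sketched in plain C++ as follows. This is only an analogue under stated assumptions: `CecSketch`, `makeCec` and the `nrOfClusters` argument are hypothetical, and a `std::map` of name/value pairs stands in for the keyword arguments that the real factory would receive from R. The constructor is private, so the factory is the only way to obtain an instance.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the CEC class; only the factory may construct it.
class CecSketch {
public:
    double killThreshold() const { return killThreshold_; }
    int nrOfClusters() const { return nrOfClusters_; }
private:
    CecSketch(double killThreshold, int nrOfClusters)
        : killThreshold_(killThreshold), nrOfClusters_(nrOfClusters) {}
    friend std::unique_ptr<CecSketch> makeCec(
        const std::map<std::string, double>& args);
    double killThreshold_;
    int nrOfClusters_;
};

// Factory taking named arguments. Required names are checked explicitly,
// optional ones fall back to a default.
std::unique_ptr<CecSketch> makeCec(const std::map<std::string, double>& args) {
    std::map<std::string, double>::const_iterator kt = args.find("killThreshold");
    if (kt == args.end()) {
        throw std::invalid_argument("missing argument: killThreshold");
    }
    int nrOfClusters = 1;  // assumed default, for illustration only
    std::map<std::string, double>::const_iterator nc = args.find("nrOfClusters");
    if (nc != args.end()) {
        nrOfClusters = static_cast<int>(nc->second);
    }
    return std::unique_ptr<CecSketch>(new CecSketch(kt->second, nrOfClusters));
}
```

Because arguments are looked up by name, the number of parameters is not bounded by any limit on constructor arity, which is the motivation given above for not exporting the constructor.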

However, there is one difficulty. A CEC instance requires a pointer to the dataset so that the potentially huge dataset does not have to be copied. The pointer is in effect "extracted" from the SEXP args, which is the argument of the factory function. The lifespan of args is shorter than that of the CEC instance. That might be a problem.

kudkudak commented 10 years ago

Thanks. I agree that it must be possible to rerun or draw the object. You can achieve that without the class-exporting feature, but of course it would be nicer to do it that way. (I did it in the GNG project, but don't try reading the source, as it is unreadable ;). I can prepare an example if needed.)

I do not understand what the problem with the pointer is. A pointer is not an object but an unsigned long int, so you should be able to just cast it to the pointer type, and it won't get deleted?

crazySocket commented 10 years ago

According to the code:

    RcppExport SEXP run(SEXP args) {
        Rcpp::List list(args);
        Rcpp::NumericMatrix proxyDataset = // here it is extracted from list
        // This creates the arma::mat without copying memory; the memory is reused.
        arma::mat points(proxyDataset.begin(), proxyDataset.nrow(), proxyDataset.ncol(), false);
        return CEC(...)
    }

You can create CEC and points on the heap; that is not the issue. The problem is that CEC outlives args. Since I don't know the internals of Rcpp, I assume this approach may cause trouble.

kudkudak commented 10 years ago

I guess you should create proxyDataset on the heap, and likewise any other argument. With Rcpp, anything you allocate on the C++ heap lives until you delete it or the R session terminates. You can even run threads. This works because R itself is a native program (written in C) and your compiled code runs inside its process. If this doesn't answer the problem, we can talk about it in person, because I think it might be hard to solve here.
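The idea of keeping the dataset handle alive alongside the long-lived object can be sketched in plain C++. This is an analogue, not Rcpp code: `OwningModel` and `makeModel` are hypothetical names, and a `std::shared_ptr` stands in for whatever handle keeps the R-side data alive. The local handle `args` dies when the factory returns, but the model holds its own reference, so the buffer survives without being copied.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Non-owning view, shown for contrast: it is valid only while the
// original buffer is alive, which is exactly the risk discussed above.
struct BorrowedView {
    const double* data;
    std::size_t size;
};

// Owning model: holds a shared reference to the buffer, so the data
// stays alive for as long as the model does. No element is copied.
class OwningModel {
public:
    explicit OwningModel(std::shared_ptr<std::vector<double> > data)
        : data_(data) {}
    double sum() const {
        double s = 0.0;
        for (std::size_t i = 0; i < data_->size(); ++i) s += (*data_)[i];
        return s;
    }
    long owners() const { return data_.use_count(); }
private:
    std::shared_ptr<std::vector<double> > data_;
};

// Plays the role of the factory function: `args` goes out of scope on
// return, but the model keeps the buffer alive through its own reference.
OwningModel makeModel() {
    std::shared_ptr<std::vector<double> > args =
        std::make_shared<std::vector<double> >(3, 2.0);  // three values of 2.0
    return OwningModel(args);
}
```

In Rcpp the corresponding step would presumably be to store the `Rcpp::NumericMatrix` itself as a member of the object rather than only a raw pointer taken from it; that is an assumption about the eventual design, not something settled in this thread.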

To sum up: we will try using C++ class exporting; hopefully it will work out ( @igorsieradzki )