kudkudak closed 10 years ago
What is your current approach for the R interface? I would be grateful for your opinion, as we are having the same debate now in SVM :) @crazySocket
There are two things to know before I answer. 1) It is easy to write a function in C++ and port it to R. 2) You can port a whole class into R (constructor and methods).
Now, the approach is to prepare a factory function CEC() that creates instances of the CEC class. The function will require all arguments to be passed by keyword, as in the example CEC(dataset=data, killThreshold=0.001, clusters=c(c(type="spherical",radius=2), c(type="covMat", covMat=m))). The CEC class will be ported to R, but its constructor will not. The idea is that you can use the object to plot data, or to change the clustering and rerun the algorithm (e.g. cec$draw() ). The constructor cannot be ported because the limit on the number of constructor arguments is 6 and keyword arguments are unavailable.
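To make the factory idea concrete, here is a minimal sketch in plain standard C++, without Rcpp. Everything in it is hypothetical: CecSketch stands in for the real CEC class, NamedArgs plays the role of the named list that R would pass in, and the default values are invented for illustration. It only shows the pattern: the factory accepts named arguments, validates them, and forwards them to a constructor that is never called directly by the user.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the CEC class discussed above.
class CecSketch {
public:
    CecSketch(double killThreshold, int clusters)
        : killThreshold_(killThreshold), clusters_(clusters) {
        assert(clusters > 0);  // precondition: at least one cluster
    }
    double killThreshold() const { return killThreshold_; }
    int clusters() const { return clusters_; }

private:
    double killThreshold_;
    int clusters_;
};

// Named arguments; an assumption standing in for the list received from R.
using NamedArgs = std::map<std::string, double>;

// Factory: applies defaults, rejects unknown names, then calls the constructor.
std::unique_ptr<CecSketch> makeCec(const NamedArgs& args) {
    double killThreshold = 0.0001;  // assumed default, not from the thread
    int clusters = 1;               // assumed default, not from the thread
    for (const auto& kv : args) {
        if (kv.first == "killThreshold") {
            killThreshold = kv.second;
        } else if (kv.first == "clusters") {
            clusters = static_cast<int>(kv.second);
        } else {
            throw std::invalid_argument("unknown argument: " + kv.first);
        }
    }
    return std::unique_ptr<CecSketch>(new CecSketch(killThreshold, clusters));
}
```

Because the arguments are looked up by name, the number of options can grow without changing the constructor signature that is exposed to the caller.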
However, there is one difficulty. The CEC instance needs a pointer to the dataset so that the potentially huge dataset is not copied. That pointer is "extracted" from SEXP args, which is the argument of the factory function. The lifetime of args is shorter than that of the CEC instance. That might be a problem.
Thanks. I agree that it must be possible to rerun or draw the object. You can achieve that without the class-exporting feature, but of course it would be nicer to do it this way. (I did it in the GNG project, but don't try reading the source, as it is unreadable ;) I can prepare an example if needed.)
I do not understand what the problem with the pointer is. A pointer is not an object but just an unsigned long int, so you should be able to cast it to a pointer type and it won't get deleted?
According to the code:

    RcppExport SEXP run(SEXP args) {
        Rcpp::List list(args);
        Rcpp::NumericMatrix proxyDataset = // here it is extracted from list
        // this way you create arma::mat without copying memory; the memory is reused
        arma::mat points(proxyDataset.begin(), proxyDataset.nrow(), proxyDataset.ncol(), false);
        return CEC(...)
    }
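What "memory is reused" means can be sketched without Rcpp or Armadillo. The hypothetical MatrixView class below wraps a raw pointer to a column-major buffer (the layout both R and Armadillo use) and reads and writes the caller's memory directly. It is an analogy for constructing arma::mat with the copy flag set to false, not a description of Armadillo's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning, column-major view over memory owned by someone else.
class MatrixView {
public:
    MatrixView(double* data, std::size_t rows, std::size_t cols)
        : data_(data), rows_(rows), cols_(cols) {}

    // Element (r, c) lives at offset c * rows + r in column-major storage.
    double& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[c * rows_ + r];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    double* data_;  // not owned: the view is only valid while the buffer lives
    std::size_t rows_;
    std::size_t cols_;
};
```

A write through the view changes the original buffer, which is exactly why the buffer must stay alive for as long as the view is in use.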
You can create CEC and points on the heap; that is not the issue. The problem is that CEC outlives args. Since I don't know the internals of Rcpp, I assume this approach may cause trouble.
I guess you should create proxyDataset on the heap, and likewise any other argument. In Rcpp, anything you allocate on the heap lives until you delete it or the R session is terminated. You can even run threads. This all works because R itself is a native program (written in C). If that doesn't solve the problem, we can talk about it in person, because I think it might be hard to sort out here.
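The lifetime concern can be illustrated in plain C++ with shared ownership. In this hypothetical sketch, ClusteringSketch stands in for CEC and a std::shared_ptr stands in for whatever handle keeps the dataset alive; the factory's argument goes away, yet the object still owns a share of the buffer. As far as I understand Rcpp, keeping the Rcpp::NumericMatrix wrapper alive (on the heap or as a class member) plays a similar role, since the wrapper protects the underlying R object from garbage collection while it exists, but treat that as an assumption to verify rather than a guarantee.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<double>;

// Hypothetical stand-in for CEC: it shares ownership of the dataset
// instead of holding a bare pointer into memory owned by the caller.
class ClusteringSketch {
public:
    explicit ClusteringSketch(std::shared_ptr<const Buffer> data)
        : data_(std::move(data)) {}

    // Touches the data to show it is still valid after the caller lets go.
    double sum() const {
        double total = 0.0;
        for (double x : *data_) total += x;
        return total;
    }

private:
    std::shared_ptr<const Buffer> data_;  // keeps the buffer alive
};

// Plays the role of the factory function: the returned object takes its
// own share of the buffer, so it may outlive the caller's handle.
ClusteringSketch makeClustering(const std::shared_ptr<const Buffer>& args) {
    return ClusteringSketch(args);
}
```

The dataset is never copied here; only the reference count changes, so the original goal of avoiding a copy of a huge dataset is preserved.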
To sum up: we will try using C++ class exporting; hopefully it will work out ( @igorsieradzki )
Check the R <-> C++ integration options. There are essentially two options:
One reference: http://dirk.eddelbuettel.com/code/rcpp/Rcpp-modules.pdf
Personally, I think using an exposed C++ class might be less work.