aloysius-lim / bigrf

Random forests for R for large data sets, optimized with parallel tree-growing and disk-based memory

Avoid converting x to big.matrix in prediction? #14

Closed by adolfoalvarez 9 years ago

adolfoalvarez commented 9 years ago

Hello! I have a question regarding prediction. Do you think it is possible to avoid the "convert to big matrix" step during prediction? You say in the code that this is a C requirement, but it would be great to have it as an option.

I am asking because the prediction step takes a long time, even when predicting a single observation, because of this big.matrix conversion.

Thank you very much for this amazing package, best regards,

aloysius-lim commented 9 years ago

Hi, thanks for your question. Did you time the execution of predict() and narrow the issue down to the big.matrix conversion? The conversion should be instantaneous if you are supplying only a few observations. Many other factors affect the execution time of predict(), such as the number of trees in your forest and the number of nodes in each tree.
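One way to check this is to time each phase separately with base R's `system.time()`. The snippet below is only a minimal sketch: `convert_step` and `predict_step` are hypothetical stand-ins, not bigrf functions, and would need to be replaced with the actual conversion and `predict()` calls from your session.

```r
# Hypothetical stand-ins for the two phases of a prediction call.
# Replace them with the real conversion and predict() calls.
convert_step <- function(x) as.matrix(x)   # stands in for the big.matrix conversion
predict_step <- function(m) rowSums(m)     # stands in for running the forest

# A single observation with five columns, as in the question.
newdata <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5)

# Time each phase on its own; assignments inside system.time()
# take effect in the calling environment.
t_convert <- system.time(m <- convert_step(newdata))
t_predict <- system.time(p <- predict_step(m))

# Collect the elapsed (wall-clock) seconds for a side-by-side comparison.
timings <- c(convert = t_convert[["elapsed"]],
             predict = t_predict[["elapsed"]])
print(timings)
```

If the `convert` entry dominates with the real calls swapped in, the conversion is the bottleneck. Otherwise the time is being spent in the forest itself.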

adolfoalvarez commented 9 years ago

Hi, you are right. This is a forest with 50 trees, and predicting a single row of 5 or 6 columns took around 6 minutes, so the number of trees could also be the reason. For now I am relying only on the output reported when trace = 1, where the "converting to big matrix" message takes most of the time, and on my previous experience with the bigmemory package. Nevertheless, when I am back at work in a few days I will try a more formal measurement with the profr package, and maybe provide a reproducible example.

Thanks for your fast answer! Best regards, Adolfo.