aloysius-lim / bigrf

Random forests for R for large data sets, optimized with parallel tree-growing and disk-based memory

Building a Windows Version Using Rtools.exe #13

Closed hack-r closed 9 years ago

hack-r commented 9 years ago

My organization runs its R app server on Windows Server, so I was disappointed to find that bigrf currently runs only on *nix because of a dependency issue.

I asked some folks on IRC whether there was a workaround or a close alternative, and they told me I should compile my own Windows copy of bigrf using Rtools.exe.

This would be my first time going through that process. I think I found a tutorial with the relevant information, but then it struck me that if it were that simple, you would already have done it and published the Windows version on CRAN.

Am I right to think that getting a Windows version by way of Rtools.exe would not be straightforward?

grantbrown commented 9 years ago

I believe the limitation is that the bigmemory dependency does not currently support Windows, due to an issue with Boost header files. So even if you remove the "OS Type" option from the DESCRIPTION file and try to install the package with Rtools, you won't be able to satisfy the required dependencies.

Would your application fit into RAM? If so, you shouldn't have any issues using the randomForest package.
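For reference, a sketch of what that option looks like, assuming it is the standard `OS_type` field that R packages use to restrict installation to one platform family. The lines below are illustrative and not copied from the bigrf repository; the real DESCRIPTION file has more fields.

```
Package: bigrf
OS_type: unix
```

Deleting the `OS_type` line only lifts the install-time platform check. It does nothing about the dependencies listed elsewhere in the file, which still have to build on Windows.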
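A rough way to answer the "fits into RAM" question is to estimate the size of the training matrix before loading it. The sketch below is only a back-of-the-envelope check: the helper names `estimate_matrix_bytes` and `fits_in_ram` are hypothetical, and the `headroom` factor of 3 is an arbitrary assumption to leave room for copies made during training, not a measured figure.

```r
# Estimated bytes for an n_rows x n_cols numeric matrix
# (R stores doubles as 8 bytes each; object overhead is ignored).
estimate_matrix_bytes <- function(n_rows, n_cols, bytes_per_value = 8) {
  as.numeric(n_rows) * as.numeric(n_cols) * bytes_per_value
}

# TRUE if the matrix, multiplied by an assumed headroom factor,
# fits within the given amount of available memory (in bytes).
fits_in_ram <- function(n_rows, n_cols, available_bytes, headroom = 3) {
  estimate_matrix_bytes(n_rows, n_cols) * headroom <= available_bytes
}

# Example: 1 million rows x 100 columns against 8 GB of RAM.
print(fits_in_ram(1e6, 100, available_bytes = 8e9))

# If the data fits, the in-memory randomForest package can be used directly.
# Guarded so the sketch still runs when the package is not installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(Species ~ ., data = iris, ntree = 100)
  print(fit)
}
```

If the estimate is close to the machine's limit, measuring a sample of the real data with `object.size()` gives a better figure than the arithmetic above.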

hack-r commented 9 years ago

Thanks for the insight about the bigmemory issue.

Would it fit into RAM? Definitely on the server, though perhaps not on my local laptop. Thanks again!

aloysius-lim commented 9 years ago

@grantbrown is right: the platform limitation is due to the lack of Windows support in the bigmemory package.

hack-r commented 9 years ago

@grantbrown Looks like it's going to be a bit more complicated than that to get it working under Windows.

I've started a fork where I began with your suggestion. I ran into another error, fixed it, did a commit, found another bug, etc. I'll keep iterating on it until I get it working under Windows...

https://github.com/hack-r/bigrf

grantbrown commented 9 years ago

@hack-r, best of luck, but I don't think your approach of just ripping out the bigmemory components is going to work. I'm pretty sure you'll need to re-implement the BigMatrix API using either regular memory allocation (in which case you'd be better off just using the randomForest package) or a custom file-based solution, which is a non-trivial development effort.

hack-r commented 9 years ago

@grantbrown Thanks, and you're right. It became clear that it was doomed when I saw how large a role bigmemory plays in the package. The bigmemory website has a statement that Windows support is going to be restored soon, but it gives no details, and the page was last updated last September...

"Last Updated September 7, 2014

News

We are close to updating bigmemory with restored support for Windows. We are also in the process of relocating this site from Google Pages (where some mysterious problem was never solved). Please bear with us through these transitions!"

If we have faith that bigmemory will start working on Windows again soon, then I think we should wait for that solution. If we don't, then maybe it would be worth working on an entirely different approach to scaling up randomForest on Windows, using whichever of R's other HPC tools work on Windows (?). That could be made into another package. If I went this route and put a couple of hours per week into it, hopefully some other, smarter people would contribute to the package development as well, and we could get it up and running pretty fast.

Any thoughts on that?

grantbrown commented 9 years ago

@hack-r, it seems like an easier approach would be to examine the source of the bigmemory project to see what challenges remain w.r.t. Windows support, and perhaps put together the subset of Boost source files needed to build it. I too ran into issues with Boost on Windows when developing libspatialSEIR, but was able to put together a subset of the Boost source files that was sufficient for my library.

In any case, I likely won't personally be able to contribute much until late summer due to other commitments.