aloysius-lim / bigrf

Random forests for R for large data sets, optimized with parallel tree-growing and disk-based memory
91 stars 26 forks

Issue with your example #20

Open t6166as opened 7 years ago

t6166as commented 7 years ago

When I tried to create a second forest, I got the following error:

forest2 <- bigrfc(x[1:60, ], y[1:60], ntree=50L, varselect=vars)
Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type, :
  Backing file already exists! Either remove or specify different backing file

What is a backing file?

ajnisbet commented 7 years ago

The backing file is a temporary file used to store data that won't fit in memory. There seems to be code to clear these files after fitting a tree (grow.R), but for me it doesn't get called, and bigrf will complain if it finds an existing backing file in the cache directory.

A workaround is to pass a different (or randomly named) backing file directory for each forest via cachepath, clearing it manually before the call:

tmp.dir.forest1 <- '/tmp/Rforest1'
unlink(tmp.dir.forest1, recursive = TRUE)
forest1 <- bigrfc(x, y, cachepath=tmp.dir.forest1)
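
The "random directory" variant could look roughly like the sketch below. It uses only base R; fresh_cache_dir is a hypothetical helper name, not part of bigrf, and the bigrfc calls are left commented out because they assume x, y and vars are defined as in the package example. It also assumes, as the workaround above implies, that bigrfc creates the cachepath directory if it does not exist.

```r
# Hypothetical helper (not part of bigrf): returns a path that does not
# exist yet, so no stale backing files can be sitting in it.
fresh_cache_dir <- function(prefix = "bigrf_cache_") {
  path <- tempfile(pattern = prefix)  # new name under tempdir(); nothing is created
  unlink(path, recursive = TRUE)      # defensive: remove leftovers, if any
  path
}

# One cache directory per forest.
cache1 <- fresh_cache_dir()
cache2 <- fresh_cache_dir()

# Assumes x, y and vars exist as in the package example:
# forest1 <- bigrfc(x, y, ntree = 50L, varselect = vars, cachepath = cache1)
# forest2 <- bigrfc(x[1:60, ], y[1:60], ntree = 50L, varselect = vars, cachepath = cache2)
```

Since the directories live under tempdir(), they are normally removed when the R session ends, so nothing has to be cleaned up by hand between sessions.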