devmax / randomforest-matlab

Automatically exported from code.google.com/p/randomforest-matlab

NaN data #67

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I would like to know how the model handles NaN data, or how NaN data needs to be handled before training.

I have NaN observations in my X_train set. I am performing a regression 
analysis.

Regards

Original issue reported on code.google.com by nikhil.h...@gmail.com on 3 Jan 2015 at 7:26

GoogleCodeExporter commented 9 years ago
I was getting the following error:

Warning: Do you want regression? there are just 5 or less unique values 
> In regRF_train at 163 
Error using regRF_train (line 176)
NaNs in X

Original comment by nikhil.h...@gmail.com on 3 Jan 2015 at 7:29

GoogleCodeExporter commented 9 years ago
This implementation cannot handle NaN data. You can try imputing the missing values before training:

https://code.google.com/p/randomforest-matlab/wiki/Finding_Missing_Values

regards

Original comment by abhirana on 3 Jan 2015 at 11:38
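
To illustrate what "imputing the values" can look like, here is a minimal sketch of column-median imputation. It is written in Python purely for illustration and is not necessarily the method described on the wiki page linked above. The function name `impute_column_medians` and the sample matrix are hypothetical. The same idea applies to a MATLAB matrix before it is passed to regRF_train.

```python
import math
from statistics import median


def impute_column_medians(rows):
    """Return a copy of rows with each NaN replaced by its column's median.

    The median of a column is computed over its non-NaN entries only.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        if not observed:
            # A column with no observed values cannot be imputed this way.
            raise ValueError(f"column {j} has no observed values")
        medians.append(median(observed))
    return [
        [medians[j] if math.isnan(v) else v for j, v in enumerate(r)]
        for r in rows
    ]


# Hypothetical training matrix: 4 observations, 2 features, 2 missing entries.
X_train = [
    [1.0, 10.0],
    [float("nan"), 20.0],
    [3.0, float("nan")],
    [5.0, 40.0],
]
X_filled = impute_column_medians(X_train)
```

Median filling is only one simple choice. It ignores relationships between features, so for data with many missing entries a more careful imputation may be worth the effort.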

GoogleCodeExporter commented 9 years ago
Ok. Thank you.

Could you please point me in the direction of how and where it is stated in 
literature that 500 trees are the most stable for random forests, and that for 
regression the minimum leaf size is 5?

Original comment by nikhil.h...@gmail.com on 5 Jan 2015 at 3:05

GoogleCodeExporter commented 9 years ago
500 trees are not the most stable. It is a good enough number of trees, one by 
which you might find the OOB error rate stabilizing.

If I remember correctly, ntree=500 and a minimum leaf size of 5 for regression 
are suggested by Breiman in his paper on random forests.

Original comment by abhirana on 5 Jan 2015 at 6:23
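
One way to check the "stabilizing" point rather than rely on a fixed ntree is to look at the OOB error as a function of the number of trees. The sketch below assumes you already have such a sequence. How you obtain it from your model is outside this example. The function name `first_stable_ntree`, the tolerance, and the window size are arbitrary illustrative choices, and the error curve is synthetic, not a measured result.

```python
def first_stable_ntree(oob_errors, tol=0.01, window=50):
    """Return the smallest tree count after which the OOB error stays within
    a relative band `tol` of its current value for the next `window` trees.

    oob_errors[i] is the OOB error of the forest made of the first i + 1 trees.
    Returns None if no such point exists in the sequence.
    """
    for i in range(len(oob_errors) - window):
        ref = oob_errors[i]
        segment = oob_errors[i + 1 : i + 1 + window]
        if all(abs(e - ref) <= tol * abs(ref) for e in segment):
            return i + 1
    return None


# Synthetic, made-up curve that decays toward 0.20; for illustration only.
synthetic_errors = [0.20 + 1.0 / n for n in range(1, 501)]
stable_at = first_stable_ntree(synthetic_errors)
```

If the returned count is well below the ntree you trained with, adding more trees is unlikely to change the OOB error much. If it returns None, the curve has not flattened by this criterion.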