sslavian812 opened this issue 6 years ago
Hey, hmmm... getting the largest possible value that the forest can predict is tricky. For a single tree it is trivial to compute (it is simply the largest value stored in any leaf), but for the forest things are more complicated. It's quite likely that the highest predictions of the individual trees do not occur for the same input (due to the randomization during training), and it is not even guaranteed that the forest's highest prediction occurs at any of these inputs. Maybe I misunderstand what you are trying to do, so could you please elaborate on what you mean by the "max possible value" that the forest can predict?
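To make the point about the forest concrete, here is a toy sketch. It assumes a hypothetical 1-D forest of decision stumps stored as `(threshold, value_left, value_right)` tuples (not pyrfr's data layout) and that the forest predicts the mean of its trees. Because the forest output is constant between thresholds, evaluating one point per interval gives its exact maximum, which can be well below the average of the per-tree maxima:

```python
# Hypothetical toy representation, NOT pyrfr's internal layout:
# each tree is a 1-D decision stump (threshold, value_left, value_right).
Stump = tuple[float, float, float]


def predict_tree(stump: Stump, x: float) -> float:
    threshold, left, right = stump
    return left if x < threshold else right


def predict_forest(forest: list[Stump], x: float) -> float:
    # Assumption: the forest prediction is the mean of the tree predictions.
    return sum(predict_tree(t, x) for t in forest) / len(forest)


def mean_of_tree_maxima(forest: list[Stump]) -> float:
    # Largest output of each tree, averaged over trees.
    return sum(max(left, right) for _, left, right in forest) / len(forest)


def exact_forest_max(forest: list[Stump]) -> float:
    # The forest output is constant between consecutive thresholds,
    # so one point per interval is enough to find the exact maximum.
    cuts = sorted({threshold for threshold, _, _ in forest})
    candidates = (
        [cuts[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
        + [cuts[-1] + 1.0]
    )
    return max(predict_forest(forest, x) for x in candidates)


# Tree 1 peaks for x < 0.5, tree 2 peaks for x >= 0.5.
forest = [(0.5, 1.0, 0.0), (0.5, 0.0, 1.0)]
print(mean_of_tree_maxima(forest))  # 1.0
print(exact_forest_max(forest))     # 0.5
```

In general the average of the per-tree maxima is only an upper bound on the forest maximum (the max of an average is at most the average of the maxes), and the exact search needs one evaluation per cell of the threshold grid, which grows exponentially with the number of input dimensions.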
Regarding your questions:
The `valueX` keys come from the serialization library I use and carry no meaning. You can reconstruct the object from the string, but it's hopeless to try to find all the leaves from that representation. The `response_stat` of each leaf could be used to compute the largest output for a tree, but as I argued above, that is not necessarily the true global maximum of the forest.

Hope that helps.
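A per-tree maximum only needs the leaves. The sketch below uses hypothetical Python classes that mimic the structure discussed in this thread (a tree's `the_nodes` vector and each leaf's `response_stat`); they are not the actual C++ or SWIG API, and the sketch assumes a leaf predicts the weighted mean of the responses it holds:

```python
from dataclasses import dataclass, field


@dataclass
class WeightedRunningStats:
    # Hypothetical stand-in for rfr::util::weighted_running_statistics:
    # keeps a weighted mean that is updated one sample at a time.
    sum_weights: float = 0.0
    mean: float = 0.0

    def push(self, value: float, weight: float = 1.0) -> None:
        self.sum_weights += weight
        self.mean += (weight / self.sum_weights) * (value - self.mean)


@dataclass
class Node:
    # Hypothetical node: child indices into the tree's node list
    # (empty for a leaf) plus the statistics of the responses in it.
    children: list[int] = field(default_factory=list)
    response_stat: WeightedRunningStats = field(default_factory=WeightedRunningStats)


def tree_max_prediction(the_nodes: list[Node]) -> float:
    # A tree predicts the mean response of the leaf an input falls into,
    # so its largest possible output is the largest leaf mean.
    return max(node.response_stat.mean for node in the_nodes if not node.children)


# Root (index 0) splits into two leaves.
left, right = Node(), Node()
for y in (1.0, 2.0):
    left.response_stat.push(y)
right.response_stat.push(4.0, weight=1.0)
right.response_stat.push(6.0, weight=3.0)
the_nodes = [Node(children=[1, 2]), left, right]
print(tree_max_prediction(the_nodes))  # 5.5
```

Internal nodes are skipped on purpose: an input always ends up in a leaf, so only leaf statistics can ever be returned as a prediction.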
@sfalkner Thank you for the quick reaction!
I was trying to get the maximum possible value for each tree in the forest and then combine those values somehow, perhaps by averaging them or in some other way.
I want to be able to compare two random forests in terms of "which one will probably give me the bigger value for the next (random) input".
Here is why: I'm experimenting with SMAC3 (https://github.com/automl/SMAC3) hyperparameter optimization for my research. I have a few SMAC instances running, and at each step I want to choose one of them to run: the one whose underlying random forest will potentially give me the largest EI value. That's why I'm investigating the random forest internals :)
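One way to phrase "which forest will probably give the bigger value for the next random input" is as a probability over inputs. A minimal sketch, assuming inputs are drawn uniformly from a box and each model is wrapped in a plain Python callable returning its predicted value (such wrappers would have to be written by hand; `prob_a_beats_b` is a hypothetical helper, not part of SMAC or pyrfr):

```python
import random
from typing import Callable, Sequence

# A model is anything that maps an input vector to a predicted value.
Predictor = Callable[[Sequence[float]], float]


def prob_a_beats_b(
    predict_a: Predictor,
    predict_b: Predictor,
    bounds: Sequence[tuple[float, float]],
    n_samples: int = 10_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(predict_a(X) > predict_b(X)) for X drawn
    uniformly from the box given by `bounds` (one (low, high) pair per dimension)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        x = [rng.uniform(low, high) for low, high in bounds]
        if predict_a(x) > predict_b(x):
            wins += 1
    return wins / n_samples


# Toy models: A grows with the first input, B is constant.
print(prob_a_beats_b(lambda x: x[0], lambda x: 0.5, [(0.0, 1.0)]))  # close to 0.5
```

Uniform sampling is only one choice of "random input"; if the next input actually comes from a different distribution, sampling from that distribution gives a more meaningful comparison.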
It would be even better if I could find which inputs tend to get the maximum value out of the random forest, but that's another story. I was thinking about this post: https://stats.stackexchange.com/questions/205145/find-max-value-of-random-forest-regressor-output
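For axis-aligned trees, the inputs that maximize the forest can in principle be found exactly: every tree is constant on the cells of the grid formed by all split thresholds, so evaluating one point per cell is enough. A sketch under that assumption, using a hypothetical nested-tuple encoding of the trees (not pyrfr's), an unbounded input space, and a forest prediction equal to the mean over trees:

```python
import itertools

# Hypothetical encoding, NOT pyrfr's: a leaf is a number, an internal node is
# (feature_index, threshold, left_subtree, right_subtree); go left if x[f] < t.


def predict_tree(tree, x):
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] < threshold else right
    return tree


def collect_thresholds(tree, cuts):
    # Gather all split thresholds per feature: {feature: {thresholds}}.
    if isinstance(tree, tuple):
        feature, threshold, left, right = tree
        cuts.setdefault(feature, set()).add(threshold)
        collect_thresholds(left, cuts)
        collect_thresholds(right, cuts)


def representatives(sorted_cuts):
    # One point strictly inside each interval between consecutive thresholds.
    return (
        [sorted_cuts[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(sorted_cuts, sorted_cuts[1:])]
        + [sorted_cuts[-1] + 1.0]
    )


def argmax_forest(forest, n_features):
    cuts = {}
    for tree in forest:
        collect_thresholds(tree, cuts)
    # Features that are never split on do not matter; any value works.
    axes = [
        representatives(sorted(cuts[f])) if f in cuts else [0.0]
        for f in range(n_features)
    ]
    best_x, best_value = None, float("-inf")
    for x in itertools.product(*axes):  # one point per grid cell
        value = sum(predict_tree(t, x) for t in forest) / len(forest)
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


forest = [
    (0, 0.5, (1, 0.5, 1.0, 3.0), 2.0),  # splits on x0, then on x1
    (1, 0.2, 0.0, 4.0),                 # splits on x1 only
]
print(argmax_forest(forest, n_features=2))  # ((-0.5, 1.5), 3.5)
```

The number of cells is the product over dimensions of (number of distinct thresholds + 1), so this is only feasible for a handful of dimensions; beyond that, heuristics such as random or local search are the practical option. With a bounded input space, points outside the bounds would have to be clipped back into it.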
So you are actually looking for the maximum possible EI value? For that you need the mean and variance predictions of the forest, which involve all trees. Furthermore, EI is a non-stationary quantity, meaning the values you can achieve change over time. For a model with a lot of points that is certain about the optimum, the EI values are much smaller than for a model trained with less data, so be careful when comparing them.
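For reference, a sketch of the standard closed-form EI for minimization (SMAC minimizes cost), assuming a Gaussian predictive distribution with mean `mu` and variance `var` at an input and `f_min` the best cost observed so far: `EI = (f_min - mu) * Phi(z) + sigma * phi(z)` with `z = (f_min - mu) / sigma`. SMAC's own implementation may add details (such as an exploration offset) that are not shown here. The example illustrates the non-stationarity point: with the same predicted mean, a more certain model gets a smaller EI.

```python
import math


def expected_improvement(mu: float, var: float, f_min: float) -> float:
    """Closed-form EI for minimization, assuming a Gaussian prediction N(mu, var)."""
    sigma = math.sqrt(var)
    if sigma == 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(z)
    return (f_min - mu) * cdf + sigma * pdf


# Same predicted mean as the incumbent, different certainty.
print(expected_improvement(mu=1.0, var=1.0, f_min=1.0))   # about 0.399
print(expected_improvement(mu=1.0, var=0.01, f_min=1.0))  # about 0.0399
```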
Thank you for the information.
I added some methods that calculate the largest output of each tree for me: https://github.com/sslavian812/random_forest_run/commit/e8f8d3caa483cbd338d623a9f0e65e3bfbb89c51
I'm new to Python and totally new to wrapping C++ code into a Python interface with SWIG.
I hoped that if I installed `pyrfr` from the repository, the interfaces would be rebuilt during installation, but

```
pip install git+git://github.com/sslavian812/random_forest_run.git
```

gives me an error:
```
Collecting git+git://github.com/sslavian812/random_forest_run.git
Cloning git://github.com/sslavian812/random_forest_run.git to /tmp/pip-nqd62n11-build
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "..../lib/python3.4/tokenize.py", line 438, in open
    buffer = _builtin_open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-nqd62n11-build/setup.py'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-nqd62n11-build/
```
Do you have any idea what I am missing?
Is there a quick way to install an experimental version of `pyrfr`?
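There is, but it is a bit cumbersome. Installing straight from the Git URL fails because the repository root has no `setup.py`; the Python package is generated by the CMake build. You will need CMake, Doxygen and Boost (a C++ library); then go into the git repo and run the following commands: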
```
mkdir build
cd build
cmake ..
make pyrfr_docstrings
cd python_package
pip install . --user
```
You will have to repeat this every time you change the C++ code. Wrapping your new functionality should work out of the box, and you should have access to the new methods from Python without doing anything else. Let me know if you run into any more trouble.
Thank you, it did the trick! I'd be more than happy to make a pull request if you consider this feature useful.
Hi! I'm working on a research experiment that requires me to get the "max possible value" that your regression forest can predict at the moment.
I've been trying to get that value using the Python API; I use `RandomForestWithInstances` in Python.
It seems there is no way to get something like "all nodes from all trees" other than serializing the random forest to a string or to TeX and parsing that.
`model.rf.ascii_string_representation()` gives me something like this:

```
{
  "value0": {
    "value0": 10,
    "value1": 9,
    "value2": true,
    "value3": false,
    "value4": {
      "value0": 5, "value1": 20, "value2": 3, "value3": 2.0, "value4": 3,
      "value5": 1.0, "value6": 1048576, "value7": 1e-8, "value8": 1000.0, "value9": false
    }
  },
  "value1": [
    {
      "value0": [
        {
          "value0": [],
          "value1": [],
          "value2": 0,
          "value3": { "value0": 1, "value1": 2 },
          "value4": { "value0": 0.5555555555555556, "value1": 0.4444444444444444 },
          "value5": {
            "value0": 4,
            "value1": 0.41871805715741286,
            "value2": { "type": 0, "data": 0 }
          },
          "value6": {
            "value0": 0.0,
            "value1": 0.0,
            "value2": { "value0": 0, "value1": 0.0, "value2": 0.0 }
          }
        },
        // some more here ...
      ],
      "value1": 2,
      "value2": 1
    }
  ],
  "value2": 6,
  "value3": [],
  "value4": NaN,
  "value5": [ 3, 0, 0, 0, 0, 0 ],
  "value6": [
    { "value0": 3.0, "value1": NaN },
    { "value0": 0.0, "value1": 1.0 },
    { "value0": 0.0, "value1": 1.0 },
    { "value0": 0.0, "value1": 1.0 },
    { "value0": 0.0, "value1": 1.0 },
    { "value0": -Infinity, "value1": Infinity }
  ]
}
```

What are `value0` ... `value6` supposed to mean? I'm totally confused. I tried to examine the C++ code in this repository, but `InputArchive`, `JSONInputArchive` and the other template machinery seem too complicated and messy for me at the moment. As far as I understand, I need to somehow get `std::vector<node_type> the_nodes;` from the `k_ary_random_tree`, and then get `rfr::util::weighted_running_statistics<num_t> response_stat;` from each node (`k_ary_node`, I suppose). Can you please help me with this issue?
If I can understand how things work here, this could become a contribution to the code base.