Open StrikerRUS opened 6 years ago
@fukatani Is there any progress on a real dump to JSON?
Unfortunately, I don't think I can get to it for a while. RGF on LightGBM is likely to take time, and I am stuck.
Oh, OK. I just wondered 😃 .
Then I can start refining the repo structure, so you'll be needed only for reviews.
Might be useful as an example:
https://github.com/Microsoft/LightGBM/blob/614f69d4deb491863012ebb8c6282e15b3c087c0/src/boosting/gbdt_model_text.cpp#L15
https://github.com/Microsoft/LightGBM/blob/dc6995742a5284a1e942978e2542fc49adda9ea1/src/io/tree.cpp#L244
https://github.com/Microsoft/LightGBM/blob/dc6995742a5284a1e942978e2542fc49adda9ea1/src/io/tree.cpp#L259
It seems that FastRGF already has this functionality: https://github.com/RGF-team/rgf/blob/676f6b78eb24e191f65fd713ea739436809826e3/FastRGF/src/exe/forest_predict.cpp#L28
What do you think about the following JSON schema for the RGF model?
Code for the model:
from sklearn.datasets import load_iris
from rgf import RGFClassifier

# Fit an RGF classifier on the iris dataset (3 classes).
data = load_iris()
clf = RGFClassifier()
clf.fit(data.data, data.target)

# Print the text dump of the fitted model to the console.
clf.dump_model()
The beginning of the model:
"dump_model":
model_fn=D:\rgf\temp\26cf8e70-a394-4982-8169-e3d2f66be1941.model-10
Sat Jul 28 02:36:18 2018: Dump model ...
constant=0, orgdim=4, #tree=500
tree[0]
[ 0], depth=0, gain=85.2273, F2, 2.45
[ 1], (0.0111), depth=1, gain=0
[ 2], (-0.0123), depth=1, gain=0
tree[1]
[ 0], depth=0, gain=23.5202, F2, 2.45
[ 1], (0.0111), depth=1, gain=0
[ 2], (-0.0122), depth=1, gain=0
tree[2]
[ 0], depth=0, gain=9.90365, F2, 2.45
[ 1], (0.0111), depth=1, gain=0
[ 2], (-0.0122), depth=1, gain=0
tree[3]
[ 0], depth=0, gain=5.03139, F2, 2.45
[ 1], (0.0111), depth=1, gain=0
[ 2], (-0.0122), depth=1, gain=0
tree[4]
JSON sample:
{
  "num_forests": "3",
  "forests": [
    {
      "num_trees": "500",
      "trees": [
        {
          "tree_index": "0",
          "nodes": [
            {
              "node_index": "0",
              "is_leaf": "False",
              "depth": "0",
              "gain": "85.2273",
              "feature": "F2",
              "threshold": "2.45",
              "value": "NA"
            },
            {
              "node_index": "1",
              "is_leaf": "True",
              "depth": "1",
              "gain": "0",
              "feature": "NA",
              "threshold": "NA",
              "value": "0.0111"
            },
            {
              "node_index": "2",
              "is_leaf": "True",
              "depth": "1",
              "gain": "0",
              "feature": "NA",
              "threshold": "NA",
              "value": "-0.0123"
            }
          ]
        },
        {
          "tree_index": "1",
          "nodes": [
            {
              "node_index": "0",
              "is_leaf": "False",
              "depth": "0",
              "gain": "23.5202",
              "feature": "F2",
              "threshold": "2.45",
              "value": "NA"
            },
            {
              "node_index": "1",
              "is_leaf": "True",
              "depth": "1",
              "gain": "0",
              "feature": "NA",
              "threshold": "NA",
              "value": "0.0111"
            },
            {
              "node_index": "2",
              "is_leaf": "True",
              "depth": "1",
              "gain": "0",
              "feature": "NA",
              "threshold": "NA",
              "value": "-0.0122"
            }
          ]
        }
      ]
    },
    {
      "num_trees": "264",
      "trees": []
    },
    {
      "num_trees": "349",
      "trees": []
    }
  ]
}
https://jsoneditoronline.org/?id=5da3e57a6ae24cb5a6b583c8f41dd307
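To make the mapping from the text dump to the JSON sample above concrete, here is a minimal sketch of a converter. It is not part of rgf_python: the names `parse_forest` and `build_model_json` are hypothetical, and the regular expressions assume that every line of the dump looks exactly like the lines shown above (split nodes as `[ i], depth, gain, feature, threshold`, leaves as `[ i], (value), depth, gain`). All values are kept as strings to match the proposed sample.

```python
import json
import re

# Patterns inferred from the console dump shown above (an assumption, not a spec).
HEADER_RE = re.compile(r"#tree=(\d+)")
TREE_RE = re.compile(r"^tree\[(\d+)\]$")
SPLIT_RE = re.compile(r"^\[\s*(\d+)\], depth=(\d+), gain=([^,]+), (F\d+), (\S+)$")
LEAF_RE = re.compile(r"^\[\s*(\d+)\], \(([^)]+)\), depth=(\d+), gain=(\S+)$")

SAMPLE_DUMP = """\
constant=0, orgdim=4, #tree=500
tree[0]
[ 0], depth=0, gain=85.2273, F2, 2.45
[ 1], (0.0111), depth=1, gain=0
[ 2], (-0.0123), depth=1, gain=0
tree[1]
[ 0], depth=0, gain=23.5202, F2, 2.45
[ 1], (0.0111), depth=1, gain=0
[ 2], (-0.0122), depth=1, gain=0
"""


def parse_forest(dump_text):
    """Convert the text dump of one forest into a dict shaped like the JSON sample."""
    trees = []
    num_trees = None
    for raw_line in dump_text.splitlines():
        line = raw_line.strip()
        header = HEADER_RE.search(line)
        if header and not trees:
            num_trees = header.group(1)  # tree count announced in the header line
            continue
        tree = TREE_RE.match(line)
        if tree:
            trees.append({"tree_index": tree.group(1), "nodes": []})
            continue
        if not trees:
            continue  # skip anything that precedes the first tree
        split = SPLIT_RE.match(line)
        if split:
            index, depth, gain, feature, threshold = split.groups()
            trees[-1]["nodes"].append({
                "node_index": index, "is_leaf": "False", "depth": depth,
                "gain": gain, "feature": feature, "threshold": threshold,
                "value": "NA",
            })
            continue
        leaf = LEAF_RE.match(line)
        if leaf:
            index, value, depth, gain = leaf.groups()
            trees[-1]["nodes"].append({
                "node_index": index, "is_leaf": "True", "depth": depth,
                "gain": gain, "feature": "NA", "threshold": "NA",
                "value": value,
            })
    # Fall back to the number of parsed trees when the header is missing.
    if num_trees is None:
        num_trees = str(len(trees))
    return {"num_trees": num_trees, "trees": trees}


def build_model_json(forest_dumps):
    """Wrap one parsed forest per class into the top-level structure."""
    forests = [parse_forest(text) for text in forest_dumps]
    return {"num_forests": str(len(forests)), "forests": forests}


if __name__ == "__main__":
    print(json.dumps(build_model_json([SAMPLE_DUMP]), indent=2))
```

Parsing the console text is only a stopgap. Emitting the structure directly from the model would avoid depending on the exact print format.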
Initial support for dumping the RGF model is already implemented in #161. At present, it is possible to print the model to the console. It would also be a good idea to support dumping the model to a file (e.g. JSON).
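As a rough illustration of what dumping to a file could look like once the model is available as a nested dictionary, here is a minimal sketch. The helpers `dump_model_to_file` and `load_model_from_file` are hypothetical names, not existing rgf_python functions, and the example dictionary is a cut-down stand-in for a real model.

```python
import json
from pathlib import Path


def dump_model_to_file(model, path):
    """Write an already-built model dictionary to `path` as indented JSON."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        json.dump(model, f, indent=2)
    return path


def load_model_from_file(path):
    """Read a model dictionary back from a JSON file."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


# Cut-down stand-in for a dumped model (assumed shape, values kept as strings).
EXAMPLE_MODEL = {
    "num_forests": "1",
    "forests": [{"num_trees": "0", "trees": []}],
}

if __name__ == "__main__":
    target = dump_model_to_file(EXAMPLE_MODEL, "rgf_model.json")
    print(load_model_from_file(target) == EXAMPLE_MODEL)
```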