RGF-team / rgf

Home repository for the Regularized Greedy Forest (RGF) library. It includes the original implementation from the paper and a multithreaded one written in C++, along with various language-specific wrappers.

dump RGF and FastRGF to the JSON file #167

Open StrikerRUS opened 6 years ago

StrikerRUS commented 6 years ago

Initial support for dumping the RGF model is already implemented in #161. At present it's possible to print the model to the console. But it would be a good idea to add the ability to dump the model to a file (e.g. JSON).

@StrikerRUS:

I really like the new features introduced in this PR. But please think about a "real dump" of the model. I suppose it would be more useful than just printing to the console.

@fukatani:

For example, a dump in JSON format like LightGBM's. It's convenient and we may support it in the future, but we should do it in another PR.

StrikerRUS commented 6 years ago

@fukatani Is there any progress with a real dump to JSON?

fukatani commented 6 years ago

Unfortunately, I don't think I can get to this for a while. RGF on LightGBM is likely to take time. I am stuck.

StrikerRUS commented 6 years ago

Oh, OK. I was just wondering 😃.

Then I can start refining the repo structure, so you'll be needed only for reviews.

StrikerRUS commented 6 years ago

Might be useful as an example:

https://github.com/Microsoft/LightGBM/blob/614f69d4deb491863012ebb8c6282e15b3c087c0/src/boosting/gbdt_model_text.cpp#L15
https://github.com/Microsoft/LightGBM/blob/dc6995742a5284a1e942978e2542fc49adda9ea1/src/io/tree.cpp#L244
https://github.com/Microsoft/LightGBM/blob/dc6995742a5284a1e942978e2542fc49adda9ea1/src/io/tree.cpp#L259

StrikerRUS commented 6 years ago

It seems that FastRGF already has this functionality: https://github.com/RGF-team/rgf/blob/676f6b78eb24e191f65fd713ea739436809826e3/FastRGF/src/exe/forest_predict.cpp#L28

StrikerRUS commented 6 years ago

What do you think about the following JSON schema for the RGF model?

Code for the model:

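from sklearn.datasets import load_iris
from rgf import RGFClassifier

# Load the Iris dataset (4 features, 3 classes).
data = load_iris()

# Train an RGF classifier with default parameters.
clf = RGFClassifier()
clf.fit(data.data, data.target)

# Print the trained model to the console.
clf.dump_model()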

The beginning of the model:

"dump_model": 
   model_fn=D:\rgf\temp\26cf8e70-a394-4982-8169-e3d2f66be1941.model-10
Sat Jul 28 02:36:18 2018: Dump model ... 
constant=0, orgdim=4, #tree=500
tree[0]

[  0], depth=0, gain=85.2273, F2, 2.45
  [  1], (0.0111), depth=1, gain=0
  [  2], (-0.0123), depth=1, gain=0
tree[1]

[  0], depth=0, gain=23.5202, F2, 2.45
  [  1], (0.0111), depth=1, gain=0
  [  2], (-0.0122), depth=1, gain=0
tree[2]

[  0], depth=0, gain=9.90365, F2, 2.45
  [  1], (0.0111), depth=1, gain=0
  [  2], (-0.0122), depth=1, gain=0
tree[3]

[  0], depth=0, gain=5.03139, F2, 2.45
  [  1], (0.0111), depth=1, gain=0
  [  2], (-0.0122), depth=1, gain=0
tree[4]

JSON sample:

{
  "num_forests": "3",
  "forests": [
    {
      "num_trees": "500",
      "trees": [
        {
          "tree_index": "0",
          "nodes": [
            {
              "node_index": "0",
              "is_leaf": "False",
              "depth": "0",
              "gain": "85.2273",
              "feature": "F2",
              "threshold": "2.45",
              "value": "NA"
            },
            {
              "node_index": "1",
              "is_leaf": "True",
              "depth": "1",
              "gain": "0",
              "feature": "NA",
              "threshold": "NA",
              "value": "0.0111"
            },
            {
              "node_index": "2",
              "is_leaf": "True",
              "depth": "1",
              "gain": "0",
              "feature": "NA",
              "threshold": "NA",
              "value": "-0.0123"
            }
          ]
        },
        {
          "tree_index": "1",
          "nodes": [
            {
              "node_index": "0",
              "is_leaf": "False",
              "depth": "0",
              "gain": "23.5202",
              "feature": "F2",
              "threshold": "2.45",
              "value": "NA"
            },
            {
              "node_index": "1",
              "is_leaf": "True",
              "depth": "1",
              "gain": "0",
              "feature": "NA",
              "threshold": "NA",
              "value": "0.0111"
            },
            {
              "node_index": "2",
              "is_leaf": "True",
              "depth": "1",
              "gain": "0",
              "feature": "NA",
              "threshold": "NA",
              "value": "-0.0122"
            }
          ]
        }
      ]
    },
    {
      "num_trees": "264",
      "trees": []
    },
    {
      "num_trees": "349",
      "trees": []
    }
  ]
}
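
To get a feel for how such a file could be consumed, here is a minimal sketch that loads a trimmed copy of the sample above and sums the gain of every split per feature. It assumes the field names from the sample and the convention that every value is a string with "NA" as the placeholder. The helper names `to_number` and `gain_by_feature` are hypothetical and not part of rgf_python.

```python
import json
from collections import defaultdict

# Trimmed copy of the sample: one forest, two trees, leaves of the second tree omitted.
SAMPLE = """
{
  "num_forests": "1",
  "forests": [
    {
      "num_trees": "2",
      "trees": [
        {"tree_index": "0", "nodes": [
          {"node_index": "0", "is_leaf": "False", "depth": "0", "gain": "85.2273",
           "feature": "F2", "threshold": "2.45", "value": "NA"},
          {"node_index": "1", "is_leaf": "True", "depth": "1", "gain": "0",
           "feature": "NA", "threshold": "NA", "value": "0.0111"},
          {"node_index": "2", "is_leaf": "True", "depth": "1", "gain": "0",
           "feature": "NA", "threshold": "NA", "value": "-0.0123"}
        ]},
        {"tree_index": "1", "nodes": [
          {"node_index": "0", "is_leaf": "False", "depth": "0", "gain": "23.5202",
           "feature": "F2", "threshold": "2.45", "value": "NA"}
        ]}
      ]
    }
  ]
}
"""


def to_number(text):
    """Convert a string field to float; the "NA" placeholder becomes None."""
    return None if text == "NA" else float(text)


def gain_by_feature(model):
    """Sum the gain of all split (non-leaf) nodes, grouped by feature name."""
    totals = defaultdict(float)
    for forest in model["forests"]:
        for tree in forest["trees"]:
            for node in tree["nodes"]:
                # Booleans are stored as the strings "True" / "False" in the sample.
                if node["is_leaf"] == "False":
                    totals[node["feature"]] += to_number(node["gain"])
    return dict(totals)


model = json.loads(SAMPLE)
gains = gain_by_feature(model)
print(gains)
```

Because every value is a string, each consumer has to do its own casting, as `to_number` does here. Native JSON numbers, booleans and `null` would make that step unnecessary, which may be worth weighing when the scheme is finalized.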

https://jsoneditoronline.org/?id=5da3e57a6ae24cb5a6b583c8f41dd307
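
As a rough prototype, the console output shown earlier can be converted into this scheme with a small text parser. The sketch below is only an illustration written against the few dump lines quoted in this comment. The regular expressions, the `parse_dump` helper, and the rule that a node with no printed split feature is a leaf are all assumptions, and the real dump may contain line shapes that are not handled here.

```python
import json
import re

# Hypothetical patterns inferred from the dump lines quoted in this thread.
TREE_RE = re.compile(r"^tree\[(\d+)\]\s*$")
NODE_RE = re.compile(
    r"^\s*\[\s*(?P<index>\d+)\],\s*"            # "[  0], "
    r"(?:\((?P<value>[^)]+)\),\s*)?"            # optional "(0.0111), "
    r"depth=(?P<depth>\d+),\s*gain=(?P<gain>[^,\s]+)"
    r"(?:,\s*(?P<feature>F\d+),\s*(?P<threshold>\S+))?\s*$"  # optional ", F2, 2.45"
)


def parse_dump(text):
    """Parse console dump text into one forest dict of the proposed scheme."""
    trees = []
    for line in text.splitlines():
        tree_match = TREE_RE.match(line)
        if tree_match:
            trees.append({"tree_index": tree_match.group(1), "nodes": []})
            continue
        node_match = NODE_RE.match(line)
        if node_match and trees:
            # Assumption: a node without a split feature is a leaf.
            is_leaf = node_match.group("feature") is None
            trees[-1]["nodes"].append({
                "node_index": node_match.group("index"),
                "is_leaf": str(is_leaf),
                "depth": node_match.group("depth"),
                "gain": node_match.group("gain"),
                "feature": node_match.group("feature") or "NA",
                "threshold": node_match.group("threshold") or "NA",
                "value": node_match.group("value") or "NA",
            })
    # num_trees counts the trees actually parsed, not the "#tree=" header value.
    return {"num_trees": str(len(trees)), "trees": trees}


dump_text = """\
constant=0, orgdim=4, #tree=500
tree[0]

[  0], depth=0, gain=85.2273, F2, 2.45
  [  1], (0.0111), depth=1, gain=0
  [  2], (-0.0123), depth=1, gain=0
"""

forest = parse_dump(dump_text)
print(json.dumps(forest, indent=2))
```

Parsing the printed text is fragile compared with writing JSON directly from the model, so this is better suited to validating the scheme than to serving as the final implementation.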