dmlc / treelite

Universal model exchange and serialization format for decision tree forests
https://treelite.readthedocs.io/en/latest/
Apache License 2.0

Revamp JSON importer to make it easy to use #517

Open hcho3 opened 1 year ago

stephenpardy commented 2 months ago

I am very interested in this issue and would like to understand what progress has been made.

I see the PR that added JSON importing to the C and Python APIs: https://github.com/dmlc/treelite/pull/448, but it seems this has since been removed. It is still possible to call dump_as_json on any model, but are there any utilities to load these files back? I think my question may be a duplicate of #11, but there seems to have been a lot of development since that issue was closed.

hcho3 commented 2 months ago

@stephenpardy

> what progress there is here.

I haven't gotten around to writing the JSON importer yet, because I wasn't sure what kind of interface would work best. The last iteration (import_from_json from Treelite 3.9) was clunky to use and had many gotchas. Also, the importer cannot simply consume the output of the dump_as_json function, since that output omits some information needed to preserve the integrity of the model through a round-trip serialization.

Can you describe what your use case would be? I'd like to learn how you plan to use the JSON importer so that I can pick the best design.

stephenpardy commented 2 months ago

@hcho3 I am looking for a way to load tree models from a variety of sources (e.g. XGBoost, LightGBM) and then save those models in a stable format that can be loaded and served at a later time.

I see the serialize and deserialize methods, which seem to meet my needs, and the docs even promise some backward compatibility. I think that is enough for now, but a human-readable format such as JSON would be much preferred over the binary one if possible (similar to how XGBoost now defaults to JSON over its old binary format).
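
For anyone landing here with the same need, here is a rough sketch of the workaround discussed above: keep the binary checkpoint as the loadable artifact and write a JSON dump next to it purely for human inspection. It assumes the `serialize`, `deserialize`, and `dump_as_json` methods on `treelite.Model` behave as described in the docs. The helper names (`archive_model`, `restore_model`) and the file names are hypothetical and not part of Treelite.

```python
import json
from pathlib import Path


def archive_model(model, out_dir):
    """Save a binary checkpoint plus a JSON dump for inspection.

    `model` is assumed to be a treelite.Model (or anything exposing
    serialize() and dump_as_json()). File names are illustrative only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Binary checkpoint: this is the file to load back later.
    checkpoint = out_dir / "model.tl"
    model.serialize(str(checkpoint))

    # JSON dump: readable by humans, but per the discussion above it
    # cannot currently be loaded back into a model.
    dump = model.dump_as_json(pretty_print=True)
    json.loads(dump)  # sanity check that the dump parses as JSON
    (out_dir / "model.json").write_text(dump)

    return checkpoint


def restore_model(checkpoint):
    """Load the binary checkpoint written by archive_model()."""
    import treelite  # imported lazily; only needed when restoring

    return treelite.Model.deserialize(str(checkpoint))
```

The model passed to `archive_model` would come from whichever Treelite loader matches your source framework and Treelite version. Only the binary file is meant to round-trip; the JSON file is there for diffing and debugging.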