dmlc / XGBoost.jl

XGBoost Julia Package

Saving and Loading Boosters #189

Closed gyansinha closed 11 months ago

gyansinha commented 11 months ago

I'd like to save and transport a trained booster to another machine (estimation on linux, prediction on windows). I know the save function writes it out to a JSON file, like this:

bst         = xgboost(dtrain;  watchlist=watchlist, params...)
model_fname = "xgboost_$(target)_$(replace(string(now()), ":" => "_"))_model.json"
XGBoost.save(bst, model_fname)

How do I load the model JSON file on the other machine without transporting the dtrain object as well? All of the arguments to load (and load!), as well as the Booster constructor, seem to expect a DMatrix (or Array). This behaviour seems anomalous compared to how it works in, say, the Python or R wrappers.

Thanks for your help.

gyansinha commented 11 months ago

FWIW, the example snippet from tests:

model_fname = "model.json"
bst2 = Booster(DMatrix[])
XGBoost.load!(bst2, model_fname)

results in an empty Booster when I load the JSON downloaded from Linux to Windows.

ExpandingMan commented 11 months ago

You should be able to load models saved with XGBoost.save with XGBoost.load. I was under the impression that save would dump the object in a binary format, not a JSON.

Note also that whether or not the exported model is compatible across versions is up to libxgboost, and I don't know what compatibility it claims; you can try checking the docs. Therefore, if you are still having problems, you should at least verify that a model exported with save can be retrieved with load on the same runtime. If that does not work, it may be a bug. If it works within the same program but not when exported to windows, it is likely that libxgboost doesn't support exports across subtly different versions.
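A minimal same-runtime round-trip check along these lines might look like the following sketch. It assumes a trained `bst` as in the earlier snippet and some evaluation `DMatrix` named `dtest` (hypothetical here); the idea is just to compare predictions before and after reloading:

```julia
# Sketch: verify save/load round-trips on the same runtime by
# comparing predictions before and after reloading.
# Assumes `bst` (trained Booster) and `dtest` (a DMatrix) already exist.
p_before = predict(bst, dtest)

XGBoost.save(bst, "roundtrip_check.json")
bst2 = Booster(DMatrix[])            # empty booster; no training data needed
XGBoost.load!(bst2, "roundtrip_check.json")

p_after = predict(bst2, dtest)
@assert isapprox(p_before, p_after)  # should hold if the model survived the trip
```

If this assertion holds locally but predictions diverge (or come back as zeros) after transporting the file, the export step is fine and the problem is on the loading side.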

gyansinha commented 11 months ago

I went through the process of elimination and the save/load procedure does work properly on the same runtime. It doesn't work properly when transporting from linux to windows; i.e., the booster remains empty after the load! command.

ExpandingMan commented 11 months ago

The silent failure doesn't seem like good behavior to me, it might be worth opening an issue at xgboost. I was not able to find anything in the documentation about what it considers a valid version difference, and I am confused by the fact that there seem to be 3 different methods for dumping a model, only one of which is explicitly documented as not used for re-constructing the model.

I am re-opening this because this should not silently fail, it should either work or throw an error. As of right now we can't implement that in XGBoost.jl because I don't even know how to check whether libxgboost thinks it's a compatible version.

ExpandingMan commented 11 months ago

Also, if the xgboost version is exactly the same between the linux and windows machines, this might be a windows-specific bug.

If it's working in the Python and R wrappers, it ought to work here, I just don't know what's going on without knowing what those wrappers are doing.

gyansinha commented 11 months ago

Actually I did some more digging and am no longer confident that save/load! works on the linux runtime with an empty DMatrix. For example:

bst         = xgboost(dtrain;  watchlist=watchlist, params...)
model_fname = "../models/xgboost/xgboost_$(target)_$(replace(string(now()), ":" => "_"))_model.json"
XGBoost.save(bst, model_fname)

model_booster_2 = XGBoost.load!(XGBoost.Booster(DMatrix[]), model_fname)
XGBoost.load!(model_booster_2, model_fname)

julia> model_booster_2.feature_names
String[]

This works:

model_booster = XGBoost.Booster(dtest);
julia> model_booster.feature_names
108-element Vector{String}:
....

Unless I am missing something, this looks like a bug, or I am just using the functions the wrong way.

ExpandingMan commented 11 months ago

The feature names are expected not to be saved (this is something that should be documented but isn't). This is because the libxgboost model object doesn't support them, so they have to be saved in the Julia object, and we don't write this to disk because then we'd have to come up with our own (incompatible) file format. The python wrapper documents that it behaves the same way for the same reason. The only reliable way that I'm aware of to check if the model object is being properly loaded is to check the output of predict. This is unfortunate, but it's a constraint of the library we are wrapping.
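A load check along the lines suggested here might look like this sketch (assuming the `bst`, `dtest`, and `model_fname` from earlier in the thread). The point is that empty `feature_names` after loading is expected, and only the prediction output tells you whether the model actually loaded:

```julia
bst2 = Booster(DMatrix[])
XGBoost.load!(bst2, model_fname)

# feature_names are *expected* to be empty after loading: they are
# Julia-side metadata and are not part of the libxgboost model file.
@show bst2.feature_names             # String[] is normal here

# The reliable check is the prediction output:
@assert isapprox(predict(bst2, dtest), predict(bst, dtest))
```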

ExpandingMan commented 11 months ago

@trivialfis , is there anything you can tell us about how we should be checking for this? Are we safe in assuming that if it loads without error then the library thinks it's a valid object? If that's the case there is likely a bug somewhere since @gyansinha couldn't load the object in windows, though I can't rule out that it's a windows-specific bug in the Julia wrapper.

gyansinha commented 11 months ago

> The feature names are expected not to be saved (this is something that should be documented but isn't). This is because the libxgboost model object doesn't support them, so they have to be saved in the Julia object, and we don't write this to disk because then we'd have to come up with our own (incompatible) file format. The python wrapper documents that it behaves the same way for the same reason. The only reliable way that I'm aware of to check if the model object is being properly loaded is to check the output of predict. This is unfortunate, but it's a constraint of the library we are wrapping.

If I try to use model_booster_2 with the dtest matrix, predict runs without error but returns all zeros. That tells me that somewhere the prediction vector is allocated with zeros but then never properly filled in.

gyansinha commented 11 months ago

> @trivialfis , is there anything you can tell us about how we should be checking for this? Are we safe in assuming that if it loads without error then the library thinks it's a valid object? If that's the case there is likely a bug somewhere since @gyansinha couldn't load the object in windows, though I can't rule out that it's a windows-specific bug in the Julia wrapper.

@ExpandingMan note that the last round of tests was on the linux runtime for both save and load!.

gyansinha commented 11 months ago

It is failing on linux too, unless I initialize the booster with a dtrain or dtest matrix and then use load!, in which case I get valid predictions. If I initialize an empty booster (e.g. with DMatrix[]) and then try load!, the predictions are all 0. I also wanted to rule out the save step as the problem, so I read the JSON files into Python xgboost and ran things like plot_importance on the loaded booster; they all seemed to work fine.
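The workaround described above, as a sketch (using the `model_fname`, `dtrain`, and `dtest` names from earlier in the thread; `dtrain` here is any DMatrix with the same feature layout as the training data):

```julia
# Failing path (on XGBoost.jl 2.3.1): empty booster + load! runs
# without error, but predict then returns all zeros.
bst_bad = XGBoost.load!(XGBoost.Booster(DMatrix[]), model_fname)

# Workaround: seed the Booster with an existing DMatrix before loading.
bst_ok = XGBoost.Booster(dtrain)
XGBoost.load!(bst_ok, model_fname)
preds = predict(bst_ok, dtest)   # valid predictions with this path
```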

(Replying by email to ExpandingMan's question:)

> Is it failing on linux though? I thought it was only not giving you the feature names and parameter values on linux, which is expected.

ExpandingMan commented 11 months ago

I can't reproduce any problem with this on linux. Are you sure you are on latest? (XGBoost.jl 2.3.2 and XGBoost_jll 1.7.6)

gyansinha commented 11 months ago

> I can't reproduce any problem with this on linux. Are you sure you are on latest? (XGBoost.jl 2.3.2 and XGBoost_jll 1.7.6)

That was it! I was on 2.3.1 on linux; I just upgraded to 2.3.2 and the load with an empty-DMatrix Booster works fine. I also tested the model replication on windows, and predictions all work out correctly. Sorry for the confusion, but we can close this.
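For anyone hitting the same symptom, checking the installed package version first is the quickest test, since the fix is simply being on XGBoost.jl 2.3.2 or later:

```julia
using Pkg
Pkg.status("XGBoost")   # should show XGBoost.jl >= 2.3.2
Pkg.update("XGBoost")   # upgrade if still on 2.3.1 or earlier
```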