JuliaAI / MLJModels.jl

Home of the MLJ model registry and tools for model queries and model code loading
MIT License

lazy activation of models not working from within packages #22

Closed rssdev10 closed 3 years ago

rssdev10 commented 5 years ago

I'm trying to make a module with MLJModels:

module Abc

import XGBoost: dump_model, save, Booster

using MLJ
using MLJBase
import MLJModels

using MLJModels.XGBoost_

function __init__()
    @info "Abc"
end

end

but I get an error: `ERROR: LoadError: UndefVarError: XGBoost_ not defined`.

It looks like there is an issue with the lazy activation in

@require XGBoost = "009559a3-9522-5dbb-924b-0b6ed2b22bb9" include("XGBoost.jl")
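
For context, Requires.jl defers loading glue code until the trigger package is imported at runtime. A minimal sketch of the pattern in play (the `@require` line is the one quoted above; the surrounding module name is illustrative, not MLJModels' actual source):

```julia
module GlueProvider  # hypothetical name, for illustration only

using Requires

function __init__()
    # The glue file XGBoost.jl is only include'd after the user loads
    # XGBoost, and only once this __init__ has run. A downstream package
    # being precompiled may look up GlueProvider.XGBoost_ before that
    # happens, producing the UndefVarError reported above.
    @require XGBoost = "009559a3-9522-5dbb-924b-0b6ed2b22bb9" include("XGBoost.jl")
end

end
```
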

One workaround I found is:

module Abc

using XGBoost
#import XGBoost: dump_model, save, Booster

using MLJ
using MLJBase
import MLJModels

include(joinpath(MLJModels.srcdir, "XGBoost.jl"))
#using MLJModels.XGBoost_

function __init__()
    @info "Abc"
end

end

I also added debug output to the `__init__` function of the MLJModels module, and I see that this method is called twice. I get something like:

[ Info: Precompiling Abc [top-level]
[ Info: MLJModels!!!
[ Info: MLJModels!!!
[ Info: Abc

Maybe it is related to the chain of `__init__` methods.

ablaom commented 5 years ago

I'm afraid I cannot reproduce your problem:

julia> module Abc

       import XGBoost: dump_model, save, Booster

       using MLJ
       using MLJBase
       import MLJModels

       using MLJModels.XGBoost_

       function __init__()
           @info "Abc"
       end

       end
[ Info: Recompiling stale cache file /Users/anthony/.julia/compiled/v1.1/XGBoost/rSeEh.ji for XGBoost [009559a3-9522-5dbb-924b-0b6ed2b22bb9]
[ Info: Abc
Main.Abc

julia> using MLJ

julia> task = load_boston()
SupervisedTask @ 5…85

julia> model = Abc.XGBoostRegressor()
MLJModels.XGBoost_.XGBoostRegressor(num_round = 1,
                                    booster = "gbtree",
                                    disable_default_eval_metric = 0,
                                    eta = 0.3,
                                    gamma = 0.0,
                                    max_depth = 6,
                                    min_child_weight = 1.0,
                                    max_delta_step = 0.0,
                                    subsample = 1.0,
                                    colsample_bytree = 1.0,
                                    colsample_bylevel = 1.0,
                                    lambda = 1.0,
                                    alpha = 0.0,
                                    tree_method = "auto",
                                    sketch_eps = 0.03,
                                    scale_pos_weight = 1.0,
                                    updater = "grow_colmaker",
                                    refresh_leaf = 1,
                                    process_type = "default",
                                    grow_policy = "depthwise",
                                    max_leaves = 0,
                                    max_bin = 256,
                                    predictor = "cpu_predictor",
                                    sample_type = "uniform",
                                    normalize_type = "tree",
                                    rate_drop = 0.0,
                                    one_drop = 0,
                                    skip_drop = 0.0,
                                    feature_selector = "cyclic",
                                    top_k = 0,
                                    tweedie_variance_power = 1.5,
                                    objective = "reg:linear",
                                    base_score = 0.5,
                                    eval_metric = "rmse",
                                    seed = 0,) @ 1…89

julia> mach = machine(model, task)
Machine{XGBoostRegressor} @ 1…99

julia> evaluate!(mach)
┌ Info: Evaluating using cross-validation. 
│ nfolds=6. 
│ shuffle=false 
│ measure=MLJ.rms 
│ operation=StatsBase.predict 
└ Resampling from all rows. 
Cross-validating: 100%[=========================] Time: 0:00:01
6-element Array{Float64,1}:
 15.071084701486205
 16.70750413097405 
 22.12771143813795 
 20.89991496287021 
 15.434870166858115
 11.602463981185641

Do you have MLJModels in your load path? You need both MLJModels and MLJ in your project. Perhaps send me the output of `]status -m` or your Manifest.toml.
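
One way to check this non-interactively (a sketch; `Pkg.status` keyword forms per current Pkg documentation — the REPL equivalents are `]status` and `]status -m`):

```julia
using Pkg

Pkg.status()                            # direct dependencies of the active project
Pkg.status(mode = Pkg.PKGMODE_MANIFEST) # full manifest, equivalent to ]status -m
# Pkg.add("MLJModels")                  # if MLJModels is missing from the project
```
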

rssdev10 commented 5 years ago

Please see the attached package: abc.tar.gz. Run `./build.jl` from the unpacked archive.

I'm afraid I cannot reproduce your problem:

It might be a concurrency issue and therefore unstable. I cannot say that I see it every time, but in most cases it is present.

Julia version 1.0.3, macOS.

(Abc) pkg> status -m
Project Abc v0.1.0
    Status `~/projects/tmp/julia/Abc/Manifest.toml`
  [7d9fca2a] Arpack v0.3.1
  [9e28174c] BinDeps v0.8.10
  [b99e7846] BinaryProvider v0.5.4
  [336ed68f] CSV v0.5.5
  [324d7699] CategoricalArrays v0.5.4
  [34da2185] Compat v2.1.0
  [a93c6f00] DataFrames v0.18.3
  [864edb3b] DataStructures v0.15.0
  [b4f34e82] Distances v0.8.0
  [31c24e10] Distributions v0.20.0
  [cd3eb016] HTTP v0.8.2
  [83e8ac13] IniFile v0.5.0
  [82899510] IteratorInterfaceExtensions v1.0.0
  [682c06a0] JSON v0.20.0
  [2d691ee1] LIBLINEAR v0.5.1
  [b1bec4e5] LIBSVM v0.3.1
  [add582a8] MLJ v0.2.3
  [a7f614a8] MLJBase v0.2.2
  [d491faf4] MLJModels v0.2.3
  [739be429] MbedTLS v0.6.8
  [e1d29d7a] Missings v0.4.1
  [bac558e1] OrderedCollections v1.1.0
  [90014a1f] PDMats v0.9.7
  [69de0a69] Parsers v0.3.5
  [2dfb63ee] PooledArrays v0.5.2
  [92933f4c] ProgressMeter v1.0.0
  [1fd47b50] QuadGK v2.0.4
  [3cdcf5f2] RecipesBase v0.6.0
  [189a3867] Reexport v0.2.0
  [cbe49d4c] RemoteFiles v0.2.1
  [ae029012] Requires v0.5.2
  [79098fc4] Rmath v0.5.0
  [6e75b9c4] ScikitLearnBase v0.4.1
  [a2af1166] SortingAlgorithms v0.3.1
  [276daf66] SpecialFunctions v0.7.2
  [2913bbd2] StatsBase v0.30.0
  [4c63d2b9] StatsFuns v0.8.0
  [3783bdb8] TableTraits v1.0.0
  [bd369af6] Tables v0.2.5
  [30578b45] URIParser v0.4.0
  [ea10d353] WeakRefStrings v0.6.1
  [009559a3] XGBoost v0.3.1
  [2a0f44e3] Base64 
  [ade2ca70] Dates 
  [8bb1440f] DelimitedFiles 
  [8ba89e20] Distributed 
  [9fa8497b] Future 
  [b77e0a4c] InteractiveUtils 
  [76f85450] LibGit2 
  [8f399da3] Libdl 
  [37e2e46d] LinearAlgebra 
  [56ddb016] Logging 
  [d6f4376e] Markdown 
  [a63ad114] Mmap 
  [44cfe95a] Pkg 
  [de0858da] Printf 
  [9abbd945] Profile 
  [3fa0cd96] REPL 
  [9a3f8284] Random 
  [ea8e919c] SHA 
  [9e88b42a] Serialization 
  [1a1011a3] SharedArrays 
  [6462fe0b] Sockets 
  [2f01184e] SparseArrays 
  [10745b16] Statistics 
  [4607b0f0] SuiteSparse 
  [8dfed614] Test 
  [cf7118a7] UUIDs 
  [4ec0a83e] Unicode 
ablaom commented 5 years ago

Strange. I still can't reproduce your problem after activating the environment you sent:

(working) pkg> activate .

(Abc) pkg> instantiate
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`

julia> module Abc

       import XGBoost: dump_model, save, Booster

       using MLJ
       using MLJBase
       import MLJModels

       using MLJModels.XGBoost_

       function __init__()
           @info "Abc"
       end

       end

Main.Abc

julia> using MLJ

julia> task = load_boston()
SupervisedTask{} @ 1…38

julia> model = Abc.XGBoostRegressor()
MLJModels.XGBoost_.XGBoostRegressor(num_round = 1,
                                    booster = "gbtree",
                                    disable_default_eval_metric = 0,
                                    eta = 0.3,
                                    gamma = 0.0,
                                    max_depth = 6,
                                    min_child_weight = 1.0,
                                    max_delta_step = 0.0,
                                    subsample = 1.0,
                                    colsample_bytree = 1.0,
                                    colsample_bylevel = 1.0,
                                    lambda = 1.0,
                                    alpha = 0.0,
                                    tree_method = "auto",
                                    sketch_eps = 0.03,
                                    scale_pos_weight = 1.0,
                                    updater = "grow_colmaker",
                                    refresh_leaf = 1,
                                    process_type = "default",
                                    grow_policy = "depthwise",
                                    max_leaves = 0,
                                    max_bin = 256,
                                    predictor = "cpu_predictor",
                                    sample_type = "uniform",
                                    normalize_type = "tree",
                                    rate_drop = 0.0,
                                    one_drop = 0,
                                    skip_drop = 0.0,
                                    feature_selector = "cyclic",
                                    top_k = 0,
                                    tweedie_variance_power = 1.5,
                                    objective = "reg:linear",
                                    base_score = 0.5,
                                    eval_metric = "rmse",
                                    seed = 0,) @ 5…98

julia> mach = machine(model, task)
Machine{XGBoostRegressor} @ 1…64

julia> evaluate!(mach)
┌ Info: Evaluating using cross-validation. 
│ nfolds=6. 
│ shuffle=false 
│ measure=MLJ.rms 
│ operation=StatsBase.predict 
└ Resampling from all rows. 
Cross-validating: 100%[=========================] Time: 0:00:02
6-element Array{Float64,1}:
 15.071084701486205
 16.70750413097405 
 22.12771143813795 
 20.89991496287021 
 15.434870166858115
 11.602463981185641

julia> versioninfo()
Julia Version 1.0.3
Commit 099e826241 (2018-12-18 01:34 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
Environment:
  JULIA_PATH = /Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia

Run on macOS.

rssdev10 commented 5 years ago

Can you try running it without the REPL, from the command line with `./build.jl` only? Again, I think there is something like a concurrency issue here.

Also, I have a slightly older laptop:

julia> versioninfo()
Julia Version 1.0.3
Commit 099e826241 (2018-12-18 01:34 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
ablaom commented 5 years ago

Yes, now I can reproduce your issue. Many thanks for this. I would say we have uncovered a limitation of Requires.jl. Do you not agree?

A secondary question is whether the `@load` macro will work when called within a package, for models in packages with native MLJ interface implementations (i.e., outside of MLJModels). In that case there would be no lazy loading. Unfortunately, no such package actually exists yet, but we will have some soon (or could construct a dummy package).

Edit, July 23, 2020: I can confirm that if the interface is provided by a package without the use of Requires, the issue is not present.

rssdev10 commented 5 years ago

Yes, it might be a restriction of Requires.jl. See also the double call of `__init__` that I mentioned in the first message. But again, I am almost sure that it is a concurrency issue. I found it while preparing the code to run as a web service.

So, we have a workaround. Regarding how to fix it: now that the issue is confirmed, maybe just file the same issue, with my sample, on Requires.jl's issue tracker if nobody can dive into it now.

Regarding loading of models: for now I'm using `Booster(model_file = model_fn)` directly for XGBoost.
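
For reference, a minimal sketch of that direct-persistence route, assuming the XGBoost.jl 0.3.x API referenced earlier in this thread (`save` and `Booster` are among the imports in the first message; the file name and the trained booster `bst` are illustrative):

```julia
using XGBoost

# Persist a trained booster to disk and restore it later, bypassing
# MLJ's model-loading machinery entirely.
model_fn = "xgboost_model.bin"
save(bst, model_fn)                    # `bst` is an already-trained Booster
bst2 = Booster(model_file = model_fn)  # reload for prediction
```
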

ablaom commented 5 years ago

Although I am doubtful, I thought it worth mentioning that there was a refactor of `@load` that possibly resolves this issue. MLJModels 0.4.0 (which now owns the method) incorporates the changes.

ablaom commented 5 years ago

Update: This issue is unresolved under MLJModels 0.5.0.

tlienart commented 4 years ago

@ablaom is this still a (relevant) issue?

ablaom commented 4 years ago

I believe it is still an issue. It seems one can't use MLJ to load models from within a package module. Some clues are provided above and in #321. I suspect (but have not confirmed) that this is a Requires issue. To reproduce, be sure to follow the instructions of @rssdev10 exactly.

cscherrer commented 4 years ago

@ablaom @tlienart We're running into this issue as well.

ablaom commented 4 years ago

Noted. The long-term plan is to "disintegrate" MLJModels into individual packages, eliminating all use of Requires.jl. Then loading a model whose glue code is currently provided by MLJModels should be no different from loading models from packages that natively support the MLJ model interface (e.g., EvoTrees.jl, MLJLinearModels.jl). In those cases I am not aware of any issue, but let me know if you discover one.
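
For models whose MLJ interface ships with the model package itself, loading involves no Requires.jl glue. A sketch using one of the packages named above (the exact `@load` syntax has varied across MLJModels versions, so treat this as indicative):

```julia
using MLJ

# Bind the model type into scope and instantiate it with default
# hyperparameters; EvoTrees provides its own MLJ interface natively.
EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees
model = EvoTreeRegressor()
```
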

ablaom commented 4 years ago

Partial workaround is here: https://github.com/alan-turing-institute/MLJ.jl/issues/613#issuecomment-662784184

OkonSamuel commented 4 years ago

I think we had better start the disintegration of MLJModels soon.

ablaom commented 4 years ago

PRs welcome 😄 Happy to provide guidance. The repos are called `MLJGLMInterface.jl`, and so forth. If you want to start on one, let me know which and I'll get you commit access.

Here's the issue: https://github.com/alan-turing-institute/MLJModels.jl/issues/244#issuecomment-641668554

OkonSamuel commented 4 years ago

Great. I will work on them in my spare time.

ablaom commented 3 years ago

Pretty sure this has been resolved by the above PR.