JuliaAI / CatBoost.jl

Julia wrapper of the python library CatBoost for boosted decision trees
MIT License

Update catboost version to `>=1.1` #27

Closed by tylerjthomas9 1 year ago

tylerjthomas9 commented 1 year ago

CatBoost v1.2 was recently released, and it comes with Python 3.11 support. The full release notes are here: https://github.com/catboost/catboost/releases/tag/v1.2

ericphanson commented 1 year ago

I wonder what the right way to handle versioning for Python dependencies is.

One option is to do this, and allow multiple library versions with one wrapper package version. In that case, to fix the catboost version, I think the user needs to specify the version in a CondaPkg.toml in their own project. Do you know if that works properly? E.g., does PythonCall merge the version restrictions?

Another option is to have one catboost version per CatBoost.jl version, and use Pkg to handle the versioning, like how JLL packages work.

What do you think? Is there a standard convention here for Python wrappers?

tylerjthomas9 commented 1 year ago

I have wondered the same thing. I noticed that PythonPlot.jl sets matplotlib = ">=1".

I think that the most robust solution would be to have a secondary package with just the Python install; then we could use Pkg.jl to set compatible versions. However, this would involve registering another package.

> I think one option is to do this, and allow multiple library versions with one wrapper package version. In that case, to fix the catboost version, I think the user needs to specify the version in a CondaPkg.toml in their code. Do you know if that works properly? E.g. does PythonCall merge the version restrictions?

If a user restricts the version in a CondaPkg.toml in their own project, that restriction is passed to the mamba solver in addition to the restrictions declared by the packages they depend on. PythonCall.jl itself declares a minimum Python version (https://github.com/cjdoris/PythonCall.jl/blob/main/CondaPkg.toml), and a Python restriction in the user's CondaPkg.toml is simply added alongside it. Below is an example where I restricted Python to 3.11 in an environment. It comes through as a second restriction on the Python version that is being installed.

julia> using CondaPkg

julia> CondaPkg.add("python"; version="=3.11", resolve=false)

julia> using PythonCall
    CondaPkg Found dependencies: C:\Users\TYLERT~1\AppData\Local\Temp\jl_X0RCNf\CondaPkg.toml
    CondaPkg Found dependencies: C:\Users\TylerThomas\.julia\packages\PythonCall\dsECZ\CondaPkg.toml
    CondaPkg Resolving changes
             + python
    CondaPkg Creating environment
             │ C:\Users\TylerThomas\.julia\artifacts\444cdd4c5cbacc31e39255b42de7eb3458f5ebd1\bin\micromamba.exe
             │ -r C:\Users\TylerThomas\.julia\scratchspaces\0b3b1443-0f03-428d-bdfb-f27f9c1191ea\root
             │ create
             │ -y
             │ -p C:\Users\TYLERT~1\AppData\Local\Temp\jl_X0RCNf\.CondaPkg\env
             │ --override-channels
             │ --no-channel-priority
             │ python[version='=3.11']
             │ python[version='>=3.7,<4',channel='conda-forge',build='*cpython*']
             └ -c conda-forge
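
For reference, the `CondaPkg.add` call above records the restriction in the environment's CondaPkg.toml. A sketch of what that file would contain, assuming nothing else has been added to it:

```toml
# CondaPkg.toml in the user's environment, as written by CondaPkg.add
[deps]
python = "=3.11"
```

This is where the `python[version='=3.11']` spec in the log comes from; the second `python[...]` spec comes from PythonCall's own CondaPkg.toml.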

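In summary, I see 3 approaches:

1. Allow a range of catboost versions (e.g. `>=1.1`) in CatBoost.jl, and let users pin a version in their own CondaPkg.toml.
2. Have one catboost version per CatBoost.jl version, and use Pkg to handle the versioning, like JLL packages.
3. Move the Python install into a secondary package, and use Pkg.jl to set compatible versions.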

I like the idea of giving the user flexibility with the Python version (similar to PythonPlot.jl), but I do not think any of these approaches are bad. Let me know what you think.

ericphanson commented 1 year ago

Thanks for the detailed explanation!

I think it makes sense to give the user some flexibility, and not need a package bump here every time. Talking with @palday a bit, I think if catboost is fairly stable then this is a good tradeoff, but if it proves very unstable then maybe it's better to restrict it more heavily.

I think we should also document in the README that users who want to fix the version of catboost must add a CondaPkg.toml with that restriction, and that fixing the version of CatBoost.jl isn't enough. I think Julia users are used to a Project.toml + Manifest.toml being enough for reproducibility, but that isn't enough with Python dependencies. That's true even if we fixed the version here, since one needs to pin all the transitive Python dependencies too (and probably needs to move to Docker at that point). So a pointer to let them know is probably helpful.
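
A minimal sketch of such a file, placed in the user's project next to their Project.toml. The version `=1.2` is only an illustration; any version specifier compatible with the range CatBoost.jl allows should work the same way.

```toml
# CondaPkg.toml in the user's own project (hypothetical example pin)
[deps]
catboost = "=1.2"
```

This pins only catboost itself. Its transitive Python dependencies are still chosen by the solver when the environment is resolved.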

tylerjthomas9 commented 1 year ago

I bumped CatBoost.jl to v0.3.2 and added a short section to the README for users who want to pin a specific catboost version.