Closed: ablaom closed this issue 1 year ago
Okay, I've run into a problem here.
First note that the way the registry currently works, all model-providing packages must be imported simultaneously. In hindsight this sounds like a dumb idea, but it hasn't actually caused many problems.
However, ScikitLearn.jl and CatBoost.jl are not playing nicely:
# in fresh environment:
(jl_AmJisH) pkg> add ScikitLearn CatBoost
julia> using CatBoost
julia> using ScikitLearn
Error processing line 1 of /Users/anthony/anaconda2/envs/py37/lib/python3.7/site-packages/matplotlib-3.4.3-py3.7-nspkg.pth:
Fatal Python error: initsite: Failed to import the site module
Traceback (most recent call last):
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/site.py", line 168, in addpackage
exec(line)
File "<string>", line 1, in <module>
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/importlib/util.py", line 14, in <module>
from contextlib import contextmanager
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/contextlib.py", line 5, in <module>
from collections import deque
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/collections/__init__.py", line 24, in <module>
import heapq as _heapq
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/heapq.py", line 587, in <module>
from _heapq import *
SystemError: initialization of _heapq did not return an extension module
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/site.py", line 579, in <module>
main()
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/site.py", line 566, in main
known_paths = addsitepackages(known_paths)
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/site.py", line 349, in addsitepackages
addsitedir(sitedir, known_paths)
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/site.py", line 207, in addsitedir
addpackage(sitedir, name, known_paths)
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/site.py", line 178, in addpackage
import traceback
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/traceback.py", line 3, in <module>
import collections
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/collections/__init__.py", line 24, in <module>
import heapq as _heapq
File "/Users/anthony/anaconda2/envs/py37/lib/python3.7/heapq.py", line 587, in <module>
from _heapq import *
SystemError: initialization of _heapq did not return an extension module
and Julia exits.
@ericphanson @tylerjthomas9 Any insights here?
No 🤔. I don't see any issues in upstream catboost either: https://github.com/catboost/catboost/issues?q=is%3Aissue+scikitlearn+
But I do find this one: https://github.com/JuliaPy/pyjulia/issues/150
https://github.com/cjdoris/PythonCall.jl/issues/220
I think that PyCall.jl (in ScikitLearn.jl) and PythonCall.jl (in CatBoost.jl) are calling different Python versions. Here is a workaround (not very pretty) that fixes the clash between the two libraries by rebuilding PyCall against the Python executable that PythonCall uses:
pkg> add ScikitLearn CatBoost PythonCall
julia> using PythonCall, Pkg
julia> ENV["PYTHON"] = PythonCall.C.CTX.exe_path
julia> Pkg.build("PyCall")
julia> using CatBoost
julia> using ScikitLearn
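For reference, a possible alternative goes in the other direction and points PythonCall at the interpreter PyCall was built against. This is only a sketch based on the `JULIA_PYTHONCALL_EXE` and `JULIA_CONDAPKG_BACKEND` environment variables described in the PythonCall.jl documentation. It has not been tried against the registry workflow, and it assumes that the `catboost` Python package is already installed in PyCall's Python, since CondaPkg would no longer install it:

```julia
# Assumption: these must be set before PythonCall (or CatBoost) is loaded in the session.
ENV["JULIA_CONDAPKG_BACKEND"] = "Null"   # stop CondaPkg from managing its own Python environment
ENV["JULIA_PYTHONCALL_EXE"] = "@PyCall"  # ask PythonCall to reuse the interpreter PyCall was built with

using ScikitLearn  # loads PyCall
using CatBoost     # loads PythonCall
```

Whether this is any easier to maintain than rebuilding PyCall is an open question; it just moves the manual Python package management to the PyCall side.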
@tylerjthomas9 This is a great help and explains the problem. Unfortunately, after playing around for a few hours, I cannot get things to work locally in the context of the model registry process. And this also needs to work in CI, which checks the registry. There may be a way, but I can see this is going to be a high-maintenance hack.
The bigger picture is that MLJ users do want to load multiple models simultaneously for model comparison, but it doesn't seem this can work at present for PyCall / PythonCall models without introducing package management headaches beyond the average user.
For now, I will add CatBoost.jl to the list of Third Party Packages to give it some visibility. We can also add it to the List of Models, with an asterisk marking it as unregistered. Happy to hear your thoughts on this.
For the record, OutlierDetectionPython.jl also uses PyCall, and its models are in the MLJ Model Registry. @davnn Do you have any inclination to move towards PythonCall? In discussions I have been having, it seems there is some consensus, and even some time commitment, to move ScikitLearn in this direction.
I would be happy to swap to PythonCall, but last time I checked I ran into some troubles and kept PyCall for now. I'll reevaluate in the near future.
@azev77 @ericphanson