cstjean / ScikitLearn.jl

Julia implementation of the scikit-learn API https://cstjean.github.io/ScikitLearn.jl/dev/

Update example for 1.0 #51

Closed. PallHaraldsson closed this 5 years ago.

PallHaraldsson commented 5 years ago

I haven't checked yet whether this is enough; I'm doing that now (still precompiling...). If it isn't enough, this PR can at least serve as a reminder.

Possibly just drop the `(or Pkg.add("PyPlot") for older Julia` part, since the package has already been updated to support only 1.0/0.7.

PallHaraldsson commented 5 years ago

FYI, this is not enough, at least on my machine: all the imports failed, even after I made sure every `using` statement worked.

There no longer seem to be any showstoppers (apart from the deprecation warnings) once I first run the following (this is probably somewhere in the docs; looking back I couldn't find it, but I guess I overlooked it):

shell> pip install sklearn
[..]

julia> @sk_import cluster: KMeans
┌ Warning: `getindex(o::PyObject, s::Symbol)` is deprecated in favor of dot overloading (`getproperty`) so elements should now be accessed as e.g. `o.s` instead of `o[:s]`.
│   caller = import_sklearn() at Skcore.jl:120
└ @ ScikitLearn.Skcore ~/.julia/packages/ScikitLearn/HK6Vs/src/Skcore.jl:120
┌ Warning: `getindex(o::PyObject, s::Symbol)` is deprecated in favor of dot overloading (`getproperty`) so elements should now be accessed as e.g. `o.s` instead of `o[:s]`.
│   caller = top-level scope at Skcore.jl:158
└ @ Core ~/.julia/packages/ScikitLearn/HK6Vs/src/Skcore.jl:158
PyObject <class 'sklearn.cluster.k_means_.KMeans'>

So the rest below is outdated (I haven't gone past the imports yet, so I'm not claiming that everything works in Julia 1.0, or in my case 1.1):

One question: @pyimport still works elsewhere (I tried the example from PyCall's old README); it's just no longer the preferred way to do it. Should @sk_import also be changed into a non-macro? (I believe it is defined in ScikitLearn, so its potential replacement would then live there too.)
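For comparison, a minimal sketch of the function-based style, assuming a PyCall version with dot overloading (1.90 or later) and scikit-learn installed in PyCall's Python. It only uses PyCall's `pyimport`; it is not a drop-in replacement for `@sk_import`, and anything else that macro does inside ScikitLearn.jl is not reproduced here.

```julia
using PyCall

# Function form of the older `@pyimport sklearn.metrics as metrics`
metrics = pyimport("sklearn.metrics")

# Roughly what `@sk_import cluster: KMeans` provides: the Python class itself
KMeans = pyimport("sklearn.cluster").KMeans

# Calling the class constructs a scikit-learn estimator (a PyObject)
km = KMeans(n_clusters=2, n_init=10, random_state=0)

# Four 2-D points forming two obvious clusters (rows are samples)
X = [0.0 0.0; 0.1 0.1; 5.0 5.0; 5.1 5.1]
km.fit(X)

# Attributes use dot syntax (km.labels_), not the deprecated km[:labels_]
labels = km.labels_
println(labels)
```

The macro mainly buys Python-like syntax; the function form is plain Julia assignment, which is arguably easier to maintain but would mean touching every example.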

Installing pip and then running pip install cluster (after first switching to Python 2) didn't help, so I'm now kind of stuck and will see whether I get a response in this thread:

Setting ENV["PYTHON"] = "/usr/bin/python" didn't help either. PyCall now defaults to Python 3, which could be the problem (it still supports Python 2; you just have to ask for it explicitly if it's really needed here).

Also, I'm on Julia 1.2, in case it matters (I doubt it).

julia> @sk_import cluster: KMeans
ERROR: PyError (PyImport_ImportModule

The Python package sklearn could not be found by pyimport. Usually this means
that you did not install sklearn in the Python version being used by PyCall.

PyCall is currently configured to use the Python version at:

/usr/bin/python3

and you should use whatever mechanism you usually use (apt-get, pip, conda,
etcetera) to install the Python package containing the sklearn module.

One alternative is to re-configure PyCall to use a different Python
version on your system: set ENV["PYTHON"] to the path/name of the python
executable you want to use, run Pkg.build("PyCall"), and re-launch Julia.

Another alternative is to configure PyCall to use a Julia-specific Python
distribution via the Conda.jl package (which installs a private Anaconda
Python distribution), which has the advantage that packages can be installed
and kept up-to-date via Julia.  As explained in the PyCall documentation,
set ENV["PYTHON"]="", run Pkg.build("PyCall"), and re-launch Julia. Then,
To install the sklearn module, you can use `pyimport_conda("sklearn", PKG)`,
where PKG is the Anaconda package the contains the module sklearn,
or alternatively you can use the Conda package directly (via
`using Conda` followed by `Conda.add` etcetera).

) <class 'ModuleNotFoundError'>
ModuleNotFoundError("No module named 'sklearn'",)

Stacktrace:
 [1] pyimport(::String) at /home/qwerty/.julia/packages/PyCall/zTRMa/src/PyCall.jl:531
 [2] pyimport_conda(::String, ::String, ::String) at /home/qwerty/.julia/packages/PyCall/zTRMa/src/PyCall.jl:689
 [3] pyimport_conda at /home/qwerty/.julia/packages/PyCall/zTRMa/src/PyCall.jl:688 [inlined]
 [4] import_sklearn() at /home/qwerty/.julia/packages/ScikitLearn/HK6Vs/src/Skcore.jl:119
 [5] top-level scope at /home/qwerty/.julia/packages/ScikitLearn/HK6Vs/src/Skcore.jl:153

[..]

julia> @pyimport sklearn.metrics as metrics
ERROR: PyError (PyImport_ImportModule

The Python package sklearn.metrics could not be found by pyimport. Usually this means
that you did not install sklearn.metrics in the Python version being used by PyCall.

PyCall is currently configured to use the Python version at:

/usr/bin/python3

and you should use whatever mechanism you usually use (apt-get, pip, conda,
etcetera) to install the Python package containing the sklearn.metrics module.

One alternative is to re-configure PyCall to use a different Python
version on your system: set ENV["PYTHON"] to the path/name of the python
executable you want to use, run Pkg.build("PyCall"), and re-launch Julia.

Another alternative is to configure PyCall to use a Julia-specific Python
distribution via the Conda.jl package (which installs a private Anaconda
Python distribution), which has the advantage that packages can be installed
and kept up-to-date via Julia.  As explained in the PyCall documentation,
set ENV["PYTHON"]="", run Pkg.build("PyCall"), and re-launch Julia. Then,
To install the sklearn.metrics module, you can use `pyimport_conda("sklearn.metrics", PKG)`,
where PKG is the Anaconda package the contains the module sklearn.metrics,
or alternatively you can use the Conda package directly (via
`using Conda` followed by `Conda.add` etcetera).

) <class 'ModuleNotFoundError'>
ModuleNotFoundError("No module named 'sklearn'",)

Stacktrace:
 [1] pyimport(::String) at /home/qwerty/.julia/packages/PyCall/zTRMa/src/PyCall.jl:531
 [2] top-level scope at /home/qwerty/.julia/packages/PyCall/zTRMa/src/PyCall.jl:575

[..]
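Two things stand out in these logs. PyCall still reports /usr/bin/python3, which suggests the ENV["PYTHON"] setting never took effect (per the error message, it needs Pkg.build("PyCall") and a restart of Julia). Also, the PyPI/Conda package that provides the `sklearn` module is named `scikit-learn`; `cluster` and `metrics` are submodules of it, not separate packages. A minimal sketch of installing it into exactly the interpreter PyCall uses, assuming that interpreter has pip when it is a system Python. Some distributions refuse `pip install --user` into an "externally managed" system Python, in which case the Conda route from the error message is the easier one.

```julia
using PyCall

# The interpreter PyCall was built against (the logs above report /usr/bin/python3)
println("PyCall uses: ", PyCall.python)

if PyCall.conda
    # PyCall uses its private Conda Python: install the Conda package
    # "scikit-learn" if it is missing, then import the module "sklearn".
    sklearn = pyimport_conda("sklearn", "scikit-learn")
else
    # System Python: install into that same interpreter with its own pip.
    # The package name is scikit-learn; sklearn is only the import name.
    run(`$(PyCall.python) -m pip install --user scikit-learn`)
    # If this import still fails, restart Julia and try again.
    sklearn = pyimport("sklearn")
end

println("scikit-learn ", sklearn.__version__)
```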
PallHaraldsson commented 5 years ago

Since some checks did NOT fail, and so far I have only changed a comment in this PR, I guess the other failing CI checks are not meaningful. You may want to look into those (e.g. the macOS CI checks); they are probably failing for everyone, not just me?

cstjean commented 5 years ago

Yes, don't worry about the tests, this package's tests have been wobbly for a long time.

One question: since @pyimport still works elsewhere (I tried it from PyCall's old README) and is just no longer the preferred way to do it, should @sk_import also be changed to a non-macro (I believe it would be defined in ScikitLearn)?

I'm not sure. @sk_import makes it look more like Python IMO, but maybe it's not worth maintaining. You've seen this PR, I assume? Certainly, anything that simplifies this package would be nice, especially now that I don't have much time. But then again, one would have to update all the examples.

Thank you for fixing the comment.

PallHaraldsson commented 5 years ago

@cstjean FYI: I get some errors that seem related to the new PyCall version, not to Julia 1.0 (or 1.2):

Should I add more commits here, or rather open a new PR? I may or may not do it; I'm just asking which you'd prefer. It's not completely clear to me whether you should adjust REQUIRE to an older PyCall (and/or fix the code); this may also be a bug in PyCall.

Unless I missed something, this is the first error (red output) I hit, followed by more, e.g. on the next line with "ERROR: ArgumentError: hasproperty of NULL PyObject":

julia> imshow(Z, interpolation="nearest",
               extent=(minimum(xx), maximum(xx), minimum(yy), maximum(yy)),
              cmap=PyPlot.cm[:Paired],
              aspect="auto", origin="lower")
┌ Warning: `getindex(o::PyObject, s::Symbol)` is deprecated in favor of dot overloading (`getproperty`) so elements should now be accessed as e.g. `o.s` instead of `o[:s]`.
│   caller = top-level scope at none:0
└ @ Core none:0
ERROR: ArgumentError: ref of NULL PyObject
Stacktrace:
 [1] getproperty(::PyObject, ::String) at /home/qwerty/.julia/packages/PyCall/zTRMa/src/PyCall.jl:293
 [2] getproperty at /home/qwerty/.julia/packages/PyCall/zTRMa/src/PyCall.jl:303 [inlined]
 [3] getindex(::PyObject, ::Symbol) at /home/qwerty/.julia/packages/PyCall/zTRMa/src/PyCall.jl:330
 [4] top-level scope at none:0
cstjean commented 5 years ago

New PR?

Ideally, this should be reported in PyCall too. If it's deprecated, it should still work.
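The expectation that a deprecated method keeps working can be illustrated with a toy model in plain Julia (no PyCall) of the pattern these warnings describe: the new dot syntax is implemented by overloading `getproperty`, and the old `o[:s]` form forwards to it while emitting a deprecation warning. `Wrapper` is a hypothetical type, not PyCall's actual implementation, and `Base.hasproperty` assumes Julia 1.2 or later.

```julia
# Hypothetical stand-in for a PyObject: a bag of named attributes.
struct Wrapper
    attrs::Dict{Symbol,Any}
end

# New style: `w.s` lowers to getproperty(w, :s). getfield is needed to reach
# the real field, because `w.attrs` would itself call this method.
Base.getproperty(w::Wrapper, s::Symbol) = getfield(w, :attrs)[s]

# Advertise the attributes so that hasproperty(w, s) can find them.
Base.propertynames(w::Wrapper, private::Bool=false) = collect(keys(getfield(w, :attrs)))

# Old style: `w[:s]` keeps working but emits a deprecation warning
# (visible only when deprecation warnings are enabled, e.g. --depwarn=yes).
function Base.getindex(w::Wrapper, s::Symbol)
    Base.depwarn("`w[:s]` is deprecated, use `w.s` instead.", :getindex)
    return getproperty(w, s)
end

cmaps = Wrapper(Dict{Symbol,Any}(:Paired => "Paired colormap"))
println(cmaps.Paired)                 # new syntax
println(cmaps[:Paired])               # deprecated syntax, same value
println(hasproperty(cmaps, :Paired))  # the replacement for haskey(o, s)
```

In this model the deprecated path can only fail if the new path fails, which is consistent with the "ref of NULL PyObject" error above pointing at the object itself rather than at the deprecation.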

stevengj commented 5 years ago

Do you get the error about NULL PyObject when you use the released version of PyCall, not master?

PallHaraldsson commented 5 years ago

For those reading here: I can confirm that PyCall works (e.g. with the defaults, i.e. Conda), and that I no longer get the errors I got before. On master, however, I don't get any plot (no errors either, only warnings; see the bug report).

As for "WARNING: Base.writemime is deprecated.", I don't get it, so it could be dropped from the notebook.

PallHaraldsson commented 5 years ago

Running it for the third time, I don't get the plot, but I noticed the haskey warning (which should explain it):

julia> imshow(Z, interpolation="nearest",
               extent=(minimum(xx), maximum(xx), minimum(yy), maximum(yy)),
              cmap=PyPlot.cm[:Paired],
              aspect="auto", origin="lower")
┌ Warning: `getindex(o::PyObject, s::Symbol)` is deprecated in favor of dot overloading (`getproperty`) so elements should now be accessed as e.g. `o.s` instead of `o[:s]`.
│   caller = top-level scope at none:0
└ @ Core none:0
┌ Warning: `haskey(o::PyObject, s::Union{Symbol, AbstractString})` is deprecated, use `hasproperty(o, s)` instead.
│   caller = #imshow#67(::Base.Iterators.Pairs{Symbol,Any,NTuple{5,Symbol},NamedTuple{(:interpolation, :extent, :cmap, :aspect, :origin),Tuple{String,NTuple{4,Float64},ColorMap,String,String}}}, ::Function, ::Array{Int32,2}) at PyPlot.jl:176
└ @ PyPlot ~/.julia/packages/PyPlot/mQXSC/src/PyPlot.jl:176
┌ Warning: `getindex(o::PyObject, s::AbstractString)` is deprecated in favor of dot overloading (`getproperty`) so elements should now be accessed as e.g. `o."s"` instead of `o["s"]`.
│   caller = #imshow#67(::Base.Iterators.Pairs{Symbol,Any,NTuple{5,Symbol},NamedTuple{(:interpolation, :extent, :cmap, :aspect, :origin),Tuple{String,NTuple{4,Float64},ColorMap,String,String}}}, ::Function, ::Array{Int32,2}) at PyPlot.jl:179
└ @ PyPlot ~/.julia/packages/PyPlot/mQXSC/src/PyPlot.jl:179
PyObject <matplotlib.image.AxesImage object at 0x7f373735b7b8>
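In this log, the first warning comes from the example's own `PyPlot.cm[:Paired]`, while the `haskey` and `getindex(o::PyObject, s::AbstractString)` warnings are raised inside PyPlot.jl itself (lines 176 and 179), so only an updated PyPlot would remove those. A minimal sketch of the example's call rewritten with dot syntax, assuming PyCall 1.90+ and a matching PyPlot; `xx`, `yy` and `Z` here are small made-up placeholders, not the notebook's actual mesh and predictions.

```julia
using PyPlot, PyCall

# Placeholder grid and class labels (the notebook computes these from a model)
xx = range(-1.0, stop=1.0, length=50)
yy = range(-1.0, stop=1.0, length=50)
Z = [Int(x * y > 0) for y in yy, x in xx]   # rows follow y, columns follow x

img = imshow(Z, interpolation="nearest",
             extent=(minimum(xx), maximum(xx), minimum(yy), maximum(yy)),
             cmap=PyPlot.cm.Paired,   # was PyPlot.cm[:Paired]; cmap="Paired" also works
             aspect="auto", origin="lower")
```

Passing the colormap by name (`cmap="Paired"`) sidesteps attribute access on `PyPlot.cm` entirely, since matplotlib resolves the string itself.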

I also noticed this FutureWarning; I'm not sure what it is about, but it doesn't seem to explain the missing plot either:

julia> bench_k_means(KMeans(init="random", n_clusters=n_digits, n_init=10),
                     "random", data)
/home/qwerty/.julia/conda/3/lib/python3.7/site-packages/sklearn/metrics/cluster/supervised.py:732: FutureWarning: The behavior of AMI will change in version 0.22. To match the behavior of 'v_measure_score', AMI will use average_method='arithmetic' by default.
  FutureWarning)
   random   0.50s    69656   0.670   0.712   0.690   0.550   0.666    0.157
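That FutureWarning is raised by scikit-learn, not by Julia: the default normalization of `adjusted_mutual_info_score` changed in version 0.22, and passing `average_method` explicitly silences the warning and pins the behaviour. A minimal sketch via PyCall with made-up labels; the body of `bench_k_means` isn't shown in this thread, so where its AMI call lives is an assumption.

```julia
using PyCall

metrics = pyimport("sklearn.metrics")

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [1, 1, 0, 0, 2, 2]   # same partition, different label names

# Passing average_method explicitly avoids the FutureWarning and keeps the
# score stable across scikit-learn versions ("arithmetic" is the 0.22+ default)
ami = metrics.adjusted_mutual_info_score(labels_true, labels_pred,
                                         average_method="arithmetic")
println(ami)
```

AMI is invariant to relabeling, so identical partitions score 1.0 up to floating-point rounding.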