cstjean / ScikitLearn.jl

Julia implementation of the scikit-learn API https://cstjean.github.io/ScikitLearn.jl/dev/
Other
547 stars 75 forks source link

allow DataFrames.jl 1.0 #96

Closed bkamins closed 3 years ago

bkamins commented 3 years ago

it would be great to have DataFrames.jl 1.0 allowed and then release made. Thank you!

cstjean commented 3 years ago

I'll be happy to merge a PR when the tests pass, but unfortunately, ScikitLearn hasn't been updated in a while, and I'm in no position to really work on it. Help wanted!

bkamins commented 3 years ago

CI throws a massive number of warnings and an error on Python side. I do not know ScikitLearn well enough to diagnose unfortunately.

cstjean commented 3 years ago

Thank you for the work!

CI throws a massive number of warnings and an error on Python side.

The trouble is that scikit-learn on the Python side keeps evolving... https://discourse.julialang.org/t/conda-jl-specify-library-version/59915

azev77 commented 3 years ago

@cstjean, when @timholy moved on from Interpolations.jl he posted "Interpolations.jl needs a new maintainer" on Discourse. Perhaps it would be worth doing the same for this package?

@ablaom heads up, this will likely cause some issues for MLJ users

ablaom commented 3 years ago

cc @OkonSamuel

ablaom commented 3 years ago

@cstjean Curious as to why DataFrames is a dependency. Most models just work on matrices, right?

cstjean commented 3 years ago

Curious as to why DataFrames is a dependency. Most models just work on matrices, right?

Yeah, DataFrames could be made an optional @require dependency, especially now that it's on 1.0.

when timholy moved on from Interpolations.jl he posted "Interpolations.jl needs a new maintainer" on Discourse. Perhaps it would be worth doing the same for this package?

Has there been other successful transitions through such an announcement? Seems a bit risky, especially for a package which is probably most attractive for recent Python to Julia converts...

I'd love to pass on the torch, though. Right now, we're three maintainers: myself, @OkonSamuel and @alexmorley. I'll gladly give commit rights to anyone who shows up with a good PR and an interest.

azev77 commented 3 years ago

Perhaps the folks at PyCall.jl have some suggestions? @stevengj

ablaom commented 3 years ago

Yeah, DataFrames could be made an optional @require dependency, especially now that it's on 1.0.

Based on our experience, I don't recommend using Requires.jl, if you can help it. Perhaps if you need to materialise a table, you just use a julia native table (eg, named tuple of vectors or dictionary of vectors keyed on Symbol). If a user wants a DataFrame, they can just do using DataFrames; DataFrame(table) where table is any of these (or any Tables.jl-compatible table). Just a thought.

On the other hand, if you are actually using DataFrames-specific interface (in particular, random access for rows), and a transition is painful, then leaving the DataFrames dep makes sense for now.

PyDataBlog commented 3 years ago

Any updates on this?

cstjean commented 3 years ago

@PyDataBlog Updates will be posted here, anyone is very welcome to make another PR to complete this. In the meanwhile, you can use this branch to run ScikitLearn + DataFrames 1.0. It certainly won't be any worse than the current release.

cstjean commented 3 years ago

Albert Zevelev pointed out that for his use case, using older versions of Conda/PyCall

Pkg.add([
    Pkg.PackageSpec(name="Conda", version="1.5.0"),
    Pkg.PackageSpec(name="PyCall", version="1.92.2"),
    Pkg.PackageSpec(name="MLJ", version="0.16.0"),
    Pkg.PackageSpec(name="MLJScikitLearnInterface", version="0.1.8"), 
])

resolves his problem. That's another track to follow through on...

ablaom commented 3 years ago

These versions of Conda and Pycall are just one patch earlier than the latest releases. And there is no change I could see in the Project files. So it is curious that this should make a difference.

codecov-commenter commented 3 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@9f15da6). Click here to learn what that means. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master      #96   +/-   ##
=========================================
  Coverage          ?   67.62%           
=========================================
  Files             ?       13           
  Lines             ?      698           
  Branches          ?        0           
=========================================
  Hits              ?      472           
  Misses            ?      226           
  Partials          ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 9f15da6...f029de2. Read the comment docs.

giordano commented 3 years ago

Tests on Ubuntu are passing, but notebook examples fail on macOS and Windows at the moment, which is the same situation we currently have on master. Good to merge?

bkamins commented 3 years ago

@giordano - thank you for finishing this.

bkamins commented 3 years ago

Should the package get a new release?

cstjean commented 3 years ago

I wish, but AFAIK we can't release until all the tests pass, no? We're close, but macos+windows still does not work.

bkamins commented 3 years ago

Ah - OK (I pinged, because there were some users asking about the release)

giordano commented 3 years ago

We're close, but macos+windows still does not work.

Tests do pass for me locally on macOS in a clean environment. I suspect there are some issues with Python/Conda setup for macOS/Windows in GitHub Actions, but hunting them down could be a long task. Is that worth holding this back?

bkamins commented 3 years ago

In this case I would make a release and wait for user's feedback if anything is broken and then try to fix it in patch release (if needed)

cstjean commented 3 years ago

In this case I would make a release and wait for user's feedback if anything is broken and then try to fix it in patch release (if needed)

I'm out of touch, but once upon a time the registry auto-rejected packages whose CI failed. Has that changed?

ablaom commented 3 years ago

I don't think you even need CI implemented to register a package. And I agree with @bkamins suggestion.

cc @DilumAluthge

DilumAluthge commented 3 years ago

Yeah for AutoMerge, we only check that import PackageName works.

cstjean commented 3 years ago

Then let's release!

ablaom commented 3 years ago

@cstjean Can you release a new version please, after merging #100, or delegate (maybe @OkonSamuel has authority now also?)?

cstjean commented 3 years ago

@OkonSamuel would you have time?