JuliaML / META

Discussions related to the future of Machine Learning in Julia
MIT License

Request from a layperson #18

Open sbromberger opened 6 years ago

sbromberger commented 6 years ago

Hi ML gurus,

Forgive me if this is too forward, but I'd like to share an observation and make a request.

One thing that I think Julia needs is a robust, easy-to-use machine learning toolkit focused on end users who are doing data analysis but aren't necessarily ML experts. I'm one of these people. For example, I need to run classification and logistic regression on a set of data to see whether a hypothesis I have is valid. I've taken some ML and stats courses but am nowhere near an expert in these fields, and frankly, the existing Julia packages seem to be geared towards ML researchers, not users. I can't really find what I need in the multiple packages within the JuliaML organization.

As a result I'm having to use sklearn to do my data analysis, which makes me sad as a Julia evangelist. Sklearn has everything I need in a toolkit, though: it's easy to understand and to use, offers some customization (but not so much that I'm tearing my hair out trying to understand the options), and gives me reasonably-fast results that my colleagues seem to understand.

My request is this: could there be some effort made to come up with something like sklearn in native Julia? I make this request knowing full well that I can't contribute much other than ideas and feedback, but it sure would make things easier for those of us who just need to do some data analysis without knowing about or using the most cutting-edge ML algorithms, and want to do it in Julia.

datnamer commented 6 years ago

@sbromberger how about this: https://github.com/cstjean/ScikitLearn.jl

sbromberger commented 6 years ago

@datnamer yes, I've been toying with the idea of using it, but 1) it's a wrapper, so I still need the Python infrastructure, and 2) it doesn't seem to be actively developed anymore.

datnamer commented 6 years ago

@sbromberger Yea. Do you use sklearn in Python or through PyCall? With the new dot overloading the latter might be getting easier.

sbromberger commented 6 years ago

@datnamer - I'm using it in Python (which is the part that makes me sad). It's hard for me to justify using a Julia package that wraps software that my colleagues use natively. A pure-Julia implementation of the tools would be easier to rationalize.

datnamer commented 6 years ago

I hear that.

I think an ML toolkit is what this is supposed to be starting on: https://github.com/JuliaML/Learn.jl

Lots of the pieces are already here, and hopefully a big dev boost around 1.0 will make it happen.

But I'll let the experts weigh in.

sbromberger commented 6 years ago

Lots of the pieces are already here,

Yes - I don't mean to disparage the existing corpus of work or imply that the field is barren. I'm just not quite sure where to go to find the functionality I need, and it's a bit of a hindrance right now that there's not a single one-stop shop for common ML-like tasks.

denizyuret commented 6 years ago

Just look at Knet.jl and its README and documentation.

On Wed, Jan 3, 2018 at 06:54 Seth Bromberger wrote:

Lots of the pieces are already here,

Yes - I don't mean to disparage the existing corpus of work or imply that the field is barren. I'm just not quite sure where to go to find the functionality I need, and it's a bit of a hindrance right now that there's not a single one-stop shop for common ML-like tasks.


sbromberger commented 6 years ago

@denizyuret - I did look at Knet and its documentation - it's very impressive, but I couldn't find simple examples of the algorithms I use (DBSCAN and k-means for clustering, for example, along with some random forests for classification). I gather it's more focused on neural networks, and that's a bit too heavy for what I'm trying to do.

denizyuret commented 6 years ago

You are right, Knet is more focused on gradient based methods right now. Logistic regression from your original list would work.

On Wed, Jan 3, 2018, 08:42 Seth Bromberger wrote:

@denizyuret - I did look at Knet and its documentation - it's very impressive, but I couldn't find simple examples of the algorithms I use (DBSCAN and k-means for clustering, for example, along with some random forests for classification). I gather it's more focused on neural networks, and that's a bit too heavy for what I'm trying to do.


ChrisRackauckas commented 6 years ago

A one-stop shop is very important IMO. With it, there's a certain branding and trust that can be built. It's hard to trust a lot of these little learning pieces, especially since many of them are built by authors I don't know, have small test sets, and are rarely updated. A metapackage's branding gives it the trust of a common governance that will test, maintain, and fix issues, which you wouldn't expect from a random repository. It gives you something to point to as "the package in Julia which does everything ML you need"; right now it's hard to say what that would be. IMO it's important for the growth of the ecosystem that something like scikit-learn comes up, even if it's mostly an API wrapper and fancy docs over other packages.

Evizero commented 6 years ago

I think the simple truth is that this end-user desire has long been known. It's just that so far no one has been interested in or able to write it.

smldis commented 6 years ago

Consider a Plots.jl-like package with all sorts of machine learning backends, wrapping not only the Python ecosystem but also R and Julia (and dynamically choosing among the backends' algorithms). Would that solve the problem of justifying a Julia package that wraps software already in your environment?

oxinabox commented 6 years ago

I've been linking this post around the internet a lot lately: http://white.ucc.asn.au/2017/12/18/7-Binary-Classifier-Libraries-in-Julia.html so I suspect you might have seen it, @sbromberger. It shows 7 of the binary classification libraries available in Julia right now, and it fits them to a common API: fit!, predict, with observations in the last index. This would be of interest to you in two ways. First, it is actually what you are interested in doing, and it shows the packages for it. More importantly, though, it shows how they are all a bit different in different ways (and that is bad).

I don't think we need a single package as monolithic as ScikitLearn. I'd rather have clustering live in Clustering.jl, dimensionality reduction live in MultivariateStats.jl, and so on, with these packages just so happening to work as if they were made together with each other. And they will do so, as long as people stick to sane Julian conventions (observations in the last dimension). At that point making a metapackage is easy, and also maybe not required (but maybe it is).
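To make the convention above concrete, here is a minimal sketch of a shared fit/predict pair with observations stored along the last index. It is written in plain Python purely for illustration and is not taken from the linked post or from any of the packages mentioned. The `NearestCentroid` model and the helper names are hypothetical, and `fit` stands in for Julia's `fit!` since Python has no `!` naming convention.

```python
from dataclasses import dataclass, field


@dataclass
class NearestCentroid:
    """Hypothetical toy classifier, used only to illustrate the interface."""
    centroids: dict = field(default_factory=dict)


def n_observations(X):
    # X is laid out as features x observations: X[f][i] is feature f of observation i.
    return len(X[0])


def observation(X, i):
    # Pull out observation i by indexing along the last dimension.
    return [row[i] for row in X]


def fit(model, X, y):
    # Stand-in for `fit!`: mutates the model in place and returns it.
    sums, counts = {}, {}
    for i in range(n_observations(X)):
        obs = observation(X, i)
        acc = sums.setdefault(y[i], [0.0] * len(obs))
        for f, value in enumerate(obs):
            acc[f] += value
        counts[y[i]] = counts.get(y[i], 0) + 1
    # Each class centroid is the per-feature mean of that class's observations.
    model.centroids = {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }
    return model


def predict(model, X):
    # Returns one predicted label per observation (per column of X).
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [
        min(model.centroids,
            key=lambda label: sq_dist(observation(X, i), model.centroids[label]))
        for i in range(n_observations(X))
    ]
```

Under this assumed layout, any model that accepts the same `fit(model, X, y)` and `predict(model, X)` calls could be swapped in without changing the calling code, which is the interoperability the comment is asking for.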

sbromberger commented 6 years ago

Thanks, @oxinabox - that's very helpful. I guess I'm not advocating a single package, but rather a single place that will tell me what packages I need.

One thing I've found supremely helpful is the graphic that sklearn has (http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html) - this allows me to select a reasonable tool for the job, and I know that the functionality exists in the package. Were JuliaML to do something similar, I'd appreciate knowing what package to use as well.