JuliaML / META

Discussions related to the future of Machine Learning in Julia
MIT License

List of existing packages and projects #9

Open · ahwillia opened this issue 8 years ago

ahwillia commented 8 years ago

I want to keep a running list of machine learning packages in Julia that are being actively developed and maintained. My reasoning is that (a) we have a lot to learn from each other's efforts, and (b) it would be great to coordinate efforts going forward.

Part (b) is of course a bit tricky: having packages that are "standalone" is desirable in many ways, but our vision is that it is also desirable to have these packages (or at least a subset of them) play nicely with each other. The hope is that Learn.jl will provide a consistent API with backends to different optimization and machine learning packages. We'd like the scope to be as broad as possible, while still being simple enough to be useful and maintainable.
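To make that concrete, here is a rough sketch of the kind of single front-end with swappable backends I have in mind. Every name here (`learn`, `LearnBackend`, `GradientDescentBackend`) is a placeholder for illustration, not an actual Learn.jl API:

```julia
# Hypothetical sketch of a unified front-end with pluggable backends.
abstract type LearnBackend end

# One possible backend; others could wrap Optim.jl, SGD variants, etc.
struct GradientDescentBackend <: LearnBackend
    stepsize::Float64
end

# A single generic entry point: swapping the machinery behind a fit
# is a one-keyword change, thanks to dispatch on the backend type.
function learn(model, X, y; backend::LearnBackend = GradientDescentBackend(0.01))
    # ... backend-specific fitting would go here ...
    return model
end
```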

Note to collaborators: please update and edit this list as you see fit. Also consider cc-ing the authors of those packages so that they are aware.

Note to others: we'd love to hear your thoughts and updates on your latest projects. We're happy to add your package to this list; please get in touch with us! https://gitter.im/JuliaML/chat

Neural Nets / Deep Learning

These should help guide the development of Transformations.jl

Note: There are also a lot of excellent statistical modeling packages (e.g. MultivariateStats.jl, and others). I view these packages as having a slightly different focus from this project: they provide a library of canned methods with a consistent API. I tried to pick packages that aim to create an internal _framework_ for specifying and fitting a large class of models.

cstjean commented 8 years ago

Are you aware of svaksha's list? It might be a good starting point.

ahwillia commented 8 years ago

Thanks @cstjean -- I found some good stuff there. I'm trying to keep this list shorter and more focused than svaksha's. What do you think of adding ScikitLearn.jl? Are its aims closer to MultivariateStats (see the note above)?

cstjean commented 8 years ago

Well, it's modelled on the Python scikit-learn, so "a library of canned methods with a consistent API" describes it pretty well. It's up to you if you want to put it on the list. I'm hopeful that ScikitLearn can be an alternative high-level interface to the Learn.jl models.
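For concreteness, a typical ScikitLearn.jl session looks roughly like this (a sketch; `@sk_import` pulls the estimator from the Python scikit-learn via PyCall, so that needs to be installed):

```julia
using ScikitLearn
@sk_import linear_model: LogisticRegression

X = rand(100, 2)      # 100 samples, 2 features
y = rand(0:1, 100)    # binary labels

model = LogisticRegression()
fit!(model, X, y)     # every estimator exposes the same fit!/predict verbs
yhat = predict(model, X)
```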

tbreloff commented 8 years ago

> I'm hopeful that ScikitLearn can be an alternative high-level interface to the Learn.jl models.

@cstjean funny... I'm hoping that Learn.jl can be an alternative high-level interface to the ScikitLearn canned methods. 😮 Can someone write a tree-based graph layout recipe for Plots so I can figure out these dependencies??

ahwillia commented 8 years ago

I think the idea is that most statisticians and data scientists will use SciKitLearn, while Learn.jl will be used by researchers developing new algorithms or tailoring/tuning a model very carefully to a particular application. So it isn't so much about high-level vs. low-level, but ability to customize and tinker. I think Learn.jl will take some getting used to for newcomers, while SciKitLearn makes it really easy to apply a whole host of well-known and vetted methods.

Bottom line is -- I view the packages as complementary, not in competition.

Sisyphuss commented 8 years ago

I prefer (and I'm not the only one) the term "graphical model" to "Bayes net", because a Bayes net is not necessarily Bayesian. Basically, it's just a graphical language for specifying generative models.

cstjean commented 8 years ago

> I'm hoping that Learn.jl can be an alternative high-level interface to the ScikitLearn canned methods.

Why not both! We could have ScikitLearn.jl import LearnBase, and Learn.jl import ScikitLearnBase, so that both frameworks' models are usable in the other. But we don't have to commit to anything at this point; we'll see how Learn.jl develops and do it if it makes sense.
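A minimal sketch of one direction of that bridge, assuming a LearnBase-style `learn!` verb (that name and the wrapper type are hypothetical; only `ScikitLearnBase.fit!`/`predict` are the real verbs):

```julia
import ScikitLearnBase

# Hypothetical adapter: make any ScikitLearnBase estimator usable
# behind a LearnBase-flavored interface.
struct SkWrapper{M}
    model::M
end

learn!(w::SkWrapper, X, y) = (ScikitLearnBase.fit!(w.model, X, y); w)
predict(w::SkWrapper, X)   = ScikitLearnBase.predict(w.model, X)
```

The reverse direction would be a similar adapter defining `ScikitLearnBase.fit!` for Learn.jl models.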

Evizero commented 8 years ago

My core goal is that any researcher or student should be able to swap out any low-level part while still being able to use the rest of the high-level framework. For example, if I were interested in trying out a new loss, I should be able to write one and still use the rest (such as the optimization algorithms and cross-validation) as intended and without performance penalties, so it should be on the same level as built-in loss functions and not some second-class construct. The same goes if I am interested in testing a new way of splitting or resampling a dataset, or a new data source, or writing an optimization algorithm that should work for, say, any convex margin-based loss with a strongly convex and differentiable penalty.

In other words, a researcher should be able to focus on just the part that is of interest to their research, without having to reimplement all the other pieces in order to compare against established approaches. I think that is something Julia is uniquely qualified for.
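For instance, plugging in a new margin-based loss might look like the sketch below. I'm assuming a LossFunctions.jl-style interface here (`MarginLoss`, `value`, `deriv`); the exact names may differ, but the point is that a user-defined loss is just another subtype, so generic solvers and CV code specialize on it at full speed:

```julia
using LossFunctions

# A new margin-based loss: L(a) = exp(-a), where a = y * f(x) is the agreement.
struct ExpLoss <: LossFunctions.MarginLoss end

LossFunctions.value(::ExpLoss, agreement::Number) = exp(-agreement)
LossFunctions.deriv(::ExpLoss, agreement::Number) = -exp(-agreement)

# Any routine written against MarginLoss now accepts ExpLoss unchanged.
```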

ChrisRackauckas commented 8 years ago

@Evizero Yes, and the design approach Tom has in mind (and shows in StochasticOptimization) achieves this.

I think the other goal is to have good enough defaults and free cross-validation, so that someone researching other topics can easily use a researcher's new tools that they found in a paper, without having to modify anything. That would mean a researcher would just have to put up a repository with the types defining the sub-learners (or whatever you want to call them; that's Tom's lingo). Should there be a separate aggregation package just for holding lots of algorithms, like StatsPlots does for stats plot recipes?

tbreloff commented 8 years ago

> Should there be a separate aggregation package just for holding lots of algorithms?

Yes, probably. We can hold off on that sort of micro-organization until we have lots of implementations ready to organize.

ChrisRackauckas commented 8 years ago

What's the relation of JuliaOpt to JuliaML, or more specifically Optim.jl to things like Learn.jl? Is one early plan to be able to plug into Optim? How do they differ in scope?

tbreloff commented 8 years ago

Undefined at this point. We know about each other and we'll try to share/collaborate when it makes sense.

ChrisRackauckas commented 8 years ago

Okay. The reason I ask is that I plan, sometime in the not-too-distant future, to write some parameter-inference packages for differential equations (discussed in JuliaDiffEq/DifferentialEquations.jl#47) and other things that require machine learning / optimization as a component of their algorithms (it'll probably live in JuliaDiffEq because of its focus). I want to know what to "plug into": Learn.jl, Optim.jl, or something else, or whether I'll need to plug into a few JuliaML packages.
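To give a feel for the kind of plumbing I mean, here is a toy sketch that fits one ODE parameter by handing a sum-of-squares loss to Optim.jl (the model and data are placeholders; any JuliaML optimizer exposing a similar `optimize`-style entry point could slot in the same way):

```julia
using DifferentialEquations, Optim

# Toy model: exponential decay u' = -p[1]*u with unknown rate p[1].
f(u, p, t) = -p[1] * u
u0, tspan = 1.0, (0.0, 1.0)
tdata = 0.0:0.1:1.0
data = exp.(-1.5 .* tdata)   # synthetic observations generated with p = 1.5

# Sum-of-squares error between the ODE solution and the observations.
function loss(p)
    prob = ODEProblem(f, u0, tspan, p)
    sol = solve(prob, Tsit5(), saveat = tdata)
    return sum(abs2, sol.u .- data)
end

res = Optim.optimize(loss, [1.0], BFGS())   # should recover p ≈ 1.5
```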