amidst / toolbox

A Java Toolbox for Scalable Probabilistic Machine Learning
http://www.amidsttoolbox.com
Apache License 2.0

Integration with ML Flink #61

Open rcabanasdepaz opened 7 years ago

rcabanasdepaz commented 7 years ago

General info: https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap

Quick start: https://ci.apache.org/projects/flink/flink-docs-release-0.9/libs/ml/quickstart.html

Speech about flinkML: http://es.slideshare.net/TheodorosVasiloudis/flinkml-large-scale-machine-learning-with-apache-flink

thvasilo commented 7 years ago

Hello Rafael, just ran into this issue :)

Could you give a few more details about your plans?

We are going to start development on an online learning library for Flink soon, so we are looking at our options for what to include in it. We could also look at bringing in some of the work that has been done as part of the AMIDST project.

rcabanasdepaz commented 7 years ago

Hello Theodore, this issue is at a very early stage of development. Our idea is to make it possible to use any of the latent variable models provided by AMIDST with FlinkML data structures (e.g., DataSet[LabeledVector]). This functionality will be used from Scala. To the best of our knowledge, FlinkML cannot yet be used from Java, or at least not with its full functionality.

However, our toolbox is already integrated with (standard) Flink through the flinklink module. With it, you can learn PGMs and perform inference with them in a cluster environment. More details are given in the documentation on the website:

http://www.amidsttoolbox.com/documentation/0-6-0/examples-060/flinklink-060/

All news about this issue will be published here. Alternatively, you can follow news about the toolbox on Twitter: https://twitter.com/AmidstToolbox

thvasilo commented 7 years ago

Cool, let me know if you need any help. If you think it would make sense to port some of your work to FlinkML, we can talk about that as well. For example, we still don't have a Naive Bayes model, which I see is included here.

You are right that we don't currently support Java in FlinkML. Unfortunately, there are no plans to add it in the near future AFAIK.

I'll check out the rest of the toolbox, thanks for the info!

rcabanasdepaz commented 7 years ago

The idea of porting some of the functionality in AMIDST to FlinkML sounds good. Do you have any documentation on how to contribute to FlinkML? Porting the Naive Bayes classifier would clearly be interesting, but so would porting some other, much more powerful classifiers.

thvasilo commented 7 years ago

Sure, our contribution guide is here. If somebody from your team is interested in porting AMIDST code to FlinkML, I'll be able to help them personally.