about scikit-multilearn

ljvmiranda921 commented 7 years ago

Hi there!

I am wondering if you are one of the original owners/maintainers of scikit-multilearn, and if this repository will carry over that project. It seems that the former is already inactive: some pull requests are not being answered, multiple issues exist, and dependencies are not managed properly for all Python versions (especially graph-tools and MEKA's Java dependency).

I am just wondering what the vision for that library would be. I believe that providing a multi-label classification library in Python, to augment scikit-learn, is very useful.

I'd also like to contribute in whatever way I can. I am currently a graduate student doing research in bioinformatics (hence multi-label classification). I'd be happy to help restructure the project, write docs, refactor some code, and clean up some unit tests.

Perhaps we can start by managing all dependencies to get a successful Travis build. It seems that graph-tools is tricky, and the Docker image provided is only for Python 3. In addition to that, there's also the MEKA extension to take care of.

Maybe we can drop support for these features for a while, and focus on the "easier" ones first?

Thank you so much, I'd really love to help out with this project and find people who are still interested in maintaining scikit-multilearn.

ChristianSch commented 7 years ago

Hey there,

Sadly, I am just an occasional contributor without any rights in the repo. The status of the project bugs me a lot as well. I started skml primarily for myself, as basic methods such as cc and ecc didn't work at the time. Hence this project is somewhat supplementary, for things I need and implemented myself, as I didn't have any hope that my pull requests would be merged.

I already approached the maintainer of scikit-multilearn about a slack/irc/mailing list communication method, but to no avail. I just can't get in touch with him. Maybe if we join forces we can do something about it.

Some important things off the top of my head:

communication (slack, irc, mailing list)
an organization that manages the repo, for easier collaboration and to distribute responsibility for a project that actually serves a purpose and is used, with roles and everything
automatic testing is a must-have (as you noted as well)
clean up the code base to adhere to Python standards (if not done already)
cleaning up the issues and maybe have a TODO file for classifiers/methods that are missing
dependency management

I'm not sure where to go from here. Maybe we should look for more interested people, then approach the maintainer and see if we can work something out. If not, we fork it (we can use skml as a name, I'm happy to ditch this project if we get things going) and do it on our own.

What do you think?

Cheers!

ljvmiranda921 commented 7 years ago

Sure! I'd be happy to help in any way I can.

But maybe just out of respect, we can contact the owner one last time? It seems that the library has a preprint on arXiv, and has received grants (and agreements) that might be quite tricky to tread on. Also, the authors may have a vision for the library that they want to share.

I agree with the things you pointed out. The style of the codebase seems to have a hint of Java/C++, and we can refactor it to be more Pythonic. In fact, some of the methods and mixins are quite redundant and can be removed.

Personally, I'm not yet that good at Python, but I can help you refactor some of the previous code, write most of the documentation (contribution guidelines, API docs, code of conduct, etc.), and design new tests. 😄

As for major milestones, we can start things off by supporting the features that depend only on the Python ecosystem. We can add support for MEKA and graph-tools later on. We can start small, then once we have a working base we can promote this on Reddit or Hacker News to find new contributors and whatnot.

Thank you for your reply. Hopefully we get to make this much better!

Cheers!

ChristianSch commented 7 years ago

Oh, sorry, I didn't mean that we should just skip him! I was just thinking that if we have mapped out the things we want, it's easier for him to help, as he obviously has moved on or doesn't have much time. Of course it would be best to have his blessing and his help. I think having contributor status in the existing project would be optimal, as well as having a method of communication that the maintainer has access to as well, if he wants it. Maybe we should find out what he's up to. I'll try to find his email. I think we should get in touch with him as a group of "future contributors" or something. Is there anyone else you know about besides you and me?

ljvmiranda921 commented 7 years ago

Ooops sorry I misunderstood, my bad. Yes you are totally correct! 👍

According to their publication, the emails are:

piotr.szymanski@{pwr.edu.pl,illimites.edu.pl} (this is @niedakh, the owner of the repo)
niedakh@gmail.com (same guy as above)
tomasz.kajdanowicz@pwr.edu.pl

As of now, I don't know anyone else. I contacted you because you seem to be the most active contributor in the library. 😄

niedakh commented 7 years ago

Hi all,

I would more than love to extend the team that's working on scikit-multilearn, given that up till now it was mostly myself and Tomasz's students, plus a couple of contributors like @ChristianSch. I do look at Christian's fork and skml repo regularly, but haven't seen any new code there recently. I've also sent Christian an invite to the dev group, but he must've missed it.

I'm more than willing to add you to the scikit-multilearn organization and work together; it is really hard to maintain the library alone. At the moment I'm concentrating on providing multilabel stratification methods for scikit-ml, fixing the code according to feedback from JMLR reviewers, and providing a patch for a large-scale problem with sparse outputs in scikit-learn, which is a lot of work but is also crucial to optimizing the library: https://github.com/scikit-learn/scikit-learn/issues/8908

In general, my vision is for scikit-multilearn to be the library of choice for the multi-label classification problem with scikit-learn. I've sent you invites to the scikit-multilearn slack, so let's continue the discussion there, shall we? The address is: https://scikit-ml.slack.com/

ljvmiranda921 commented 7 years ago

Awesome @niedakh!

Oops, just one more thing, is it fine to send the invite to ljvmiranda@gmail.com instead?

niedakh commented 7 years ago

Cool, sent it!
