Yelp / pyleus

Pyleus is a Python framework for developing and launching Storm topologies.
Apache License 2.0

API for adding java-based spouts to Pyleus topologies #99

Open mzbyszynski opened 9 years ago

mzbyszynski commented 9 years ago

Adds the ability to integrate java-based spouts with Pyleus topologies, based on the way that the kafka spout was previously integrated into Pyleus.

In a nutshell, to add a java spout to your topology you need to:

  1. Write a java class that implements the new SpoutProvider interface and package it in a jar.
  2. Add the jars you need and the spout type-to-java SpoutProvider class mapping to your pyleus.conf
  3. Add the spout to your topology yaml and define the type, output_fields and options.
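As a rough illustration of step 3, a topology yaml using such a spout might look like the sketch below. The `type`, `output_fields` and `options` keys are the ones named above, but their exact nesting is an assumption here, and the alias `my_java_spout`, the option name and the bolt module path are all hypothetical placeholders rather than anything defined by this pull request.

```yaml
# Sketch only: a topology that reads from a Java-provided spout.
# Names marked "hypothetical" are placeholders, not part of Pyleus.
name: java_spout_example

topology:

    - spout:
        name: events-spout
        # hypothetical alias, assumed to be mapped to a SpoutProvider class in pyleus.conf
        type: my_java_spout
        # fields the Java spout is assumed to emit
        output_fields:
            - message
        # hypothetical option handed through to the provider
        options:
            example_option: some-value

    - bolt:
        name: process-events
        # hypothetical Python bolt module consuming the spout's output
        module: java_spout_example.process_events
        groupings:
            - shuffle_grouping: events-spout
```

The Python bolt downstream is declared the same way as in any other Pyleus topology; only the spout entry swaps `module` for a `type` that resolves to the Java class.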

Documentation

Testing:

All questions, feedback and code review comments welcome! I was thinking about adding a readme.md file to the _java_spoutprovider example, since there are a bunch of steps to build it, but I didn't see anything similar in the other examples so I didn't want to violate any project conventions. Some guidance on that would be great as well.

Thanks!

Closes #93 Closes #91

poros commented 9 years ago

@mzbyszynski, sorry for the late answer. This is a feature that I believe would be a very good addition to pyleus and it is a fair amount of work, so thank you for doing that. And thank you for writing documentation as well :)

(Since it also closes #93, I guess this is based on #94, right?)

However, since this is such a huge change, including in terms of "user interface", I believe we should get other people's thoughts on it (pinging @patricklucas and @ecanzonieri here) before we actually start discussing the details of the code.

Coming to the first and most important question of the list, personally, I have mixed feelings about this change. On the one hand, I have always wanted to get rid of all the Java bits except for the MsgpackSerializer, since that code is untested, adds complexity and is difficult to maintain. On the other hand, rewriting the Pyleus core in Python would require a huge amount of work that I do not have enough time to take on at the moment or in the near future. It also has some serious open issues, like losing the local run feature provided by Storm, adding Thrift compiling, and implementing and testing a whole new topology "building" and packaging system. For these reasons, I might be convinced that continuing to add features to the Java core might not be a bad idea and might not increase the pain of maintaining the project too much. Having said that, I am neither the owner of the project nor the person primarily responsible for it, and I have no right to block or pass any pull request single-handedly, so I am waiting for other people's input here.

slively commented 9 years ago

I'm looking into using pyleus (love it a ton so far), but we are using Kinesis instead of Kafka, and AWS has a supported Kinesis spout for Storm that I'd really like to use. This feature would be really awesome for my use case. Looking through the changes and documentation, it looks pretty straightforward to use. I'm gonna give it a try for my use case and report back.