Yelp / pyleus

Pyleus is a Python framework for developing and launching Storm topologies.
Apache License 2.0

Set output_field of a bolt using configuration file? #84

Open johanneshk opened 9 years ago

johanneshk commented 9 years ago

Hi,

I'm trying to solve the following problem with pyleus: I have a processor bolt that processes data (wow). Depending on options in some config file, this processor bolt should emit data on different streams. I have multiple processor bolts in my topology, each configured differently. Essentially, the processor bolt matches several queries against input streams in different places of the topology. The queries are user-specified. Each match of a query should be emitted on its own stream (so that downstream components only get the matches they subscribed to).

Problem: the definition of output_fields is static and identical for all instances of the processor bolt. This would not be a problem if I could either specify the output_fields at runtime, once the processor has parsed its configuration, or put the output stream configuration into pyleus_topology.yaml. Neither is possible. I wonder if you have an idea how to tackle this problem.

Long story short: is there a way to set the output fields of a component more flexibly? Preferably, I would like to set output fields in pyleus_topology.yaml on a per-component basis.

A workaround may be to define a number of dummy output_fields in the processor bolt and use these to communicate a varying number of query matches.
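A minimal sketch of the dummy-field idea, in plain Python with no pyleus imports. The slot count `MAX_MATCHES`, the field names, and the helper `pad_matches` are hypothetical and only illustrate the padding; the list `OUTPUT_FIELDS` stands in for whatever the bolt declares statically.

```python
# Hypothetical upper bound on query matches per input tuple (an assumption).
MAX_MATCHES = 4

# Fixed field names a bolt could declare once: a count plus one slot per match.
OUTPUT_FIELDS = ["match_count"] + ["match_%d" % i for i in range(MAX_MATCHES)]


def pad_matches(matches):
    """Fit a variable-length list of matches into the fixed set of fields."""
    if len(matches) > MAX_MATCHES:
        raise ValueError("more matches than declared dummy fields")
    padding = [None] * (MAX_MATCHES - len(matches))
    # The first value tells downstream components how many slots are real.
    return [len(matches)] + list(matches) + padding
```

The downside is visible in the sketch: the upper bound has to be chosen in advance, and unused slots are still serialized and sent.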

Another workaround: each downstream component gets all matches and has to filter for the interesting ones. Of course, this produces unnecessary communication…
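The filtering on the downstream side could look roughly like this. It is a sketch in plain Python, assuming each match arrives as a `(query_id, payload)` pair; the function name `wanted_matches` and that pair layout are hypothetical, not part of pyleus.

```python
def wanted_matches(matches, subscribed_queries):
    """Keep only the (query_id, payload) pairs this component subscribed to."""
    subscribed = set(subscribed_queries)
    return [(qid, payload) for qid, payload in matches if qid in subscribed]
```

Every component still receives every match and discards most of them, which is exactly the extra traffic mentioned above.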

Right now I have my own config file and use a script to create the input file for pyleus.
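A rough sketch of what such a generator script might do, using only plain Python data structures. The key names (`name`, `topology`, `bolt`, `module`, `options`, `groupings`, `shuffle_grouping`) follow the layout of a pyleus topology file as far as I know it and should be treated as assumptions; the module paths, the spout name `source`, and the function `build_topology` are hypothetical.

```python
def build_topology(name, processors):
    """Build a topology description from a user-level config.

    `processors` maps a bolt name to a pair (upstream component, queries).
    """
    # Hypothetical single spout that feeds the processor bolts.
    components = [{"spout": {"name": "source", "module": "my_topology.source"}}]
    for bolt_name, (upstream, queries) in sorted(processors.items()):
        components.append({
            "bolt": {
                "name": bolt_name,
                "module": "my_topology.processor",
                # Per-component settings; the queries differ for each bolt.
                "options": {"queries": list(queries)},
                "groupings": [{"shuffle_grouping": upstream}],
            }
        })
    return {"name": name, "topology": components}
```

The resulting dictionary would then be written out as YAML. Note that this only moves the per-bolt query lists into the generated file; it does not make the output fields themselves configurable.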

William-Sang commented 9 years ago

That sounds great!