colinsurprenant / redstorm

JRuby on Storm
Other
298 stars 56 forks source link

RedStorm - JRuby on Storm

Gem Version build status Code Climate Coverage Status

RedStorm provides a Ruby DSL using JRuby integration for the Storm distributed realtime computation system.

Like RedStorm? visit us on IRC at #redstorm on freenode

Check also these related projects:

Documentation

Chances are new versions of RedStorm will introduce changes that will break compatibility or change the developement workflow. To prevent out-of-sync documentation, per version specific documentation are kept in the wiki when necessary.

Dependencies

Stable 0.6.6

Current 0.7.0.beta1

Installation

Usage

Overview

Initial setup

$ redstorm install

This will install default Java jar dependencies in target/dependency, generate & compile the Java bindings in target/classes.

By default, the Java compilation will use the current JVM version as source/target compilation compatibility. If you want to force a specific source/target version, use the --[JVM VERSION] option. For example, to force compiling for 1.6 use:

$ redstorm install --1.6

Create a topology

Create a subdirectory for your topology code and create your topology class using this naming convention: underscore topology_class_file_name.rb MUST correspond to its CamelCase class name.

Here's an example hello_world_topology.rb

require 'red_storm'

class HelloWorldSpout < RedStorm::DSL::Spout
  on_init {@words = ["hello", "world"]}
  on_send {@words.shift unless @words.empty?}
end

class HelloWorldBolt < RedStorm::DSL::Bolt
  on_receive :emit => false do |tuple|
    log.info(tuple[0])
  end
end

class HelloWorldTopology < RedStorm::DSL::Topology
  spout HelloWorldSpout do
    output_fields :word
  end

  bolt HelloWorldBolt do
    source HelloWorldSpout, :global
  end
end

Gems in your topology

RedStorm requires Bundler if gems are needed in your topology. Supply a Gemfile in the root of your project directory with the gems required in your topology. If you are using Bundler also for other gems than those required in the topology you should group the topology gems in a Bunder group of your choice.

Note that bundler is only used to help package the gems prior to running a topology. Your topology code should not use Bundler. With require "red_storm" in your topology class file, RedStorm will take care of setting the gems path. Do not require 'bundler/setup' in the topology.

  1. have Bundler install the gems locally

    $ bundle install
  2. copy the topology gems into the target/gems directory

    $ redstorm bundle [BUNDLER_GROUP]
  3. make sure your topology class has require "red_storm"

    require 'red_storm'

The redstorm bundle command copy the gems specified in the Gemfile (in a specific group if specified) into the target/gems directory. In order for the topology to run in a Storm cluster, the fully installed gems must be packaged and self-contained into a single jar file. This has an important consequence: the gems will not be installed on the cluster target machines, they are already installed in the jar file. This could lead to problems if the machine used to install the gems is of a different architecture than the cluster target machines and some of these gems have C or FFI extensions.

Custom Jar dependencies in your topology (XML Warning! :P)

By defaut, RedStorm installs Storm and JRuby jars dependencies into target/dependency. RedStorm uses Ivy 2.3 to manage dependencies. You can fully control and customize these dependencies.

There are two distinct sets of dependencies: the storm dependencies manages the requirements (Storm jars) for the Storm local mode runtime. The topology dependencies manages the requirements (JRuby jars) for the topology runtime.

You can supply custom storm and topology dependencies by creating ivy/storm_dependencies.xml and ivy/topology_dependencies.xml files. Below are the current default content for these files:

The jars repositories can be configured by adding the ivy/settings.xml file in the root of your project. For information on the Ivy settings format, see the Ivy Settings Documentation. Below is the current default:

Run in local mode

$ redstorm local <sources_directory_path/topology_class_file_name.rb>

note that the topology can also be launched with the following command:

$ java -cp "target/classes:target/dependency/storm/default/*:target/dependency/topology/default/*:<sources_directory_path>" redstorm.TopologyLauncher local <sources_directory_path/topology_class_file_name.rb>

See examples below to run examples in local mode or on a production cluster.

Run on production cluster

Locally installing the Storm distribution is not required. Note that RedStorm does not provide the Storm command line utilities and you will need to install the Storm distribution to use the Storm command line utilities.

  1. create the Storm config file ~/.storm/storm.yaml and add the following

    nimbus.host: "host_name_or_ip"

    you can also use an alternate path and use the --config <other/path/to/config.yaml>

  2. generate target/cluster-topology.jar. This jar file will include your sources directory plus the required dependencies

    $ redstorm jar <sources_directory1> <sources_directory2> ...
  3. submit the cluster topology jar file to the cluster

    $ redstorm cluster <sources_directory/topology_class_file_name.rb>

    or if you have an alternate Storm config path:

    $ redstorm cluster --config <some/other/path/to/config.yaml> <sources_directory/topology_class_file_name.rb>

    note that the cluster topology jar can also be submitted using the storm command with:

    $ storm jar target/cluster-topology.jar -Djruby.compat.version=RUBY1_9 redstorm.TopologyLauncher cluster <sources_directory/topology_class_file_name.rb>

The Storm wiki has instructions on setting up a production cluster. You can also manually submit your topology.

Examples

RedStorm includes several example topologies to help get you started. You can find documentation for the examples here.

Ruby DSL

Ruby DSL Documentation

Multilang ShellSpout & ShellBolt support

Please refer to Using non JVM languages with Storm for the complete information on Multilang & shelling in Storm.

In RedStorm ShellSpout and ShellBolt are supported using the following construct in the topology definition:

bolt JRubyShellBolt, ["python", "splitsentence.py"] do
  output_fields "word"
  source SimpleSpout, :shuffle
end

See the shell topology example

Transactional and LinearDRPC topologies

Despite the fact that both transactional and linear DRPC topologies are now deprecated as of Storm 0.8.1 work on these has been merged in RedStorm 0.6.5. Lots of the work done on this is required toward Storm Trident topologies. Documentation and examples for transactional and linear DRPC topologies will be added shorty.

Known issues

RedStorm Development

It is possible to fork the RedStorm project and run local and remote/cluster topologies directly from the project sources without installing the gem. This is a useful setup when contributing to the project.

Requirements

Workflow

Remote cluster testing

Vagrant & Chef configuration to create a single node test Storm cluster is available in https://github.com/colinsurprenant/redstorm/tree/master/vagrant/

Notes about 1.8/1.9 JRuby compatibility

Ruby 1.9 is the default runtime mode in JRuby 1.7.x

If you require Ruby 1.8 support, there are two ways to have JRuby run in 1.8 runtime mode:

By defaut, a topology will be executed in the same mode as the interpreter running the $ redstorm command. You can force RedStorm to choose a specific JRuby compatibility mode using the [--1.8|--1.9] parameter for the topology execution in local or remote cluster.

$ redstorm local|cluster [--1.8|--1.9] ...

If you are not using the DSL and only using the proxy classes (like in examples/native) you will need to manually set the JRuby version in the Storm Backtype::Config object like this:

class SomeTopology
  RedStorm::Configuration.topology_class = self

  def start(base_class_path, env)
    builder = TopologyBuilder.new
    builder.setSpout ...
    builder.setBolt ...

    conf = Backtype::Config.new
    conf.put("topology.worker.childopts", "-Djruby.compat.version=RUBY1_8")

    StormSubmitter.submitTopology("some_topology", conf, builder.createTopology);
  end
end

How to contribute

Fork the project, create a branch and submit a pull request.

Some ways you can contribute:

Projects using RedStorm

If you want to list your RedStorm project here, contact me.

Author

Colin Surprenant, http://github.com/colinsurprenant/, @colinsurprenant, colin.surprenant@gmail.com, http://colinsurprenant.com/

Contributors

License

Apache License, Version 2.0. See the LICENSE.md file.