NeverwinterDP (NDP) is a Lambda Architecture based Big Data pipeline for Hadoop and other data systems, designed for operational reliability at the scale of billions of events.
[Join us](https://github.com/DemandCube/NeverwinterDP/blob/master/README.md#join-us-if) if you have [Grit](http://www.ted.com/talks/angela_lee_duckworth_the_key_to_success_grit)
Alpha - Currently the project is under active core development and not ready for production.
NeverwinterDP is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log/event data.
Neverwinter is an open source, distributed data ingestion system/framework for capturing large amounts of data (ranging from gigabytes to petabytes) to be processed or saved in real time to one or more downstream databases/repositories (e.g. Hadoop, HDFS, S3, MySQL, HBase, Storm).
Neverwinter was designed and written from the ground up for reliability, scalability, and operational maintainability, to meet the growing event and message data collection needs of startups and enterprise organizations.
Neverwinter is the real-time log/data pipeline: Sparkngin (Nginx), Kafka, and Scribengin, leveraging processing in Hadoop and Storm, with Logstash, Ganglia, and Nagios integration. It is a replacement for Flume but can also be integrated with it.
Neverwinter is the combination of three major open source projects that leverage the best in open source.
Now that we have used enough buzzwords: Neverwinter reliably captures lots of data and saves it to Hadoop and other systems.
Neverwinter allows data ingestion from any system that can emit http/rest (or other protocol) calls and then publishes this data to downstream databases, including Hive, HBase, relational databases, or even proprietary data stores. A single Neverwinter pipeline can combine data from multiple sources and deliver it to multiple destinations, allowing data to be delivered to multiple teams or an entire organization.
Neverwinter is targeted at data and analytics engineering teams who expect response times ranging from sub-second to minutes. Neverwinter breaks the false choice between having a batch or a real-time system, as well as the false choice between having a fast or a maintainable system.
1) Http/Rest/ZeroMQ Log Collection Endpoint - Sparkngin
2) Data Bus - Kafka/Kinesis
3) Data Pump/Transport - Scribengin
curl -s https://api.github.com/orgs/DemandCube/repos | ruby -rubygems -e "require 'json'; JSON.load(STDIN.read).each {|repo| %x[git clone #{repo['ssh_url']} ]}"
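If you would rather not depend on Ruby, a rough equivalent using jq (assumed to be installed; note that the GitHub API paginates, so this only fetches the first page of repositories, and it clones over HTTPS instead of SSH):

curl -s https://api.github.com/orgs/DemandCube/repos | jq -r '.[].clone_url' | xargs -n 1 git clone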
+-----------+ +-------------+ +---------------+ +------------+ +------------+
|Source | |Rest | |Persistent | |Data | |Data |
| Client |+-->| Endpoint |+-->| Queue/Buffer |+-->| Distributor|+-->| Sink |
| | |(Sparkngin) | |(Kafka/Kinesis)| |(Scribengin)| |(Hive/Hbase)|
+-----------+ +-------------+ +---------------+ +------------+ +------------+
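To illustrate the flow above, a source client might emit an event to the Sparkngin REST endpoint with a plain HTTP POST. The host, port, path, and payload fields below are hypothetical placeholders for illustration, not Sparkngin's actual API:

# Hypothetical example: POST a JSON event to a Sparkngin endpoint
curl -X POST http://sparkngin-host:7080/event \
  -H 'Content-Type: application/json' \
  -d '{"timestamp": 1400000000, "source": "web-01", "event": "page-view"}'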
Then make friends with us on the mailing list and start by making a contribution and solving an [Issue]()
There are many ways you can contribute towards the project. A few of these are:
Jump in on discussions: It is possible that someone initiates a thread on the Mailing List describing a problem that you have dealt with in the past. You can help the project by chiming in on that thread and guiding the user to overcome or work around that problem or limitation.
File Bugs: If you notice a problem and are sure it is a bug, then go ahead and file a GitHub Issue. If, however, you are not sure that it is a bug, you should first confirm it by discussing it on the Mailing List.
Review Code: If you see that a GitHub Issue has a "patch available" status, go ahead and review it. The other way is to review code submitted with a pull request, which is the preferred way. It cannot be stressed enough that you must be kind in your review and explain the rationale for your feedback and suggestions. Also note that not all review feedback is accepted - often it is a compromise between the contributor and reviewer. If you are happy with the change and do not spot any major issues, then +1 it.
Provide Patches: We encourage you to assign the relevant GitHub Issue to yourself and supply a patch or pull request for it. The patch you provide can be code, documentation, tests, configs, build changes, or any combination of these.
(Remember to update Kanban during this process)
Git Workflow Summary
If you have an issue that needs a code review:
If you did a code review and the submitter has fixed the changes you requested, then:
Step 1 (New Fork):
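For a new fork: fork DemandCube/NeverwinterDP on GitHub, then clone your fork; a minimal sketch (substitute your own GitHub username):

git clone git@github.com:YourUserName/NeverwinterDP.git
cd NeverwinterDP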
Step 1 (Existing Fork, e.g. "YourUserName/NeverwinterDP"):
git pull --no-ff https://github.com/DemandCube/NeverwinterDP.git master
Step 2:
git checkout -b feature/featurename master
Step 3:
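Step 3 is presumably where you make and commit your changes on the feature branch; a minimal sketch (the commit message format is only an example):

git add -A
git commit -m "Fix #1234: short description of the change"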
Step 4 (Optional - but recommended):
git checkout feature/featurename
git rebase -i master
Step 5:
git checkout master
git pull --no-ff https://github.com/DemandCube/NeverwinterDP.git master
Step 6:
git checkout master
git merge feature/featurename
git push origin master
Step 7:
git remote add upstream git@github.com:DemandCube/NeverwinterDP.git
git fetch upstream
git checkout master
git merge upstream/master
Create a patch
Test
Propose New Features or API
Open a GitHub Ticket
How to create a patch file:
Name your patch SPARKNGIN-1234-0.patch, where 1234 is the Issue number and 0 is the version of the patch.
$ git diff > /path/to/SPARKNGIN-1234-0.patch
How to apply someone else's patch file:
$ cd ~/src/Sparkngin # or wherever you keep the root of your Sparkngin source tree
$ patch -p1 < SPARKNGIN-1234-0.patch # Default when using git diff
$ patch -p0 < SPARKNGIN-1234-0.patch # When using git diff --no-prefix
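If the patch was produced with git diff, you can also sanity-check and apply it with git itself; a minimal sketch:

$ git apply --stat SPARKNGIN-1234-0.patch   # preview which files would change
$ git apply --check SPARKNGIN-1234-0.patch  # verify it applies cleanly
$ git apply SPARKNGIN-1234-0.patch          # actually apply it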
Reviewing Patches
Pull Request
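To submit a pull request instead of a patch, push your feature branch to your fork and open the pull request on GitHub; a minimal sketch, assuming the branch naming from the Git Workflow Summary above:

git push origin feature/featurename
# then open a pull request from YourUserName/NeverwinterDP:feature/featurename
# against DemandCube/NeverwinterDP:master in the GitHub web UI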
mkdir workspace
cd workspace
git clone https://github.com/DemandCube/NeverwinterDP-Commons
git clone https://github.com/DemandCube/Queuengin
git clone https://github.com/DemandCube/Sparkngin
git clone https://github.com/DemandCube/Scribengin
git clone https://github.com/DemandCube/Demandspike
git clone https://github.com/DemandCube/NeverwinterDP
cd NeverwinterDP-Commons
gradle clean build install
cd ../Queuengin
gradle clean build install
cd ../Sparkngin
gradle clean build install
cd ../Scribengin
gradle clean build install
cd ../Demandspike
gradle clean build install
cd ../NeverwinterDP
gradle clean build install release
cd build/release/NeverwinterDP
#To launch servers, you have two choices - single node server or multi node server
./bin/local-single-jvm-server.sh
#or
./bin/local-multi-jvm-server.sh
#At this point, we need to wait for the servers to come up
#Make sure that there are 9 servers RUNNING before you run the deploy script, by running this step
./bin/shell.sh -c server ping
#Run the script to deploy the services. This script will install kafka, sparkngin, demandspike ... services on the servers with the corresponding role
./bin/jsrun.sh jscript/local-deploy.js
#At this point you can point your browser to this url to see status
http://localhost:8080/app/index.html
#Run the script to deploy some demandspike jobs to the demandspike scheduler service
#These two commands will submit various kafka and sparkngin demandspike test jobs to a job scheduler
#Go to the web UI, click DemandSpike, then Job Scheduler to monitor the test results.
./bin/jsrun.sh jscript/local-kafka-test.js
./bin/jsrun.sh jscript/local-sparkngin-test.js
#To run a single job:
./bin/jsrun.sh jscript/ringbearer/job/kafka/hello-job.js
#To kill the servers
./bin/shell.sh -c server exit
#or
pkill -9 -f neverwinter
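The whole local demo above can be strung together in one script; a sketch that reuses only the commands shown, with a fixed wait in place of manually checking the ping output (adjust if the server script needs to be backgrounded on your machine):

cd build/release/NeverwinterDP
./bin/local-multi-jvm-server.sh           # launch the servers (multi-JVM variant)
sleep 60                                  # wait for the servers to come up; adjust as needed
./bin/shell.sh -c server ping             # confirm 9 servers are RUNNING
./bin/jsrun.sh jscript/local-deploy.js    # deploy kafka, sparkngin, demandspike ... services
./bin/jsrun.sh jscript/local-kafka-test.js
./bin/jsrun.sh jscript/local-sparkngin-test.js
./bin/shell.sh -c server exit             # shut the servers down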
For Queuengin, Scribengin, and DemandSpike, you can run gradle release after build and install; you will find the release in the build/release/<project> directory and can run some tests in each release. For example, in Queuengin:
cd build/release/Queuengin/bin
#To launch the server
./server.sh
#Ping to check the server status
./shell.sh -c server ping
#To launch the batch script tests
./shell.sh -f hello-xyz.csh
There are known problems:
If you can't actually move issues around let me (Steve) know.
"Accepted" - are tickets you plan to start working on this week.
"Working on" - are tickets your actively working on
"In code review" - are tickets that need a code review ( You should have put a code review request on the mailinglist ) (If no one responded it's up to you to followup)
"Working on documentation and automated tests" - are tickets your finishing the documentation and creating, unit, integration, configuration management/deployment (Ansible) installation tests.
"In documentation and automated test review" - review specifically of the documentation and test. Follows the same process as code reviews. A review should be requested on the mailinglist.
"Done" - The task should pass the automated integration test review from Jenkins
Ruby
HA Testing
Providing
Additional Features: High Availability and Performant Log Collection
Logs are fed into
Prototype framework with zmq in python
Topics
Registry
Heartbeat
Stats
LogTopics
[ ] Develop - Protocol
Out of the box super easy plugin to
[Nginx] -> Openresty, libkafka with spillover buffer, spillagent, window registration and monitoring
[Log]
[Monitoring] Log normal, error, watchdog, normal spill, error spill, watchdog spill
[Concept/Abstraction] - Emitter Client
[Reporting]
[Support]
[Dependencies]
[ To investigate ]
Capabilities
+------------+ +-----------+ +------------+ +----------------+
|NW | |NW | |NW | | |
| | | | | | | |
| Front End | | Data Bus | | Data Pump | | End Point |
| Emitter |+-->| |+-->| |+-->|- HDFS |
| - Http Get| | | | | |- Elastic Search|
| - Json | | | | | | |
| - Avro | | | | | | |
+------------+ +-----------+ +------------+ +----------------+
+-----------+ +---------+ +-----+ +--------+
| Log Stash | |Sparkngin| |Kafka| |Hadoop |
|-----------| |---------| |-----| |--------|
| |+--->| |+--->| |+-->|HCatalog|
| | | | | | |HBase |
+-----------+ +---------+ +-----+ +--------+
+
| +--------+
+------->|Storm |
|--------|
| |
| |
+--------+
(Diagrams drawn with http://www.asciiflow.com/#Draw)
Should a distributed fault tolerant data transport layer from Kafka to Hadoop be built on Storm or on YARN?
[ Front End Emitter ]
[ log collection (logstash) ] -> [ rest end point (nginx) ] -> [ data bus (kafka) ] -> [ data pump/transport (storm or yarn) ] -> [ rdbms (hive - data registration live) | file system (hdfs) | key store (hbase) ]
Look at developing the protocol prototype with an Avro Producer using zmq and an Avro Consumer communicating through Kafka - Version/Lineage, Heartbeat, Source, Header/Footer. Take design aspects from Camus; it must provide built-in monitoring. There needs to be message metadata (source timestamp, system timestamp) and a way to inspect where hour boundaries exist on the queue. Additionally, there needs to be a way to register servers and record when they come online and offline, for log registration.
Should there be the ability for schema registration, so that schemas can be pushed downstream?
Should there be mapping and general payload support, e.g. JSON support?
Should Avro / Thrift / Protobuf / HBase / Hive / Storm type mappings be maintained?
Preferred Development Tools
YourKit supports the NeverwinterDP open source project with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: [YourKit Java Profiler](http://www.yourkit.com/java/profiler/index.jsp) and [YourKit .NET Profiler](http://www.yourkit.com/.net/profiler/index.jsp).