Web site usage, social network behavior and Internet traffic are examples of systems that appear to follow the Power law, where most of the events are due to the actions of a very small few. Knowing at any given point in time which items are trending is valuable in understanding the system.
frequent-algorithm
is a Ruby implementation of the FREQUENT algorithm
for identifying frequent items in a data stream in sliding windows.
Please refer to Identifying Frequent Items in Sliding Windows over On-Line
Packet Streams, by
Golab, DeHaan, Demaine, López-Ortiz and Munro (2003).
Challenges for Real-time processing of data streams for frequent item queries include:
Therefore, a solution should have the following characteristics:
LOOP
- For each element e in the next b elements:
If a local counter exists for the type of element e:
Increment the local counter.
Otherwise:
Create a new local counter for this element type
and set it equal to 1.- Add a summary S containing identities and counts of the k most frequent items to the back of queue Q.
- Delete all local counters
- For each type named in S:
If a global counter exists for this type:
Add to it the count recorded in S.
Otherwise:
Create a new global counter for this element type
and set it equal to the count recorded in S.- Add the count of the kth largest type in S to δ.
- If sizeOf(Q) > N/b:
(a) Remove the summary S' from the front of Q and subtract the count of the kth largest type in S' from δ.
(b) For all element types named in S':
Subtract from their global counters the counts
recorded in S'
If a counter is decremented to zero:
Delete it.
(c) Output the identity and value of each global counter > δ.— Golab, DeHaan, Demaine, López-Ortiz and Munro. Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003
require 'frequent-algorithm'
# data is pi to 1000 digits
pi = File.read('test/frequent/test_data_pi').strip
data = pi.scan(/./).each_slice(b)
N = 100 # size of main window
b = 20 # size of basic window
k = 3 # we are interested in top-3 numerals in pi
alg = Frequent::Algorithm.new(N, b, k)
# read in and process the 1st basic window
data.next.each { |x| alg.process(x) }
# and the top-3 numerals are?
top3 = alg.report
puts top3
# lather, rinse and repeat
data.next.each { |x| alg.process(x) }
The development of this gem requires the following:
bundler
rake
minitest
(unit testing)yard
(documentation)rdiscount
(Markdown)Building, testing and release of this rubygem uses the following
rake
commands:
rake clean # Remove any temporary products
rake clobber # Remove any generated file
rake test # Execute unit tests
rake build # Build frequent-algorithm-n.n.n.gem into the pkg directory
rake install # Build and install frequent-algorithm-n.n.n.gem into system gems
rake release # Create tag vn.n.n and build and push
# frequent-algorithm-n.n.n.gem to Rubygems
frequent-algorithm
uses yard
and
rdiscount
for Markdown documentation.
Check out Getting Started with
Yard.
frequent-algorithm
uses
MiniTest::Unit
for
unit testing.
Please refer to Publishing To Rubygems.org in the Rubygems Guide.
dev-branch
(git fetch && git checkout dev-branch
)git branch my-new-feature && git checkout my-new-feature
)git commit -am 'Add some feature'
)git push origin my-new-feature:dev-branch
)You may wish to read the Git book online.
Please see the {file:CHANGELOG} for this gem's release history.
frequent-algorithm is provided under the terms of the MIT license.
Copyright © 2015, Willie Tong & Brooke M. Fujita. All rights reserved. Please see the {file:LICENSE} file for further details.