Closed rzbhatti closed 6 years ago
For graph visualization of the customer journey and path analytics, visualization tools like Cytoscape can be used. http://js.cytoscape.org http://marvl.infotech.monash.edu/webcola/index.html
Is “thumbing window aggregates” a typo for “tumbling window aggregates”?
+1
@rzbhatti are you proposing a set of predefined classifiers/schemas? Please give some examples. I support your proposal to provide a toolkit for this kind of analytics. I wonder if the use cases are limited to clickstream data or if this could be a generic approach to deal with rule based classifications and aggregations of messages or event data records.
I am not sure how it should be classified - as a toolkit or as an example/sample/pattern. There are certain tricky/complex operators/composites which are out of the reach of the ordinary developer and hence it is worth making this public. These are:
(a) user-defined functions in aggregates are needed because the out-of-the-box functions are not sufficient. (b) aggregates need to be cascaded to avoid a large memory footprint. (c) if the application needs to be brought down for maintenance, large windows of data must not be lost (e.g., if the clickstream is being tracked over a seven-day period). (d) there are companion algorithms (and code) on the UI side, which let you do "Since" queries, e.g., how many clicks for xyz since Wednesday.
I support having a clickstream repository. Instead of a toolkit, would it make sense to classify them as microservices where we provide pre-built applications that perform these complex analytic functions? Would an example be provided to show how to stitch these together to produce a meaningful application? Is this something similar to streamsx.health where we would be providing domain specific services, accelerating the development of clickstream analytics applications?
I agree that it is definitely more than just a toolkit of functions and operators. It contains sample microservice applications for data acquisition, stream classification, global and session-level aggregation, and finally the graphical visualization of the analytics.
+1 though I would encourage thinking about microservices as being intended to be used by users out of the box, rather than just being samples.
Maybe initially the toolkit could stay focused on clickstream analysis and then if a general pattern exists it could be extracted, rather than trying to start out with a general purpose solution with no clear goals in mind.
Repository created, waiting for response from @rzbhatti regarding CLA... and then I will create the committer team.
Added @rzbhatti to streamsx.clickstream project. Please review this welcome page to familiarize yourself with some of the project guidelines: https://github.com/IBMStreams/administration/blob/master/welcome.md
Proposal
Proposed here is a toolkit for clickstream analytics. This toolkit will provide the basic functions and operators to build an application for click or tap stream analytics. It will also provide a sample clickstream analytics application based on a streaming architecture.
Motivation
Real-time streaming analytics of click or tap streams is of undeniable significance for the digital transformation of growing enterprises. It provides a way to monitor, qualitatively and quantitatively, the effectiveness of web or mobile applications. Our client engagement experience with large-scale mobile enterprises shows that real-time clickstream analytics is imperative to:
Toolkit Components
Clickstream Classification Operator
A scalable and dynamically updated set of classification rules is defined in a JSON file. Each JSON rule specifies a string attribute of the input stream to be matched against a specified string, partial string, or regex. When a rule matches, the specified attributes of the output stream are updated according to the classification given by that rule.
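The rule matching described above can be illustrated with a minimal Python sketch. This is not the toolkit's operator or its actual rule schema: the JSON field names (attribute, match, pattern, set), the example rules, and the choice to apply every matching rule in order are all assumptions made for illustration.

```python
import json
import re

# Hypothetical rule file content; the field names are assumptions for this sketch.
RULES_JSON = """
[
  {"attribute": "url", "match": "regex",   "pattern": "^/checkout/.*", "set": {"category": "checkout"}},
  {"attribute": "url", "match": "partial", "pattern": "/product/",     "set": {"category": "browse"}},
  {"attribute": "url", "match": "exact",   "pattern": "/",             "set": {"category": "home"}}
]
"""


def load_rules(text):
    """Parse the JSON rules and precompile the regex patterns once."""
    rules = json.loads(text)
    for rule in rules:
        if rule["match"] == "regex":
            rule["compiled"] = re.compile(rule["pattern"])
    return rules


def matches(rule, value):
    """Return True if the string value satisfies the rule's match type."""
    kind = rule["match"]
    if kind == "exact":
        return value == rule["pattern"]
    if kind == "partial":
        return rule["pattern"] in value
    if kind == "regex":
        return rule["compiled"].search(value) is not None
    raise ValueError(f"unknown match type: {kind}")


def classify(event, rules):
    """Return a copy of the event with output attributes set by every matching rule.

    Assumption: rules are applied in file order, so a later match overrides
    an earlier one for the same output attribute.
    """
    out = dict(event)
    for rule in rules:
        value = event.get(rule["attribute"])
        if isinstance(value, str) and matches(rule, value):
            out.update(rule["set"])
    return out


rules = load_rules(RULES_JSON)
classified = classify({"url": "/checkout/cart", "user": "u1"}, rules)
```

Reloading the rules (calling load_rules again when the file changes) is one simple way to get the dynamic updates mentioned above without restarting the application.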
Custom aggregate functions for progressive and cascaded aggregates
Instead of “sliding window aggregates”, cascaded “tumbling window aggregates” are used to produce a Count-By-Distinct function.
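As a rough sketch of the cascading idea, the Python below counts events per distinct key in short tumbling windows and then merges those per-window summaries over a longer span. It is an illustration under stated assumptions, not the toolkit's implementation: the class and function names are hypothetical, "Count-By-Distinct" is interpreted here as a count of events per distinct key value, and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque


class TumblingCounter:
    """First stage: count events per distinct key within one tumbling window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.window_start = None
        self.counts = Counter()

    def add(self, timestamp, key):
        """Add one event; return the (window_start, counts) pairs that just closed."""
        closed = []
        start = timestamp - (timestamp % self.window_seconds)
        if self.window_start is None:
            self.window_start = start
        elif start > self.window_start:
            # The event belongs to a later window, so the current one tumbles.
            closed.append((self.window_start, self.counts))
            self.counts = Counter()
            self.window_start = start
        self.counts[key] += 1
        return closed


class CascadedAggregate:
    """Second stage: merge closed window summaries covering the last span_seconds."""

    def __init__(self, span_seconds):
        self.span_seconds = span_seconds
        self.partials = deque()

    def add_partial(self, window_start, counts):
        self.partials.append((window_start, counts))
        cutoff = window_start - self.span_seconds
        # Evict summaries that fall outside the longer span.
        while self.partials and self.partials[0][0] <= cutoff:
            self.partials.popleft()

    def totals(self):
        total = Counter()
        for _, counts in self.partials:
            total.update(counts)
        return total


def aggregate_stream(events, window_seconds, span_seconds):
    """Feed (timestamp, key) events through both stages and return the merged totals."""
    stage1 = TumblingCounter(window_seconds)
    stage2 = CascadedAggregate(span_seconds)
    for timestamp, key in events:
        for start, counts in stage1.add(timestamp, key):
            stage2.add_partial(start, counts)
    return stage2.totals()


events = [(0, "home"), (10, "cart"), (65, "home"), (70, "home"),
          (130, "cart"), (190, "home"), (250, "home")]
totals = aggregate_stream(events, window_seconds=60, span_seconds=180)
```

Only one small summary per closed window is retained rather than every raw event, which is where the memory saving over a sliding window comes from. The totals cover closed windows only; the still-open window is not included in this sketch.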
Graph generator operator
A custom SPL operator to produce a graph JSON for: