aws-samples / real-time-analytics-with-apache-pinot-on-aws

MIT No Attribution

Build real-time analytics solution with Apache Pinot on AWS

In this AWS Sample, you will deploy an Apache Pinot architecture on Amazon EC2. Apache Pinot is an open-source, real-time distributed OLAP datastore capable of

It ingests data from both streaming and batch sources and organizes it into logical tables distributed across multiple nodes in a Pinot cluster, ensuring scalability.

Architecture


Solution Description

Getting started

Pre-requisites

Before you get started, make sure you have the following prerequisites:

Deploying Apache Pinot

Visualizing Using Tableau

Deployment

We will use the AWS CDK CLI to deploy the solution.

This solution deploys the following:

In the commands below, replace <account-id> and <aws-region> with your AWS account ID and AWS Region.

  1. Git clone this repository
    git clone https://github.com/aws-samples/real-time-analytics-with-apache-pinot-on-aws.git
  2. Go into the repo
    cd real-time-analytics-with-apache-pinot-on-aws
  3. Install NPM libraries
    npm install
  4. Bootstrap the AWS CDK
    cdk bootstrap aws://<account-id>/<aws-region>
  5. Deploy the stack. You need to provide your IP address as a parameter. This is the IP address from which you will be able to connect to the Apache Pinot Controller UI. Make sure the IP address ends with the /32 suffix (CIDR notation for a single host).
    cdk deploy --parameters IpAddress="<your-ip-address/32>"
  6. Once the deployment finishes, get the Load Balancer DNS name from the CloudFormation outputs to access the UI
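
Step 5 expects the IP address in /32 form. As a minimal sketch (not part of this repository), the helper below uses Python's standard `ipaddress` module to validate an IPv4 address and return it with the /32 suffix. The function name `to_host_cidr` and the example address 203.0.113.7 are illustrative assumptions.

```python
import ipaddress


def to_host_cidr(ip: str) -> str:
    """Return an IPv4 address as a single-host CIDR block, e.g. 203.0.113.7/32."""
    # IPv4Interface accepts "a.b.c.d" (treated as /32) or "a.b.c.d/prefix" and
    # raises a ValueError subclass for anything that is not valid IPv4.
    interface = ipaddress.IPv4Interface(ip.strip())
    if interface.network.prefixlen != 32:
        raise ValueError(f"expected a single host (/32), got /{interface.network.prefixlen}")
    return f"{interface.ip}/32"


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-only example address (hypothetical client IP).
    cidr = to_host_cidr("203.0.113.7")
    print(f'cdk deploy --parameters IpAddress="{cidr}"')
```

Rejecting wider prefixes instead of silently rewriting them keeps an accidental /24 or /0 from being passed to the stack.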

Ingesting Data

To ingest data into Amazon Kinesis Data Streams, you can use the Kinesis Data Generator. Please follow the instructions to deploy it.

Use the following template to send data to the Amazon Kinesis data stream called pinot-stream:

{
"userID" : "{{random.number(
        {
            "min":1,
            "max":100
        }
    )}}",
"productName" : "{{commerce.productName}}",
"color" : "{{commerce.color}}",
"department" : "{{commerce.department}}",
"product" : "{{commerce.product}}",
"campaign" : "{{random.arrayElement(
        ["BlackFriday","10Percent","NONE"]
    )}}",
"price" : {{random.number(
        {   "min":10,
            "max":150
        }
    )}},
"creationTimestamp" : "{{date.now("YYYY-MM-DD hh:mm:ss")}}"
}
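
To make the shape of a rendered record concrete, here is a minimal Python sketch that builds one record with the same fields and types as the template above. It is an illustration only: the product, color, and department lists are hypothetical stand-ins for what the generator's `commerce.*` helpers produce, the numeric bounds are assumed to be inclusive, and the timestamp assumes moment.js-style format tokens, where `hh` is the 12-hour clock.

```python
import json
import random
from datetime import datetime

# Hypothetical stand-ins for the generator's commerce.* helpers.
PRODUCT_NAMES = ["Handcrafted Steel Chair", "Rustic Cotton Shirt"]
COLORS = ["red", "teal", "ivory"]
DEPARTMENTS = ["Home", "Clothing", "Toys"]
PRODUCTS = ["Chair", "Shirt", "Ball"]
# Taken verbatim from the template.
CAMPAIGNS = ["BlackFriday", "10Percent", "NONE"]


def make_record(rng: random.Random) -> dict:
    """Build one record shaped like the template's rendered output."""
    return {
        # Quoted in the template, so userID is a string.
        "userID": str(rng.randint(1, 100)),
        "productName": rng.choice(PRODUCT_NAMES),
        "color": rng.choice(COLORS),
        "department": rng.choice(DEPARTMENTS),
        "product": rng.choice(PRODUCTS),
        "campaign": rng.choice(CAMPAIGNS),
        # Unquoted in the template, so price is a JSON number.
        "price": rng.randint(10, 150),
        # %I mirrors the template's "hh" (12-hour clock, no AM/PM marker).
        "creationTimestamp": datetime.now().strftime("%Y-%m-%d %I:%M:%S"),
    }


if __name__ == "__main__":
    print(json.dumps(make_record(random.Random()), indent=2))
```

Note that `userID` is a string while `price` is a number, which matters when defining the column types of the table that consumes the stream.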

You can now open the Query Console in the Apache Pinot UI to see the ingested data and run queries.
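
Queries can also be sent programmatically. The sketch below builds, without sending, an HTTP request for the Pinot controller's `POST /sql` endpoint, aggregating revenue per campaign. It rests on several assumptions not stated in this README: that the load balancer forwards requests to the controller, that the table keeps the template's field names (`campaign`, `price`), and that you supply the table name yourself. The `table` argument and the example URL are hypothetical.

```python
import json
import urllib.request


def build_query_request(base_url: str, table: str) -> urllib.request.Request:
    """Build a POST request carrying a Pinot SQL query (the request is not sent)."""
    # Orders and revenue per campaign, using field names from the ingestion template.
    sql = (
        "SELECT campaign, COUNT(*) AS orders, SUM(price) AS revenue "
        f"FROM {table} GROUP BY campaign ORDER BY SUM(price) DESC LIMIT 10"
    )
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Both arguments are placeholders: use your Load Balancer DNS name and table name.
    request = build_query_request("http://example.invalid", "my_table")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would execute the query. The same SQL text can be pasted into the Query Console directly.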

Clean up

To delete all resources created by the stack, run:

cdk destroy --all

License

This library is licensed under the MIT-0 License. See the LICENSE file.