Boiling Insights - From raw S3 data to charts in seconds
Boiling Insights

TL;DR Download Boiling Insights

Boiling Insights runs DuckDB efficiently on your laptop over your data on S3. You can load any DuckDB extensions you like, query other data sources, and so on; it is up to you. The DuckDB website has all the documentation you need for SQL. An optional Boiling Data cloud integration boosts data processing and automation.
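
As a rough illustration of what querying S3 data with DuckDB can look like, the sketch below only builds the SQL text. The bucket name, prefix, and Parquet file layout are assumptions made for the example, and S3 credentials still have to be configured separately. `httpfs` is the DuckDB extension that provides S3 access, and `read_parquet` is DuckDB's Parquet reader.

```python
def s3_scan_sql(bucket: str, prefix: str, limit: int = 10) -> str:
    """Build DuckDB SQL that scans Parquet objects under an S3 prefix."""
    # Assumed layout: Parquet files directly under s3://<bucket>/<prefix>/
    uri = f"s3://{bucket}/{prefix.strip('/')}/*.parquet"
    return (
        "INSTALL httpfs;\n"
        "LOAD httpfs;\n"
        f"SELECT * FROM read_parquet('{uri}') LIMIT {int(limit)};"
    )


# Hypothetical bucket and prefix, for illustration only
sql = s3_scan_sql("my-bucket", "raw/logs/")
print(sql)
```

The resulting statements can be pasted into any DuckDB session. The helper does no quoting, so it is only suitable for trusted bucket and prefix names.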

Boiling Insights is a local-first data stack for building end-to-end pipelines from ingestion to transformation to visualization. Data is synchronized with S3, and multiple Boiling Insights applications can run over the same data.

The application also runs on the web at https://app.boilingdata.com/, but with limited functionality compared to the native application. It supports querying with BoilingData and will support interactive dashboards as well.

Road Map

Vision

A distributed, local-first data stack for building end-to-end pipelines from ingestion to transformation to visualization, with data persisted on S3. It comes with a rich, ever-growing set of Data Profiles over known data sources and data sets, so that non-data people can run analytics over their data with ease, and Data Analysts and Engineers can build on existing work instead of repeating themselves (DRY).

Driving Principles - Data Profiles

"S3 first", "compute once", and "local first" are some of the principles driving Boiling Insights. It reads raw data from S3, compacts and optimises the data, and derives multiple aggregation tables. Visualizations are Apache ECharts configurations (we could add support for Vega Lite as well, if there is demand) combined with SQL clauses that read data from the aggregation tables. These end-to-end configurations are called "Data Profiles". They record the work of Data Engineers and Data Analysts over (known) data sources, so that you can extend it instead of repeating it.

Every data processing stage is synchronized back to S3, so if a stage already exists on S3, it does not have to be computed again. The more users work over the same data on S3, the more processing power is available and the faster the results arrive.
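
The "compute once" idea can be sketched as a lookup before each stage runs. This is a minimal illustration, not how Boiling Insights is implemented: the in-memory `StageStore` stands in for S3, and the key derivation (a hash of the stage name, its configuration, and its input) is an assumption made for the example.

```python
import hashlib
import json


class StageStore:
    """In-memory stand-in for S3: maps stage keys to stored results."""

    def __init__(self):
        self.objects = {}


def stage_key(name: str, config: dict, input_key: str) -> str:
    """Derive a deterministic key from the stage name, config and input."""
    payload = json.dumps(
        {"name": name, "config": config, "input": input_key}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def run_stage(store, name, config, input_key, compute):
    """Run compute() only if the stage result is not already stored."""
    key = stage_key(name, config, input_key)
    if key in store.objects:
        return key, False  # already synchronized, nothing to compute
    store.objects[key] = compute()
    return key, True


store = StageStore()
calls = []


def compact():
    calls.append("compact")
    return "compacted-data"


# The first user computes stage 1; the second finds it already stored.
key1, computed1 = run_stage(store, "compact", {"format": "parquet"}, "s3://bucket/raw/", compact)
key2, computed2 = run_stage(store, "compact", {"format": "parquet"}, "s3://bucket/raw/", compact)
print(computed1, computed2, len(calls))
```

Because the key depends only on the stage definition and its input, any user pointing at the same data reuses the stored result.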

graph LR
    A[RAW S3 Data] -->|Stage 1| B[Compacted & Optimised Data]

    B -->|Stage 2| C1[Aggregation Table 1]
    B -->|Stage 2| C2[Aggregation Table 2]
    B -->|Stage 2| C3[Aggregation Table 3]

    C1[Aggregation Table 1] -->|Stage 3| D1[Viz 1]
    C1[Aggregation Table 1] -->|Stage 3| D2[Viz 2]
    C2[Aggregation Table 2] -->|Stage 3| D3[Viz 3]
    C3[Aggregation Table 3] -->|Stage 3| D4[Viz 4]
    C3[Aggregation Table 3] -->|Stage 3| D5[Viz 5]

    style A fill:#E6F2FF,stroke:#4D94FF
    style B fill:#E6FFE6,stroke:#4DFF4D
    style C1 fill:#FFE6E6,stroke:#FF4D4D
    style C2 fill:#FFE6E6,stroke:#FF4D4D
    style C3 fill:#FFE6E6,stroke:#FF4D4D
    style D1 fill:#FFF2E6,stroke:#FFB84D
    style D2 fill:#FFF2E6,stroke:#FFB84D
    style D3 fill:#FFF2E6,stroke:#FFB84D
    style D4 fill:#FFF2E6,stroke:#FFB84D
    style D5 fill:#FFF2E6,stroke:#FFB84D

In other words, Boiling Insights is like a distributed Data Warehouse (DDWH) compute layer with S3 storage. However, since it synchronizes data to your laptop, working with the data is fast, and iterating over the whole end-to-end data pipeline from raw data to visualizations happens in seconds. In addition to the full end-to-end view, this fast feedback cycle and UX/DX make Boiling Insights stand apart.

INSTALL

  1. Download the zip-compressed Boiling Insights for macOS (arm64)
  2. Uncompress the zip archive (double-click it)
  3. Optionally, copy the resulting app to your Applications folder
  4. Start the app (double-click the App)

NOTE: Bear with us, we're adding support for more architectures.

DATA PROFILES

data-profiles
├── aws-lambda-json-logs  # Data Profile
│   ├── chart-models      # stage 3 - visualisation with SQL and Apache ECharts
│   ├── database-models   # stage 2 - derived / aggregated DuckDB tables
│   └── etl-models        # stage 1 - ingestion, compaction, optimisation
├── aws-cloudtrail-logs   # Data Profile
│   ├── database-models   # ..
│   └── etl-models
└── general               # General charts used by Boiling Insights itself
    └── chart-models
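
A small sketch of how this layout could be walked to list each Data Profile and the stages it defines. The stage ordering follows the comments in the tree; the helper name `list_profiles` is hypothetical, and the temporary directory only recreates the folder names shown above.

```python
from pathlib import Path
import tempfile

# Stage order taken from the comments in the tree above
STAGES = {"etl-models": 1, "database-models": 2, "chart-models": 3}


def list_profiles(root: Path) -> dict[str, list[str]]:
    """Map each Data Profile directory to its stage folders, in stage order."""
    profiles = {}
    for profile_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        stages = [
            d.name for d in profile_dir.iterdir() if d.is_dir() and d.name in STAGES
        ]
        profiles[profile_dir.name] = sorted(stages, key=STAGES.get)
    return profiles


# Recreate the folder names from the tree in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data-profiles"
    for rel in [
        "aws-lambda-json-logs/chart-models",
        "aws-lambda-json-logs/database-models",
        "aws-lambda-json-logs/etl-models",
        "aws-cloudtrail-logs/database-models",
        "aws-cloudtrail-logs/etl-models",
        "general/chart-models",
    ]:
        (root / rel).mkdir(parents=True)
    profiles = list_profiles(root)

print(profiles)
```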

Data Profiles are configurations against known raw data sets and include SQL run directly with DuckDB.
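
To make the stage 3 idea concrete, here is a hedged sketch of a chart model that pairs a SQL clause with an Apache ECharts option. The dictionary shape, the table name `daily_invocations`, and the helper `render_option` are illustrative assumptions, not the actual Boiling Insights file format. Only the ECharts option keys (`xAxis`, `yAxis`, `series`) are standard ECharts.

```python
import json

# Hypothetical chart model: SQL over an aggregation table plus an
# ECharts option template. Field and table names are illustrative.
chart_model = {
    "sql": "SELECT day, invocations FROM daily_invocations ORDER BY day",
    "echarts": {
        "xAxis": {"type": "category", "data": []},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": []}],
    },
}


def render_option(model: dict, rows: list[tuple]) -> dict:
    """Fill the ECharts option template with (x, y) rows from the SQL result."""
    option = json.loads(json.dumps(model["echarts"]))  # deep copy of the template
    option["xAxis"]["data"] = [x for x, _ in rows]
    option["series"][0]["data"] = [y for _, y in rows]
    return option


# Stand-in for rows returned by running chart_model["sql"] in DuckDB
rows = [("2024-01-01", 120), ("2024-01-02", 95)]
option = render_option(chart_model, rows)
print(json.dumps(option, indent=2))
```

Keeping the SQL and the chart option side by side is what lets a profile be reused: only the underlying data changes.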

Data Profile: AWS Lambda Logs

Boiling Insights currently supports the "AWS Lambda Logs" (aws-lambda-json-logs) Data Profile.

The AWS Lambda JSON Logs Data Profile is special in that it requires installing a Data Tap for log ingestion and an AWS Lambda Extension for sending the logs to the Data Tap URL. In addition, you need to fetch an authorization token for the Data Tap and add it to the Lambda environment variables, so that the extension can pick it up and use it when sending the logs to the URL (without authorization, the Data Tap rejects the message).

  1. Install Data Taps to get the ingestion URL.
  2. Add the Boiling AWS Lambda Extension to your Lambda functions.
  3. Configure your Lambda functions to use JSON logging.
  4. Get a Boiling Data authorization token with bdcli and set it in the Lambda function environment variables (see the Data Taps instructions).
  5. Optionally, disable CloudWatch Logs logging for your Lambda functions by removing the logs:* rights from the Lambda functions' IAM role (to save costs).
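
For the optional last step, the sketch below shows one way to strip `logs:*` permissions from an IAM policy document before attaching it to a role. It is an illustration under assumptions: the sample policy is made up, `Statement` is assumed to be a list, and wildcard grants such as `"Action": "*"` are not handled. Applying the result to a real role is left out.

```python
import copy


def strip_logs_actions(policy: dict) -> dict:
    """Return a copy of an IAM policy without allowed logs:* actions."""
    result = copy.deepcopy(policy)
    kept = []
    for stmt in result.get("Statement", []):
        # Leave Deny statements and NotAction-style statements untouched
        if stmt.get("Effect") != "Allow" or "Action" not in stmt:
            kept.append(stmt)
            continue
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]
        remaining = [a for a in actions if not a.lower().startswith("logs:")]
        if remaining:
            stmt["Action"] = remaining
            kept.append(stmt)  # statements left with no actions are dropped
    result["Statement"] = kept
    return result


# Made-up example policy for a Lambda execution role
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "*",
        },
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::my-bucket/*"},
    ],
}

stripped = strip_logs_actions(policy)
print(stripped)
```

Note that without CloudWatch Logs permissions, the Data Tap becomes the only destination for the function's logs.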