TL;DR: Download Boiling Insights
Boiling Insights runs DuckDB efficiently on your laptop over your data on S3. You can run any DuckDB extensions you like, query other data sources, and so on; it is up to you. The DuckDB website has all the SQL documentation you need. Boiling Insights also has an optional BoilingData cloud integration to boost data processing and automation.
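For a sense of what this looks like in practice, here is a minimal DuckDB query over Parquet files on S3; the bucket, prefix, and column names are hypothetical, not from Boiling Insights itself:

```sql
-- Hypothetical bucket/prefix and columns; any DuckDB SQL works here.
INSTALL httpfs;
LOAD httpfs;

SELECT status_code, COUNT(*) AS requests
FROM read_parquet('s3://my-bucket/raw-logs/**/*.parquet')
GROUP BY status_code
ORDER BY requests DESC;
```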
Boiling Insights is a local-first data stack for building end-to-end pipelines, from ingestion to transformation to visualization. Data is synchronized with S3, and multiple Boiling Insights applications can run over the same data.
The application runs on the web too at https://app.boilingdata.com/, but with limited functionality compared to the native application. It supports querying with BoilingData, and will support interactive Dashboards as well.
Boiling Insights reads raw data from templated S3 prefixes, for example:

```
s3://buck/prefix/{{year}}/{{month}}/{{day}}
```

while storing the optimised and compacted data into a hive-partitioned prefix. DuckDB extensions such as `uc_catalog` and `delta` can be used as well.

Boiling Insights is a distributed local-first data stack for building end-to-end pipelines from ingestion to transformation to visualization, with data persisted on S3. It comes with a rich and ever-growing set of Data Profiles over known data sources/sets that allow non-data people to run analytics over their data with ease, and allow Data Analysts and Engineers to build on top of existing work instead of repeating themselves again and again (DRY).
"S3 first", "compute once", and "local first" are some of the principles driving Boiling Insights. It reads raw data from S3, compacts and optimises the data, and derives multiple aggregation tables. Visualizations are Apache ECharts configurations (we could add support for Vega-Lite as well, if there is demand) paired with SQL clauses reading data from the aggregation tables. These e2e configurations are called "Data Profiles". They record Data Engineers' and Data Analysts' work over (known) data sources so that you don't have to repeat yourself again and again, but can extend instead.
Every data processing stage is synchronized back to S3, so if a stage's output already exists on S3, it does not have to be computed again. The more users working over the same data on S3, the more combined data processing power and the faster the results.
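As a sketch of the "compute once" idea, a stage's output can be written back to S3 with DuckDB's `COPY`, so peers read the persisted result instead of recomputing it (the table and bucket names below are made up for illustration):

```sql
-- Hypothetical table and bucket names; persists a stage's output to S3.
COPY (SELECT * FROM compacted_logs)
TO 's3://my-bucket/stage1/compacted.parquet' (FORMAT PARQUET);
```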
```mermaid
graph LR
A[RAW S3 Data] -->|Stage 1| B[Compacted & Optimised Data]
B -->|Stage 2| C1[Aggregation Table 1]
B -->|Stage 2| C2[Aggregation Table 2]
B -->|Stage 2| C3[Aggregation Table 3]
C1 -->|Stage 3| D1[Viz 1]
C1 -->|Stage 3| D2[Viz 2]
C2 -->|Stage 3| D3[Viz 3]
C3 -->|Stage 3| D4[Viz 4]
C3 -->|Stage 3| D5[Viz 5]
style A fill:#E6F2FF,stroke:#4D94FF
style B fill:#E6FFE6,stroke:#4DFF4D
style C1 fill:#FFE6E6,stroke:#FF4D4D
style C2 fill:#FFE6E6,stroke:#FF4D4D
style C3 fill:#FFE6E6,stroke:#FF4D4D
style D1 fill:#FFF2E6,stroke:#FFB84D
style D2 fill:#FFF2E6,stroke:#FFB84D
style D3 fill:#FFF2E6,stroke:#FFB84D
style D4 fill:#FFF2E6,stroke:#FFB84D
style D5 fill:#FFF2E6,stroke:#FFB84D
```
In other words, Boiling Insights is like a distributed Data Warehouse (DDWH) compute layer with S3 storage. However, since it synchronizes data to the laptop, working with the data is blazing fast, and iterating the whole e2e data pipeline from raw data to visualizations takes seconds. In addition to the full e2e view, this fast feedback cycle and UX/DX is what makes Boiling Insights stand apart.
NOTE: Bear with us, we're adding support for more archs.
```
data-profiles
├── aws-lambda-json-logs   # Data Profile
│   ├── chart-models       # stage 3 - visualisation with SQL and Apache ECharts
│   ├── database-models    # stage 2 - derived / aggregated DuckDB tables
│   └── etl-models         # stage 1 - ingestion, compaction, optimisation
├── aws-cloudtrail-logs    # Data Profile
│   ├── database-models    # ..
│   └── etl-models
└── general                # General charts used by Boiling Insights itself
    └── chart-models
```
Data Profiles are configurations against known raw data sets and include SQL run directly with DuckDB.

- Stage 1 (`etl-models`): Raw data set compaction SQL
- Stage 2 (`database-models`): List of aggregation table SQL derived from the compacted data set
- Stage 3 (`chart-templates`): A set of Apache ECharts configurations with corresponding SQL clauses for ready-made charts

Boiling Insights currently supports the "AWS Lambda Logs" (`aws-lambda-json-logs`) Data Profile.
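As an illustration, a `database-models` entry could be an aggregation like the following; the table and column names are made up, not taken from the actual Data Profile:

```sql
-- Hypothetical stage 2 aggregation over the compacted data set.
CREATE TABLE IF NOT EXISTS errors_per_day AS
SELECT date_trunc('day', event_time) AS day,
       COUNT(*) AS error_count
FROM compacted_lambda_logs
WHERE log_level = 'ERROR'
GROUP BY 1;
```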
If you want to add more, create an issue or PR to this repository.
If you want to be Data Driven but don't know how to start, we can help you create Data Profiles over your data, with visualisations too. Contact dan.forsberg@boilingdata.com for more information.
The AWS Lambda JSON Logs data profile is special, as it requires installing a Data Tap for log ingestion and the AWS Lambda Extension for sending the logs to the Data Tap URL. In addition, it requires fetching an authorization token for the Data Tap and adding it to the Lambda environment variables so that the extension can pick it up and use it when sending the logs (without authorization the Data Tap rejects the message).
- Fetch the authorization token with `bdcli` and set it into the Lambda Function environment variables (see the Data Taps instructions)
- Remove the `logs:*` rights from the Lambda Function's IAM Role (to save costs)