adidas / lakehouse-engine

The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
https://adidas.github.io/lakehouse-engine-docs/
Apache License 2.0
198 stars, 36 forks

[FEATURE] Add a sample for loading and analysing TPCH data #13

Closed: jmcorreia closed this 2 months ago

jmcorreia commented 2 months ago

The goal of this PR is to add a sample Python Databricks notebook showing how people can use the Lakehouse Engine to load data, assess its data quality, and perform some analysis on top of it.
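
For readers unfamiliar with the flow described above, here is a rough sketch of what the load and data quality steps might look like. It is not the notebook from this PR. It assumes the engine is driven by an ACON dictionary passed to `load_data` from `lakehouse_engine.engine`, and the spec keys, spec ids, the `_c0` column name, and both paths are illustrative guesses that should be checked against the project documentation (a real data quality spec likely needs additional settings).

```python
def build_tpch_orders_acon(source_path, target_path):
    """Build a hypothetical ACON that reads TPCH orders, validates them, and writes Delta.

    All spec ids and option values are illustrative assumptions,
    not the configuration used by the sample notebook.
    """
    return {
        "input_specs": [
            {
                "spec_id": "tpch_orders_raw",
                "read_type": "batch",
                "data_format": "csv",
                # TPCH text files are commonly pipe-delimited without a header (assumption).
                "options": {"header": False, "delimiter": "|", "inferSchema": True},
                "location": source_path,
            }
        ],
        "dq_specs": [
            {
                "spec_id": "tpch_orders_checked",
                "input_id": "tpch_orders_raw",
                "dq_type": "validator",
                "dq_functions": [
                    {
                        # Great Expectations style check; "_c0" is Spark's default
                        # name for the first column of a headerless CSV.
                        "function": "expect_column_values_to_not_be_null",
                        "args": {"column": "_c0"},
                    }
                ],
            }
        ],
        "output_specs": [
            {
                "spec_id": "tpch_orders_bronze",
                "input_id": "tpch_orders_checked",
                "write_type": "overwrite",
                "data_format": "delta",
                "location": target_path,
            }
        ],
    }


def run_load(acon):
    """Hand the ACON to the engine; needs Spark and lakehouse-engine installed."""
    # Imported lazily so the ACON can be built and inspected without the library.
    from lakehouse_engine.engine import load_data

    return load_data(acon=acon)
```

In a notebook, one would call `run_load(build_tpch_orders_acon(src, tgt))` with real locations and then query the written Delta table with Spark for the analysis part.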