aws-samples / data-lineage-for-data-lake-example

MIT No Attribution
24 stars 22 forks source link

Data Lineage for Data Lakes Example

This repository contains an example project for building Data Lineage for data lakes using AWS Glue, Amazon Neptune and Spline Agent.

Remarks: This setup works with AWS Glue Data Permissions Model and does not support Lake Formation Permission Model.

Stack

Build & Deployment

To deploy the solution to AWS Cloud with terraform, export your AWS Credentials to terraform (AWS Profile or environment variables)

alt text

brew install terraform
git clone https://github.com/aws-samples/data-lineage-for-data-lake-example.git
cd data-lineage-for-data-lake-example

# download spline agent jar
wget https://repo1.maven.org/maven2/za/co/absa/spline/agent/spark/spark-3.1-spline-agent-bundle_2.12/0.6.1/spark-3.1-spline-agent-bundle_2.12-0.6.1.jar -O ./asset/lib/spark-3.1-spline-agent-bundle_2.12-0.6.1.jar

terraform init

terraform apply

To build and test the lineage visual application locally:

cd src/lineage-visual
npm install
npm run serve

Getting started

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.