This repository contains an example project for building data lineage for data lakes using AWS Glue, Amazon Neptune, and the Spline agent.
Note: This setup works with the AWS Glue data permissions model and does not support the AWS Lake Formation permissions model.
To deploy the solution to AWS with Terraform, make your AWS credentials available to Terraform (through an AWS profile or environment variables), then run:
```shell
brew install terraform
git clone https://github.com/aws-samples/data-lineage-for-data-lake-example.git
cd data-lineage-for-data-lake-example
# download spline agent jar
wget https://repo1.maven.org/maven2/za/co/absa/spline/agent/spark/spark-3.1-spline-agent-bundle_2.12/0.6.1/spark-3.1-spline-agent-bundle_2.12-0.6.1.jar -O ./asset/lib/spark-3.1-spline-agent-bundle_2.12-0.6.1.jar
terraform init
terraform apply
```
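If `terraform apply` fails with authentication errors, it can help to confirm which identity your credentials resolve to before retrying. The following is a minimal sketch under our own assumptions: `check_aws_credentials` is a hypothetical helper, not part of the repository, and it requires the AWS CLI, which the steps above do not install. Like the Terraform AWS provider, the AWS CLI reads `AWS_PROFILE` or `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` from the environment.

```shell
# Hypothetical helper (not part of the repository): confirm that AWS credentials
# resolve before running `terraform apply`. Assumes the AWS CLI is installed.
check_aws_credentials() {
  local arn
  # get-caller-identity needs no IAM permissions, so a failure here usually
  # means the credentials are missing, expired, or invalid.
  arn="$(aws sts get-caller-identity --query 'Arn' --output text)" || {
    echo "No usable AWS credentials: set AWS_PROFILE, or AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY" >&2
    return 1
  }
  echo "Credentials resolve to: $arn"
}
```

Run it in the same shell where you exported your profile or keys, for example `export AWS_PROFILE=my-profile && check_aws_credentials` (the profile name is a placeholder).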
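To build and test the lineage visualization application locally, first set the API Gateway endpoint of your deployment in `src/lineage-visual/src/main.js`, replacing `xxx` with your REST API ID and `<aws-region>` with your AWS Region:

```javascript
axios.defaults.baseURL = "https://xxx.execute-api.<aws-region>.amazonaws.com/dev";
```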
Then install the dependencies and serve the app locally:

```shell
cd src/lineage-visual
npm install
npm run serve
```
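Instead of editing `main.js` by hand, you could script the change and run it from the repository root before `npm run serve`. This is a minimal sketch under our own assumptions: the helper name `set_lineage_api_url` and its arguments are hypothetical, the default file path is `src/lineage-visual/src/main.js`, and you still look up the REST API ID and Region of your deployment yourself (for example, in the API Gateway console).

```shell
# Hypothetical helper (not part of the repository): replace the placeholder
# axios.defaults.baseURL in main.js with your API Gateway invoke URL.
# "dev" is the default stage because it matches the placeholder URL.
set_lineage_api_url() {
  local api_id="$1" region="$2" stage="${3:-dev}" file="${4:-src/lineage-visual/src/main.js}"
  if [ -z "$api_id" ] || [ -z "$region" ]; then
    echo "usage: set_lineage_api_url <rest-api-id> <region> [stage] [file]" >&2
    return 1
  fi
  # Refuse to run silently if the expected line is not there.
  if ! grep -q 'axios\.defaults\.baseURL' "$file"; then
    echo "no axios.defaults.baseURL line found in $file" >&2
    return 1
  fi
  local url="https://${api_id}.execute-api.${region}.amazonaws.com/${stage}"
  # Write to a temporary file and move it back, which behaves the same with
  # BSD (macOS) and GNU sed; leading indentation on the line is preserved.
  sed "s|^\([[:space:]]*\)axios\.defaults\.baseURL *=.*|\1axios.defaults.baseURL = \"${url}\";|" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

For example, `set_lineage_api_url a1b2c3d4e5 eu-west-1` (placeholder ID and Region) rewrites the line to point at `https://a1b2c3d4e5.execute-api.eu-west-1.amazonaws.com/dev`.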
To generate lineage data, run the example AWS Glue jobs:

```shell
aws glue start-job-run --job-name "RawToCurated_employee_optimize"
aws glue start-job-run --job-name "CuratedToAggregated_employee"
```
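`aws glue start-job-run` returns as soon as the run is created, so issuing the two commands back to back can start the second job before the first has finished. Judging by the job names, the second job appears to read the curated output of the first, so you may prefer to wait between them. The following is a minimal sketch under our own assumptions: `run_glue_job_and_wait` and `POLL_SECONDS` are hypothetical names, not part of the repository, and the AWS CLI is assumed to be configured with the credentials used for the deployment.

```shell
# Hypothetical helper (not part of the repository): start a Glue job and block
# until its run reaches a terminal state. Assumes a configured AWS CLI.
POLL_SECONDS="${POLL_SECONDS:-30}"  # seconds between status checks (arbitrary default)

run_glue_job_and_wait() {
  local job_name="$1" run_id state
  # start-job-run returns immediately; keep only the JobRunId from its output.
  run_id="$(aws glue start-job-run --job-name "$job_name" --query 'JobRunId' --output text)" || return 1
  echo "Started $job_name, run $run_id" >&2
  while :; do
    state="$(aws glue get-job-run --job-name "$job_name" --run-id "$run_id" \
      --query 'JobRun.JobRunState' --output text)" || return 1
    case "$state" in
      SUCCEEDED)
        echo "$job_name finished: $state" >&2
        return 0 ;;
      FAILED|STOPPED|TIMEOUT|ERROR|EXPIRED)
        echo "$job_name finished: $state" >&2
        return 1 ;;
      *)
        # STARTING, RUNNING, STOPPING, WAITING: check again later.
        sleep "$POLL_SECONDS" ;;
    esac
  done
}
```

For example, `run_glue_job_and_wait RawToCurated_employee_optimize && run_glue_job_and_wait CuratedToAggregated_employee` starts the second job only if the first succeeds.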
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.