AbsaOSS / spline

Data Lineage Tracking And Visualization Solution
https://absaoss.github.io/spline/
Apache License 2.0

Spline Configuration & Architectural Guidance #675

Closed agrajm closed 4 years ago

agrajm commented 4 years ago

Background

I'm trying to set up the following: capture data lineage from Spark jobs running on Azure Databricks using Spline, store the lineage in MongoDB (using spline-persistence-mongo), and then visualize it with the Spline UI. Please see the attached high-level architecture diagram.

[Image: high-level architecture diagram]

Please note that

Questions

  1. I need to set up the REST Gateway in an authenticated mode so that communication between Databricks and the REST Gateway is secured. How do I configure spline.producer.url to include authentication (username/password), or is there some other Spline configuration that achieves this?

  2. I've currently installed the following JARs on my cluster:

    • agent_core_2_11_0_6_0_SNAPSHOT.jar
    • spark_2_4_spline_agent_bundle_2_11_0_6_0_SNAPSHOT.jar
    • spline_persistence_mongo_0_3_9.jar

Do I need the agent-core JAR as well, or is the spline-agent-bundle JAR for the specific Spark version enough?

  3. Do you foresee any problems with this setup?

My goal is ultimately to capture the lineage in Atlas, but I'm trying Mongo first; if that works, I plan to capture it in Atlas instead. I saw somewhere that Spline 0.4 started supporting Atlas as persistence, but I have yet to figure out the details.

Regards, Agraj

cerveada commented 4 years ago

Hello, nice picture. Unfortunately, out-of-the-box Atlas support was dropped from Spline 0.4 onward. Also, we switched to ArangoDB in 0.4, so it won't work with MongoDB anymore.

You can read more about Atlas support here: #279

agrajm commented 4 years ago

@cerveada let's forget Atlas for now. I'm happy to replace Mongo with ArangoDB; in fact, I have it installed on the same VM that runs the Spline REST Gateway and Spline UI for my use case. My question remains: how do I configure the Gateway with authentication when pushing the lineage captured by Spline running on Azure Databricks? I've already set up ArangoDB with authentication, so it would help to have some documentation on using ArangoDB as the persistence layer, such as which Spline properties to set.

cerveada commented 4 years ago

Include the credentials in the database URL config spline.database.connectionUrl, the same way as for the admin tool URL (see #666).

The property is described in the documentation, but without the authentication part.
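
As a rough illustration of the advice above, here is a minimal Python sketch that assembles a connection URL with embedded credentials. It assumes the URL takes the form arangodb://[user[:password]@]host[:port]/database; the thread does not spell this format out, so check it against the Spline documentation for your version. The helper name arango_connection_url and all values shown (user, password, host, port, database) are hypothetical placeholders. Whether special characters in the password need escaping is not covered in the thread.

```python
def arango_connection_url(host, database, user=None, password=None, port=None):
    """Build a URL of the assumed form arangodb://[user[:password]@]host[:port]/database."""
    if password is not None and user is None:
        raise ValueError("a password requires a user")

    # Credentials are optional; when present they precede the host, separated by '@'.
    credentials = ""
    if user is not None:
        credentials = user if password is None else f"{user}:{password}"
        credentials += "@"

    # The port is optional as well.
    authority = host if port is None else f"{host}:{port}"
    return f"arangodb://{credentials}{authority}/{database}"


# Hypothetical values, for illustration only.
url = arango_connection_url(
    "localhost", "spline", user="spline_user", password="s3cret", port=8529
)
print(f"spline.database.connectionUrl={url}")
```

The printed line is the property assignment that would be supplied to the REST Gateway configuration. Keeping the credentials out of source control (for example, reading them from a secret store before building the URL) is a sensible precaution.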