Closed: uday1409 closed this issue 3 years ago
Also, in the REST gateway we have added the following for the connection string, as we were not sure how to deploy it through an argument:
<Environment name="spline/database/connectionUrl" type="java.lang.String" value="arangodb://root:@localhost:8529/spline" override="true" />
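For reference, a minimal Tomcat context fragment carrying that setting might look like this (the file path and the surrounding Context element are assumptions based on standard Tomcat conventions, not values confirmed in this thread):

```xml
<!-- e.g. conf/Catalina/localhost/spline.xml — path is an assumption;
     the file name must match the deployed WAR's context name -->
<Context>
  <!-- The Spline REST gateway reads the ArangoDB connection string via JNDI -->
  <Environment name="spline/database/connectionUrl"
               type="java.lang.String"
               value="arangodb://root:@localhost:8529/spline"
               override="true"/>
</Context>
```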
Absolutely! https://absaoss.github.io/spline/#tldr
https://search.maven.org/search?q=g:za.co.absa.spline.agent.spark
The bundles are exactly what you are looking for. They are fat JARs containing all Spline agent dependencies, pre-built for different Spark and Scala versions. You can include one in the spark-submit command or put it directly into the /jars folder of your Spark distribution.
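As a sketch of the spark-submit option, the bundle and the Spline settings could be passed like this (the bundle file name/version, the gateway host, and the job file are placeholders, not values from this thread):

```shell
# Bundle name is hypothetical — pick the one matching your Spark and Scala versions
spark-submit \
  --jars spark-3.1-spline-agent-bundle_2.12-1.0.0.jar \
  --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener \
  --conf spark.spline.producer.url=http://<gateway-host>:8080/producer \
  your_job.py
```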
> Also, in rest gateway, we have added below for connection string as we were not sure of deploying through argument.
> `<Environment name="spline/database/connectionUrl" type="java.lang.String" value="arangodb://root:@localhost:8529/spline" override="true" />`
Looks correct.
Thanks a lot @wajda for the quick response, as always. We are doing a POC on this. The approach:
1) In an Azure VM, install ArangoDB, create the database using the admin jar, and deploy the REST API gateway using the WAR file. The connection string is configured in context.xml.
2) Attach the fat jar to the cluster, set configs such as the query listener and the Producer API URL (using the VM's IP address), and call the enableLineageTracking() method in a notebook.
Please let me know if you see any issue with this approach.
If you are doing it centrally and set the "listener" property for Spark, there is no need to call the "enableLineageTracking()" method. It's one or the other.
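For completeness, the programmatic alternative (used only when the listener is not registered cluster-wide) looks roughly like this; the Producer URL host is a placeholder:

```scala
// Requires the Spline agent bundle on the cluster's classpath.
// The implicit enableLineageTracking() comes from SparkLineageInitializer.
import za.co.absa.spline.harvester.SparkLineageInitializer._

// Placeholder URL — point it at your deployed REST gateway's producer endpoint
spark.conf.set("spark.spline.producer.url", "http://<vm-ip>:8080/producer")
spark.enableLineageTracking()
```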
Thanks for the clarification @wajda, that really helps. I hope we are doing everything else right.
Basically, by setting
`spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener`
you enable lineage tracking for the entire cluster and all jobs.
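On a plain Spark distribution the same settings can go into spark-defaults.conf so every job picks them up (the Producer URL host is a placeholder; on Databricks the equivalent lines go into the cluster's Spark config box):

```properties
# spark-defaults.conf — register the Spline listener for every job on the cluster
spark.sql.queryExecutionListeners  za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
spark.spline.producer.url          http://<vm-ip>:8080/producer
```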
> https://search.maven.org/search?q=g:za.co.absa.spline.agent.spark
> The bundles are exactly what you are looking for. They are fat JARs containing all Spline agent dependencies and are pre-built for different Spark and Scala versions. You can include it in the submit command or put it directly into the /jars folder of your Spark distribution.
Do I need to attach all 3 jars to cluster or only spline agent bundle?
Only one: the bundle that corresponds to the Spark and Scala versions in use.