AbsaOSS / spline

Data Lineage Tracking And Visualization Solution
https://absaoss.github.io/spline/
Apache License 2.0

Spline producer url unreachable from spark #1354

Closed learnerr101 closed 3 months ago

learnerr101 commented 3 months ago

I used the TL;DR configuration and packages given at https://absaoss.github.io/spline/: the package za.co.absa.spline.agent.spark:spark-2.4-spline-agent-bundle_2.12:0.5.2, with 'spark.spline.producer.url' set to 'http://localhost:9090/producer'.

But I get this error in my spark job:

[Image: Screenshot 2024-07-24 003938]

Configuration: Spark 2.4.2, Scala 2.12

Command used:

spark-submit --packages za.co.absa.spline.agent.spark:spark-2.4-spline-agent-bundle_2.12:0.5.2 --conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" --conf "spark.spline.producer.url=http://localhost:9090/producer" AB_2.py

I have also tried Spark 3.0.0 with compatible artifacts and Java version, but I get the same error.

cerveada commented 3 months ago

Try following the steps in this guide: https://github.com/AbsaOSS/spline/discussions/1225

learnerr101 commented 3 months ago

So these were the steps I performed:

Troubleshooting Spline Agent: I was able to see the logs pertaining to the lineage. I'm attaching the log file for your reference: Log.txt

Troubleshooting ArangoDB: image

Troubleshooting Spline Server: image

Troubleshooting Spline UI: image

Still I'm getting the same error for some reason.

cerveada commented 3 months ago

In your config you use port 9090, but in the picture from the server I see port 8080.

learnerr101 commented 3 months ago

Yeah, I ran the command with both ports.

With 9090: spark-submit --packages za.co.absa.spline.agent.spark:spark-2.4-spline-agent-bundle_2.12:0.5.2 --conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" --conf "spark.spline.producer.url=http://localhost:9090/producer" AB_2.py

With 8080: spark-submit --packages za.co.absa.spline.agent.spark:spark-2.4-spline-agent-bundle_2.12:0.5.2 --conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" --conf "spark.spline.producer.url=http://localhost:8080/producer" AB_2.py

In both cases I get the same error.

cerveada commented 3 months ago

In the log I see output from LoggingLineageDispatcher, so that part is working. Now switch to the HTTP dispatcher.

learnerr101 commented 3 months ago

Yeah I tried the following command:

spark-submit \
  --packages za.co.absa.spline.agent.spark:spark-3.0-spline-agent-bundle_2.12:2.1.0 \
  --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener \
  --conf spark.spline.lineageDispatcher=http \
  --conf spark.spline.lineageDispatcher.http.producer.url=http://localhost:8080/producer \
  AB_2.py

So the config as of now is: Spark 3.0.0, Scala 2.12, Spline 2.1.0.

But the error still persists!
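For anyone comparing variants of this command, here is a minimal Python sketch that assembles the same spark-submit arguments from a dictionary, so the port or agent bundle can be changed in one place. The helper name `spline_submit_command` is hypothetical, not part of Spline or Spark. The configuration keys are the ones quoted in the command above.

```python
import shlex
from typing import List

# Listener class name as quoted in the commands in this thread.
LISTENER = "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"


def spline_submit_command(bundle: str, producer_url: str, script: str) -> List[str]:
    """Build a spark-submit argument list (hypothetical helper for illustration)."""
    conf = {
        "spark.sql.queryExecutionListeners": LISTENER,
        "spark.spline.lineageDispatcher": "http",
        "spark.spline.lineageDispatcher.http.producer.url": producer_url,
    }
    args = ["spark-submit", "--packages", bundle]
    for key, value in conf.items():
        # Each setting becomes its own "--conf key=value" pair.
        args += ["--conf", f"{key}={value}"]
    args.append(script)
    return args


if __name__ == "__main__":
    cmd = spline_submit_command(
        "za.co.absa.spline.agent.spark:spark-3.0-spline-agent-bundle_2.12:2.1.0",
        "http://localhost:8080/producer",
        "AB_2.py",
    )
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(cmd))
```

The resulting list can also be passed to `subprocess.run` directly, which avoids shell quoting issues.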

cerveada commented 3 months ago

If everything else works, it must be a networking issue. Are you sure that the server is reachable? Is Spark running locally, on the same machine as the server? If Spark is running somewhere else, localhost will not work.

Also, if the server or the agent is running in a Docker container, the container may block the connection.
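One way to test the reachability question is a plain TCP connection check, run from the same machine (or container) where the Spark driver runs. The sketch below is only an illustration: the function name `can_connect` is made up, and the two URLs are the ones tried earlier in this thread. A successful connection only shows that something is listening on that host and port. It does not prove that the Spline producer API is served at that path.

```python
import socket
from urllib.parse import urlparse


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
        # Fall back to the scheme's default port when none is given.
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
    except ValueError:
        # Raised for malformed URLs, e.g. a non-numeric port.
        return False
    if host is None:
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # URLs taken from the commands in this thread; adjust to your setup.
    for url in ("http://localhost:8080/producer", "http://localhost:9090/producer"):
        status = "reachable" if can_connect(url) else "NOT reachable"
        print(url, "->", status)
```

If the check fails when run inside a container but succeeds on the host, `localhost` inside the container is most likely pointing at the container itself and not at the machine running the server.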

wajda commented 3 months ago

I'm converting this issue into a discussion as the issue is clearly on the user's side.