AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0

Using Spline in Spark Scala project #785

Open zacayd opened 5 months ago

zacayd commented 5 months ago

Hi @wajda

I have some Spark code written in Scala that I got from someone in the organization. The project has a Configuration.conf file. Suppose I add the following to that config file:

  spline {
    lineageDispatcher = "http"
    lineageDispatcher.http.producer.url = "http://localhost:9090/producer"
  }

  // Spark configurations
  sql {
    queryExecutionListeners = "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"
  }
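As far as I know, Spline reads its settings from sources such as the Spark configuration (keys prefixed with `spark.`, as in the `pyspark` command below) and will not pick up a project-specific Configuration.conf on its own. If the project's code copies sections of its config into the SparkConf, the nested `spline` and `sql` keys need to end up as flat `spark.`-prefixed properties. The sketch below is a hypothetical helper (`SparkConfFlattener` is not part of Spline or Spark) that shows this mapping on a hand-written map rather than on a parsed HOCON file.

```scala
// Hypothetical helper, not part of Spline or Spark: flattens a nested settings
// map (as the project might hold it after reading Configuration.conf) into
// dotted Spark property keys.
object SparkConfFlattener {

  /** Recursively joins nested keys with '.' and prepends `prefix`. */
  def flatten(settings: Map[String, Any], prefix: String = "spark"): Map[String, String] =
    settings.flatMap {
      case (key, nested: Map[_, _]) =>
        // Recurse into nested sections such as `spline { ... }` or `sql { ... }`.
        flatten(nested.asInstanceOf[Map[String, Any]], s"$prefix.$key")
      case (key, value) =>
        // Leaf value: emit a single "prefix.key" -> value pair.
        Map(s"$prefix.$key" -> value.toString)
    }

  def main(args: Array[String]): Unit = {
    // The `spline` and `sql` sections from the issue, written by hand as maps.
    val appConfig: Map[String, Any] = Map(
      "spline" -> Map(
        "lineageDispatcher.http.producer.url" -> "http://localhost:9090/producer"
      ),
      "sql" -> Map(
        "queryExecutionListeners" ->
          "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"
      )
    )
    // Print in a stable order; keys match the --conf names used with pyspark.
    flatten(appConfig).toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

The flattened keys are the same ones passed with `--conf` on the command line, which is the point of the exercise: whatever reads Configuration.conf has to produce exactly these property names.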

And when launching Spark I use:

pyspark \
  --packages za.co.absa.spline.agent.spark:spark-2.4-spline-agent-bundle_2.12:<VERSION> \
  --conf "spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener" \
  --conf "spark.spline.lineageDispatcher.http.producer.url=http://localhost:9090/producer"
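In a Scala project, the agent bundle named in `--packages` can also be declared as an ordinary build dependency, so that it ends up on the driver's classpath (for example inside an assembly JAR). The fragment below is a sketch assuming the project is built with sbt; it reuses the coordinates from the command above and keeps `<VERSION>` as a placeholder. Alternatively, `spark-submit` accepts the same `--packages` and `--conf` options as `pyspark`, so the JAR can be supplied at launch time instead of being bundled.

```scala
// build.sbt (sketch, assuming sbt): coordinates copied from the --packages flag.
// Replace <VERSION> with an agent release matching your Spark and Scala versions.
libraryDependencies += "za.co.absa.spline.agent.spark" % "spark-2.4-spline-agent-bundle_2.12" % "<VERSION>"
```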

Where do I need to put the JAR?

Here is the config:

dt="20240120"
output = "s3://aiser-tests/example_project_output"
baseAccumulatorsPath = "s3://aiser-tests/accumulators"
numPartitions = 5864

requestLogTable = "dl_fact.fact_request_p"
impressionLogTable = "dl_fact.fact_impression_p"
clicksLogTable = "dl_fact.fact_click_p"
winLogTable = "dl_fact.fact_win_p"
dpmTable = "dl_udms_work.dim_playback_methods_udms2"
dpsTable = "dl_ingested_data_dim.dim_player_sizes_udms2"
dimAdUnits = "dl_ingested_data_dim.dim_ad_units"
rcTable = "dl_ingested_data_dim.dim_rate_card_lines_extended"
rceTable = "dl_ingested_data_dim.dim_rate_card_lines_exceptions_extended"
eventsTable = "dl_fact.fact_event_p"
sfEventsTable = "dl_fact.fact_sf_events_p"
sfTimeTable = "dl_fact.fact_sf_times_p"

spark {
  master = "local[4]"
  appName = "data-core-agg-request-funnel"
  serializer = ""
  classesToRegister = ""
}
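If the project turns this `spark` block into Spark properties, the Spline listener and producer URL could be added at the same point. The sketch below assumes a hypothetical mapping from the project's keys to the standard Spark property names (`spark.master`, `spark.app.name`, `spark.serializer`, `spark.kryo.classesToRegister`); `SparkBlockProps` and `toSparkProps` are illustrative names, not an existing API. It also drops empty values, on the assumption that passing `serializer = ""` through as-is would most likely be rejected when Spark tries to load it as a class name.

```scala
// Hypothetical sketch: maps the `spark { ... }` block to Spark properties and
// registers the Spline listener alongside them.
object SparkBlockProps {

  // Assumed mapping from the project's own keys to Spark property names.
  private val keyMapping: Map[String, String] = Map(
    "master"            -> "spark.master",
    "appName"           -> "spark.app.name",
    "serializer"        -> "spark.serializer",
    "classesToRegister" -> "spark.kryo.classesToRegister"
  )

  private val splineListener =
    "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"

  /** Maps known keys, drops blank values, then adds the Spline settings. */
  def toSparkProps(block: Map[String, String], producerUrl: String): Map[String, String] = {
    val mapped = block.collect {
      case (key, value) if value.trim.nonEmpty && keyMapping.contains(key) =>
        keyMapping(key) -> value
    }
    // Same keys as the --conf flags in the pyspark command.
    mapped ++ Map(
      "spark.sql.queryExecutionListeners"                -> splineListener,
      "spark.spline.lineageDispatcher.http.producer.url" -> producerUrl
    )
  }

  def main(args: Array[String]): Unit = {
    // Values from the issue's `spark` block.
    val block = Map(
      "master"            -> "local[4]",
      "appName"           -> "data-core-agg-request-funnel",
      "serializer"        -> "",
      "classesToRegister" -> ""
    )
    toSparkProps(block, "http://localhost:9090/producer")
      .foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Each resulting pair could then be applied with `SparkSession.builder().config(key, value)` before `getOrCreate()`, the programmatic counterpart of the `--conf` flags on the command line.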
wajda commented 5 months ago

I don't understand the question. Can you clarify it please?