Closed zacayd closed 1 year ago
but I cannot see on Kafka that a new topic was created
There might be many reasons for this. First, make sure your Spline agent is working properly. Check the logs. Use another dispatcher (e.g. console or logging) to make sure the lineage data is actually collected and printed (also read AbsaOSS/spline-spark-agent#394). If you see the lineage captured but not landing in Kafka, then the issue might indeed be related to either the Kafka dispatcher or your Kafka cluster. Check the logs and look for errors, warnings, etc.
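As a rough sketch of the check described above: the snippet below renders cluster Spark config lines in the "key value" form that select a given Spline dispatcher, so the Kafka dispatcher can be swapped for `console` while debugging. The helper name `dispatcher_conf` is hypothetical and exists only for this illustration; the property `spark.spline.lineageDispatcher` is the one quoted in this thread, and the per-dispatcher key layout is assumed to follow the same pattern as the Kafka settings quoted here.

```python
def dispatcher_conf(name, extra=None):
    """Return Spark config lines ('key value') that select a Spline dispatcher.

    `name` is the dispatcher to use, e.g. "console", "logging", "http", "kafka".
    `extra` holds optional dispatcher-specific settings (assumed key layout:
    spark.spline.lineageDispatcher.<name>.<key>).
    """
    conf = {"spark.spline.lineageDispatcher": name}
    for key, value in (extra or {}).items():
        conf[f"spark.spline.lineageDispatcher.{name}.{key}"] = value
    return [f"{k} {v}" for k, v in conf.items()]


if __name__ == "__main__":
    # Temporarily print lineage to the driver log instead of sending it to Kafka.
    print("\n".join(dispatcher_conf("console")))
```

If lineage shows up in the driver log with the console dispatcher, the agent itself is working and the problem is more likely on the Kafka side.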
My question is: can we have a short phone call to understand this?
Unfortunately we do not have capacity to provide phone support.
When I changed spark.spline.lineageDispatcher to http,
it worked and showed the lineage in the UI.
In the logs of the Spline Kafka container I got:
00:02:54.914 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] WARN o.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-spline-group-1, groupId=spline-group] Connection to node 1001 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
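The warning above shows the consumer trying to reach the broker at localhost:9092. One quick way to narrow this down is to test plain TCP reachability of the broker address from the machine (or container) where the client runs. The following is a minimal sketch using only the Python standard library; the function name `can_connect` is made up for this example, and a successful connection only shows that the port is open, not that Kafka is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Address taken from the log line above; replace with your broker address.
    print(can_connect("localhost", 9092))
```

Running the same check against the address configured for the dispatcher (192.168.100.11:9092 in this thread) from the Spark driver would show whether the driver can reach the broker at all.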
the topic is created but no messages in it
Send the data via kafka dispatcher and check the topic.
On the Databricks cluster I put:
spline.lineageDispatcher kafka
spline.lineageDispatcher.kafka.producer.bootstrap.servers 192.168.100.11:9092
spline.lineageDispatcher.kafka.topic foo
spark.spline.mode REQUIRED
The topic is created but has no messages in it.
ok, please upload the log from the agent. I may be able to say what is wrong from that.
Where can I find it? On Databricks?
You need driver logs https://stackoverflow.com/questions/69736416/where-to-find-spark-logs-in-databricks
From the log:
An error occurred while calling z:za.co.absa.spline.harvester.SparkLineageInitializer.enableLineageTracking.
: java.lang.NoClassDefFoundError: org/apache/kafka/clients/producer/KafkaProducer
Kafka libraries are missing. Include this: https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients/2.4.1
using --packages, as described here:
https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
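For a plain spark-submit run (outside Databricks), the advice above could look roughly like the sketch below. It builds the command line with the kafka-clients Maven coordinate derived from the mvnrepository link (group `org.apache.kafka`, artifact `kafka-clients`, version `2.4.1`). The helper `spark_submit_args` and the application file name `my_job.py` are hypothetical placeholders, not something from this thread.

```python
# Maven coordinate in the group:artifact:version form that --packages expects.
KAFKA_CLIENTS = "org.apache.kafka:kafka-clients:2.4.1"


def spark_submit_args(app, packages):
    """Build a spark-submit argument list that pulls extra Maven packages.

    --packages takes a comma-separated list of Maven coordinates.
    """
    return ["spark-submit", "--packages", ",".join(packages), app]


if __name__ == "__main__":
    # "my_job.py" is a placeholder for your own Spark application.
    print(" ".join(spark_submit_args("my_job.py", [KAFKA_CLIENTS])))
```

Any other packages the job needs would go into the same comma-separated list.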
What should I do on a Databricks cluster?
install the Kafka libraries using method 1 https://stackoverflow.com/questions/60543850/how-to-install-a-library-on-a-databricks-cluster-using-some-command-in-the-noteb
See the attached screenshot. What should I choose?
The maven coordinates are in the link I provided yesterday.
You can close this. I used my own Kafka.
Hi, I am using Databricks Spark jobs. I saw that you can configure properties to use Kafka,
but I cannot see on Kafka that a new topic was created, only a topic that was defined in the yml config of the spline_spline-kafka container. My question is: can we have a short phone call to understand this? Thanks in advance, Zacay