AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0

An error occurred when using kafkaLineageDispatcher send #623

Closed (jinmu0410 closed this issue 1 year ago)

jinmu0410 commented 1 year ago
Screenshot 2023-03-07 21 16 30
Caused by: java.lang.IllegalStateException: Cannot perform operation after producer has been closed
        at: org.apache.kafka.clients.producer.KafkaProducer.throwIfProducerClosed(KafkaProducer.java:863)
jinmu0410 commented 1 year ago

Local mode works fine, but the problem occurs on YARN.

jinmu0410 commented 1 year ago

Spark 3.1.2 and YARN 3.1

jinmu0410 commented 1 year ago

I see that this problem has been reported before. Has it been resolved since then? In my case, the bug is triggered every time.

glasalvia commented 1 year ago

Is there any news on this topic? Thanks in advance

cerveada commented 1 year ago

The issue might be caused by Spark shutting down too quickly and not allowing the agent to finish its job.

It seems a similar problem is described here: https://github.com/AbsaOSS/spline-spark-agent/issues/478

Could you try setting a high hadoop.service.shutdown.timeout and let us know if it helps?
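
For background: as far as I know, on YARN Spark registers its shutdown hooks through Hadoop's ShutdownHookManager, which waits only a limited time for them, and hadoop.service.shutdown.timeout is the setting for that limit. The plain-Java sketch below only illustrates why a short limit can cut off work that is still in progress. It is not Hadoop or agent code; BoundedHookDemo, runWithTimeout and slowFlush are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedHookDemo {
    // Hypothetical helper: runs a hook but waits at most timeoutMillis for it.
    // Returns true if the hook finished in time, false if it was cut off.
    static boolean runWithTimeout(Runnable hook, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = executor.submit(hook);
            try {
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the unfinished hook
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        } finally {
            executor.shutdownNow();
        }
    }

    // Assumption: stands in for work such as sending pending lineage messages.
    static Runnable slowFlush(long workMillis) {
        return () -> {
            try {
                Thread.sleep(workMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(slowFlush(300), 50));
        System.out.println(runWithTimeout(slowFlush(50), 2000));
    }
}
```

In this sketch the 300 ms task is cut off by the 50 ms limit, while the 50 ms task completes under the 2000 ms limit. That is the reason for trying a higher timeout: it gives the agent more room to finish sending before the JVM exits.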

jinmu0410 commented 1 year ago

I think the problem may come from selecting the wrong Spark version when compiling.

Screenshot 2023-03-25 09 36 26

This choice is wrong.

Screenshot 2023-03-25 09 37 08

This choice is right. After compiling with the right choice, I no longer ran into the Kafka problem.

jinmu0410 commented 1 year ago

Sorry, the problem has occurred again. Screenshot 2023-03-27 11 17 50

cerveada commented 1 year ago

Could you try setting a high hadoop.service.shutdown.timeout and let us know if it helps?

jinmu0410 commented 1 year ago

> Could you try setting a high hadoop.service.shutdown.timeout and let us know if it helps?

I tried it, but it doesn't seem to help.

jinmu0410 commented 1 year ago

I looked at the code and found a few key places.

Screenshot 2023-03-27 16 09 01

sys.addShutdownHook(sf.close()) causes producer.close() to be called.
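
To make the ordering problem concrete, here is a minimal plain-Java sketch. It does not use Kafka or the agent: FakeProducer is a hypothetical stand-in for KafkaProducer that only mimics the closed-state check behind the message in the stack trace above, and the explicit close() call stands in for what the shutdown hook would trigger.

```java
import java.util.ArrayList;
import java.util.List;

final class EarlyCloseDemo {
    // Hypothetical stand-in for KafkaProducer: rejects sends once closed.
    static final class FakeProducer implements AutoCloseable {
        private volatile boolean closed = false;
        private final List<String> sent = new ArrayList<>();

        void send(String message) {
            if (closed) {
                // Same message as in the reported stack trace.
                throw new IllegalStateException(
                    "Cannot perform operation after producer has been closed");
            }
            sent.add(message);
        }

        int sentCount() {
            return sent.size();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Simulates the close running before the last lineage message is sent.
    // Returns "sent" on success, otherwise the exception message.
    static String sendAfterEarlyClose() {
        FakeProducer producer = new FakeProducer();
        producer.send("execution plan");       // sent while the producer is open
        producer.close();                      // what the shutdown hook would do
        try {
            producer.send("execution event");  // arrives after the close
            return "sent";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAfterEarlyClose());
    }
}
```

If the close runs before the agent has sent everything, every later send fails in this way, which would match the bug being triggered on each run.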

jinmu0410 commented 1 year ago
Screenshot 2023-03-27 16 13 04

I am now actively calling sf.close @cerveada

cerveada commented 1 year ago

What do you mean by actively calling?

jinmu0410 commented 1 year ago

After Kafka sends the message, I call SplineRecordSenderFactory.close myself.

cerveada commented 1 year ago

Ok, but then you can't send another message, right?

jinmu0410 commented 1 year ago

No. I changed the code to combine the plan and the event into a single Kafka message, for some specific business needs later on.

cerveada commented 1 year ago

I found an issue in the agent that caused the Kafka producer to be closed twice, and the first close happened too soon.

The issue: #639

This issue was fixed in agent version 1.1.0.