Closed: ikholodkov closed this issue 9 months ago
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
what do you have in the extra of your connection?
@hussein-awala something like this:
{
  "bootstrap.servers": "server1:9093,server2:9093",
  "group.id": "test-consumer-group",
  "auto.offset.reset": "latest",
  "ssl.endpoint.identification.algorithm": "https",
  "sasl.username": "user",
  "sasl.password": "password",
  "sasl.mechanism": "SCRAM-SHA-256",
  "security.protocol": "SASL_PLAINTEXT"
}
I asked to check if you set the group id.
I tried setting the consumer count to fewer than, equal to, and more than the number of partitions, but the error happened every time.
When you tried fewer consumers, how many consumers did you use and how many partitions do you have in the topic? Do you have multiple running triggerers or just a single one?
I have 3 partitions in the topic and tested this with 1, 2, 3, and 4 dagruns. Initially I wanted to verify the #31803 fix with one dagrun; then I checked the ability to use more than one dagrun.
Do you have multiple running triggerers or just a single one?
I've tried with only one instance of the Triggerer and only one DAG with multiple dagruns.
I'll try to reproduce the issue. Which versions of confluent-kafka and Kafka (in the cluster) are you using?
confluent-kafka was installed as a dependency of apache-airflow-providers-apache-kafka==1.1.2; its version is 2.2.0. The Kafka cluster version is 3.2.x.
Airflow 2.7.2, apache-airflow-providers-apache-kafka==1.2.0. Unfortunately, the problem is still present.
I can confirm that this is still happening.
name = "apache-airflow"
version = "2.7.2"
name = "apache-airflow-providers-apache-kafka"
version = "1.2.0"
name = "confluent-kafka"
version = "2.3.0"
I also have 3 partitions and a single consumer group, and I define group.id in the connection definition. I can trigger this bug consistently by pushing a new message from a console. @hussein-awala, would some debug logs help to find the root cause?
(The Consume and Produce operators work well from the provider package.)
I will try to reproduce the issue with the provided versions
I am also experiencing the same problem with the same provider version: Airflow 2.7.2 and apache-airflow-providers-apache-kafka==1.2.0. I have tested it in my local Docker environment and on an on-prem VM deployment with Ubuntu 20.04.
@ikholodkov @ddione84 I just created #36272 to fix the issue. Could you please test it?
Also, I suggest testing the current version of the operator with the config enable.auto.commit: false. As you can see in my PR, we commit the consumed messages manually, so if auto-commit is enabled, the consumer will try to commit the consumed offsets periodically, which may be the reason for your issue (the automatic commit doesn't find any offset to commit).
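The manual-commit pattern described above can be sketched as a consumer configuration. This is a hypothetical sketch, not the provider's actual code; the broker address and group id are placeholders.

```python
# Hypothetical confluent-kafka consumer configuration sketching the
# manual-commit pattern discussed above. Broker address and group id
# are placeholders, not taken from the issue.
consumer_conf = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "test-consumer-group",
    # With auto-commit enabled (the default), librdkafka commits stored
    # offsets periodically in the background. If the application also
    # commits manually (as the provider does after #36272), the periodic
    # commit can fire when no new offset has been stored and fail with
    # "Commit failed: Local: No offset stored" (_NO_OFFSET).
    "enable.auto.commit": False,
}

def commit_after_processing(consumer, message):
    """Commit the offset of one processed message explicitly.

    `consumer` is a confluent_kafka.Consumer; asynchronous=False makes
    the commit synchronous so errors surface immediately.
    """
    consumer.commit(message=message, asynchronous=False)
```

Whether this configuration avoids the error in your setup would need testing against a real broker.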
Hi, I have tested this for my pipelines and it seems to be working fine. Thank you.
Hello, I can confirm that it fixes the issue. Thanks @hussein-awala.
I can also trigger the error again by:
- reverting the fix
- adding "enable.auto.offset.store": false to the consumer connection config
Would it make sense to add another integration test to cover "enable.auto.offset.store": false? I can make a PR.
Feel free to open a PR. Here you can find some examples of the integration tests and how we set up the Kafka configurations.
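For reference, a sketch of what disabling enable.auto.offset.store implies. This is a hypothetical configuration with placeholder values, not the reporter's actual settings: with auto offset store off, the application must record each offset itself before any commit, or the commit finds nothing to commit and fails with _NO_OFFSET.

```python
# Hypothetical sketch of the enable.auto.offset.store=false mode;
# broker address and group id are placeholders.
consumer_conf = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "test-consumer-group",
    "enable.auto.commit": False,
    # When this is false, librdkafka no longer stores the offset of each
    # delivered message automatically; the application must call
    # Consumer.store_offsets(message=msg) after processing, or every
    # commit fails with "Local: No offset stored" (_NO_OFFSET).
    "enable.auto.offset.store": False,
}

def process_and_store(consumer, msg, handler):
    """Process one message, then record its offset for the next commit.

    `consumer` is a confluent_kafka.Consumer; `handler` is the
    application's message-processing callable.
    """
    handler(msg)
    consumer.store_offsets(message=msg)  # required in this mode
```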
Apache Airflow version
2.6.3
What happened
While trying to use AwaitMessageTriggerFunctionSensor, I'm increasing the count of dagruns. I've encountered an exception:
cimpl.KafkaException: KafkaError{code=_NO_OFFSET,val=-168,str="Commit failed: Local: No offset stored"}
I tried setting the consumer count to fewer than, equal to, and more than the number of partitions, but the error happened every time. Here is a log:
What you think should happen instead
Sensor should get a message without errors. Each message should be committed once.
How to reproduce
Example of a DAG:
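The reporter's DAG was not captured in this copy of the issue. A minimal hypothetical sketch of a DAG using AwaitMessageTriggerFunctionSensor, with placeholder topic, connection id, and callables, might look like:

```python
# Hypothetical minimal DAG; topic, kafka_config_id, and the callables
# are illustrative placeholders, not the reporter's actual values.
import pendulum

from airflow.decorators import dag
from airflow.providers.apache.kafka.sensors.kafka import (
    AwaitMessageTriggerFunctionSensor,
)

def apply_function(message):
    # Returning a non-None value fires the event_triggered_function.
    return message.value()

def event_triggered_function(event, **context):
    print(f"received event: {event}")

@dag(start_date=pendulum.datetime(2023, 1, 1), schedule=None, catchup=False)
def kafka_listener():
    AwaitMessageTriggerFunctionSensor(
        task_id="await_message",
        kafka_config_id="kafka_default",  # connection holding the extra shown above
        topics=["test-topic"],            # placeholder topic with 3 partitions
        apply_function="dags.kafka_listener.apply_function",  # dotted-path placeholder
        event_triggered_function=event_triggered_function,
    )

kafka_listener()
```

Triggering several runs of such a DAG against a multi-partition topic is roughly the scenario described above.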
Operating System
Debian GNU/Linux 11 (bullseye)
Versions of Apache Airflow Providers
apache-airflow-providers-apache-kafka==1.1.2
Deployment
Other 3rd-party Helm chart
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct