Closed simon28082 closed 1 year ago
Hi @crcms Can you please test this with https://github.com/Shopify/sarama/pull/2182 ? Thank you!
Line 560 for 1.32.0
does not seem to dereference a brokerProducer
:
https://github.com/Shopify/sarama/blob/9baf7a651a68631e143039fafd7a7bb13ebdceca/async_producer.go#L560
Would you be able to post the panic trace, confirm the version and ideally provide some Sarama logs?
Also what do you mean by During the process, the producer will asynchronously close the producer
?
@slaunay I am just disconnecting the network during the running process.
@crcms if you can share the panic trace that would go a long way.
@slaunay This is a stack of panic.
goroutine 58 [running]:
github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0xc000218600, 0x1)
vendor/github.com/Shopify/sarama/async_producer.go:618 +0x19d
github.com/Shopify/sarama.(*partitionProducer).dispatch(0xc000218600)
vendor/github.com/Shopify/sarama/async_producer.go:562 +0x4f7
github.com/Shopify/sarama.withRecover(0x0)
vendor/github.com/Shopify/sarama/utils.go:43 +0x3e
created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
vendor/github.com/Shopify/sarama/async_producer.go:513 +0x1fb
This is a must-have BUG, as long as you interrupt the network in the running process, this problem will occur about 3-5 minutes, and the number of retries set by Producer is the default.
Hi @crcms Can you please test this with #2182 ? Thank you!
Already testing, still has question
Thanks @crcms, the stack points us to line 618
for the nil
derefencing vs line 560
:
https://github.com/Shopify/sarama/blob/9baf7a651a68631e143039fafd7a7bb13ebdceca/async_producer.go#L615-L619
One scenario that would lead to the brokerProducer
being nil
prior to calling newHighWatermark
would be if the following case <-pp.brokerProducer.abandoned
matches:
https://github.com/Shopify/sarama/blob/9baf7a651a68631e143039fafd7a7bb13ebdceca/async_producer.go#L547-L558
We need a unit test case to reproduce the issue in order to implement the proper fix. Such test case can be used as well to confirm whether it still happens with #2182.
@slaunay thank you very much, but the issue #2182 I used it, and it's still problematic
We are having the same issue, any updates on this?
edit: sarama v1.34.0 (go 1.19) with kafka v2.4.1
As this seems to be related, on our production system (sarama v1.36.0, go 1.19; kafka v2.8.0) a pod just crashed with:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x877330]
goroutine 1168 [running]:
github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0x40008e38c0, 0x2)
/go/pkg/mod/github.com/!shopify/sarama@v1.36.0/async_producer.go:620 +0x1e0
github.com/Shopify/sarama.(*partitionProducer).dispatch(0x40008e38c0)
/go/pkg/mod/github.com/!shopify/sarama@v1.36.0/async_producer.go:564 +0x4a4
github.com/Shopify/sarama.withRecover(0xdf5190?)
/go/pkg/mod/github.com/!shopify/sarama@v1.36.0/utils.go:43 +0x40
created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
/go/pkg/mod/github.com/!shopify/sarama@v1.36.0/async_producer.go:515 +0x208
Are there any updates on this?
@crcms You mentioned that this is a must have BUG? So is this panic expected behavior if there is a connection issue for some minutes?
As this seems to be related, on our production system (sarama v1.36.0, go 1.19; kafka v2.8.0) a pod just crashed with:
panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x877330] goroutine 1168 [running]: github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0x40008e38c0, 0x2) /go/pkg/mod/github.com/!shopify/sarama@v1.36.0/async_producer.go:620 +0x1e0 github.com/Shopify/sarama.(*partitionProducer).dispatch(0x40008e38c0) /go/pkg/mod/github.com/!shopify/sarama@v1.36.0/async_producer.go:564 +0x4a4 github.com/Shopify/sarama.withRecover(0xdf5190?) /go/pkg/mod/github.com/!shopify/sarama@v1.36.0/utils.go:43 +0x40 created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer /go/pkg/mod/github.com/!shopify/sarama@v1.36.0/async_producer.go:515 +0x208
Are there any updates on this?
@crcms You mentioned that this is a must have BUG? So is this panic expected behavior if there is a connection issue for some minutes?
I face the same issue
got the same issue with sarama@1.37.2 on prod when we lost our kafka:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0xfca3b0]
goroutine 469 [running]:
github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0x40006bc5a0, 0x1)
/home/jenkins/go/pkg/mod/github.com/!shopify/sarama@v1.37.2/async_producer.go:708 +0x1e0
github.com/Shopify/sarama.(*partitionProducer).dispatch(0x40006bc5a0)
/home/jenkins/go/pkg/mod/github.com/!shopify/sarama@v1.37.2/async_producer.go:648 +0x4a4
github.com/Shopify/sarama.withRecover(0x0?)
/home/jenkins/go/pkg/mod/github.com/!shopify/sarama@v1.37.2/utils.go:43 +0x40
created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
/home/jenkins/go/pkg/mod/github.com/!shopify/sarama@v1.37.2/async_producer.go:599 +0x208
We also see this segfault when the broker goes away. Sarama version 1.31.1 (2022-02-01). Issue #2322 is a duplicate of this issue #2184, right? Can we mark one or the other as a duplicate?
Dec 10 11:07:19 dut2 OUR: panic: runtime error: invalid memory address or nil pointer dereference
Dec 10 11:07:19 dut2 OUR: [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1865bfd]
Dec 10 11:07:19 dut2 OUR: goroutine 131 [running]:
Dec 10 11:07:19 dut2 OUR: github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0xc001361380, 0x1)
Dec 10 11:07:19 dut2 OUR: /u0/REDACTED/source/telegraf-1.22.1/vendor/github.com/Shopify/sarama/async_producer.go:618 +0x19d
Dec 10 11:07:19 dut2 OUR: github.com/Shopify/sarama.(*partitionProducer).dispatch(0xc001361380)
Dec 10 11:07:19 dut2 OUR: /u0/REDACTED/source/telegraf-1.22.1/vendor/github.com/Shopify/sarama/async_producer.go:562 +0x50a
Dec 10 11:07:19 dut2 OUR: github.com/Shopify/sarama.withRecover(0xc0004a5400?)
Dec 10 11:07:19 dut2 OUR: /u0/REDACTED/source/telegraf-1.22.1/vendor/github.com/Shopify/sarama/utils.go:43 +0x3e
Dec 10 11:07:19 dut2 OUR: created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
Dec 10 11:07:19 dut2 OUR: /u0/REDACTED/source/telegraf-1.22.1/vendor/github.com/Shopify/sarama/async_producer.go:513 +0x1f6
Dec 10 11:07:19 dut2 systemd: OUR.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Just met the issue on sarama v1.29.1, any updates to solve it?
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1ded78f]
goroutine 799 [running]:
github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0xc0045287e0, 0x1)
/root/go/src/git.garena.com/shopee/seller-server/crm/crm-proactive-push/vendor/github.com/Shopify/sarama/async_producer.go:617 +0x1cf
github.com/Shopify/sarama.(*partitionProducer).dispatch(0xc0045287e0)
/root/go/src/git.garena.com/shopee/seller-server/crm/crm-proactive-push/vendor/github.com/Shopify/sarama/async_producer.go:561 +0x589
github.com/Shopify/sarama.withRecover(0xc003fcde00)
/root/go/src/git.garena.com/shopee/seller-server/crm/crm-proactive-push/vendor/github.com/Shopify/sarama/utils.go:43 +0x49
created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
/root/go/src/git.garena.com/shopee/seller-server/crm/crm-proactive-push/vendor/github.com/Shopify/sarama/async_producer.go:512 +0x1aa
I have a currently feasible mitigation solution: first set the panic handler, rebuild all producers when any panic is caught, and because the panic handler is set, the producer will not automatically close when the producer internally crashes, and the input channel will be fast is filled and blocked, a producer rebuild is also required when this is detected.
got the same issue: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x12b2609]
goroutine 74332 [running]:
github.com/Shopify/sarama.(*partitionProducer).newHighWatermark(0xc0c18b9140, 0x3)
github.com/Shopify/sarama@v1.36.0/async_producer.go:620 +0x1a9
github.com/Shopify/sarama.(*partitionProducer).dispatch(0xc0c18b9140)
github.com/Shopify/sarama@v1.36.0/async_producer.go:564 +0x537
github.com/Shopify/sarama.withRecover(0xc1ea43e580?)
github.com/Shopify/sarama@v1.36.0/utils.go:43 +0x3e
created by github.com/Shopify/sarama.(*asyncProducer).newPartitionProducer
github.com/Shopify/sarama@v1.36.0/async_producer.go:515 +0x1ea
seems this issue was fixed by:fix: add nil check while producer is retrying (
Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.
Closing as fixed by https://github.com/IBM/sarama/pull/2387
Versions
Problem Description
This is not an inevitable situation, but it happens many times This is a crash triggered by
brokerProducer=nil
Test scenario: A message is sent every second, and the network connection is interrupted during the sending process. During the process, the producer will asynchronously close the producer, and Panic will occur in some cases in about 3-5 minutes Ingithub.com/Shopify/sarama/async_producer.go
560 linesI also took a quick look at the Dispatch method, but couldn't figure out the specific reason This problem, only in 1.32.0, I have tested other versions without this problem