IBM / sarama

Sarama is a Go library for Apache Kafka.
MIT License
11.57k stars 1.76k forks source link

Sarama panic when enable Net.Proxy #2547

Closed dgnn96 closed 1 year ago

dgnn96 commented 1 year ago
Versions

Please specify real version numbers or git SHAs, not just "Latest" since that changes fairly regularly.

Sarama Kafka Go
1.36.0 1.1.0 1.16
Configuration

What configuration values are you using for Sarama and Kafka?

...
config.Net.Proxy.Enable = true
dialer := &net.Dialer{
        Timeout:   xxx * time.Second,
        KeepAlive: xxx * time.Second,
    }
dialer.Resolver = &net.Resolver{
    PreferGo: true,
    Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
        return dialer.DialContext(ctx, "udp", "xxxx")
    },
}
cfg.Net.Proxy.Dialer = dialer
...
Logs

When filing an issue please provide logs from Sarama and Kafka if at all possible. You can set sarama.Logger to a log.Logger to capture Sarama debug output.

logs: CLICK ME

``` ```

Problem Description

When the broker is Open, the conf.getDialer method is invoked. In the method, Logger.Printf is invoked to print the c.Net.Proxy.Dialer. The net.Dialer.Resolver is read during printing. However, the Resolver contains the singleflight.Group, which contains a field of the map type. This field is read during printing and written during broker connection establishment. Concurrent read and write with map causes panic: fatal error: concurrent map read and map write.

prestona commented 1 year ago

Thanks for the detailed bug report.

Playing around with the example code - it looks like the logger would emit something like:

&{5s 0001-01-01 00:00:00 +0000 UTC <nil> %!s(bool=false) 0s 5s %!s(*net.Resolver=&{true false 0x1043df410 {{0 0} map[]}}) %!s(<-chan struct {}=<nil>) %!s(func(string, string, syscall.RawConn) error=<nil>)}

(Assuming it didn't race into a panic due to concurrent map access).

This doesn't look like particularly valuable debug information to log. I wonder if the problematic logging should be changed to:

  1. Log if the Net.Proxy.Dialer is being used
  2. (Potentially) log the address of the Net.Proxy.Dialer - in case it is useful to distinguish between different dialer's in the log.
dgnn96 commented 1 year ago

Thanks for the detailed bug report.

Playing around with the example code - it looks like the logger would emit something like:

&{5s 0001-01-01 00:00:00 +0000 UTC <nil> %!s(bool=false) 0s 5s %!s(*net.Resolver=&{true false 0x1043df410 {{0 0} map[]}}) %!s(<-chan struct {}=<nil>) %!s(func(string, string, syscall.RawConn) error=<nil>)}

(Assuming it didn't race into a panic due to concurrent map access).

This doesn't look like particularly valuable debug information to log. I wonder if the problematic logging should be changed to:

  1. Log if the Net.Proxy.Dialer is being used
  2. (Potentially) log the address of the Net.Proxy.Dialer - in case it is useful to distinguish between different dialer's in the log.

Usually, the proxy of producer should not be changed frequently. I think user should know whether use proxy and which proxy to be used. If the proxy information must be log in the sarama, I would recommend using the first one.