childe / healer

golang lib for kafka

leaderBroker is nil, program panics immediately #6

Closed sundy-li closed 5 years ago

sundy-li commented 5 years ago
E1225 06:04:37.649676   59809 simple_consumer.go:251] fetch error:read tcp4 10.69.12.20:57914->10.68.33.103:7560: use of closed network connection
E1225 06:04:47.126021   59809 simple_consumer.go:88] could not create broker 1004. maybe should refresh metadata.
E1225 06:04:47.126073   59809 simple_consumer.go:142] get leader broker error: could not init broker for node[1004](10.68.33.107:7560):failed to establish connection when init broker: dial tcp4 10.68.33.107:7560: connect: connection timed out
E1225 06:04:48.150021   59809 simple_consumer.go:88] could not create broker 1008. maybe should refresh metadata.
E1225 06:04:48.150127   59809 simple_consumer.go:88] could not create broker 1007. maybe should refresh metadata.
E1225 06:04:48.150021   59809 simple_consumer.go:88] could not create broker 1010. maybe should refresh metadata.
E1225 06:05:03.509951   59809 simple_consumer.go:88] could not create broker 1008. maybe should refresh metadata.
E1225 06:05:03.509952   59809 simple_consumer.go:88] could not create broker 1007. maybe should refresh metadata.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x54 pc=0x732b72]

goroutine 122178916 [running]:
github.com/childe/healer.(*Broker).Close(0x0)
    /Users/sundy/pan/gopath/src/github.com/childe/healer/broker.go:68 +0x22
github.com/childe/healer.(*SimpleConsumer).Consume.func2(0xc45caeca20, 0xc508b145b0, 0xc447345f80)
    /Users/sundy/pan/gopath/src/github.com/childe/healer/simple_consumer.go:302 +0xa3d
created by github.com/childe/healer.(*SimpleConsumer).Consume
    /Users/sundy/pan/gopath/src/github.com/childe/healer/simple_consumer.go:230 +0xad0

childe commented 5 years ago

This should already be fixed in e92a91f7383239d73859dbb03ff280f2f274eb8a.

jasper-zhang commented 5 years ago

佳爷, I'm using the updated version and I'm still hitting this problem:

E0428 13:25:36.262599   48771 simple_consumer.go:88] could not create broker 3. maybe should refresh metadata.
E0428 13:25:36.262605   48771 simple_consumer.go:88] could not create broker 3. maybe should refresh metadata.
E0428 13:25:36.262611   48771 simple_consumer.go:284] get leader broker of [cdn_access_log/37] error: could not get broker info with nodeID[3]
E0428 13:25:36.262742   48771 group_consumer.go:400] failed to send heartbeat:The group is rebalancing, so a rejoin is needed.
E0428 13:25:36.262924   48771 simple_consumer.go:88] could not create broker 3. maybe should refresh metadata.
E0428 13:25:36.262934   48771 simple_consumer.go:88] could not create broker 3. maybe should refresh metadata.
E0428 13:25:36.262940   48771 simple_consumer.go:88] could not create broker 3. maybe should refresh metadata.
E0428 13:25:36.262948   48771 simple_consumer.go:284] get leader broker of [cdn_access_log/37] error: could not get broker info with nodeID[3]
E0428 13:25:36.278865   48767 group_consumer.go:400] failed to send heartbeat:The group is rebalancing, so a rejoin is needed.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x54 pc=0x78ae82]

goroutine 43575539 [running]:
github.com/childe/healer.(*Broker).Close(0x0)
        /home/etl/gopath/src/github.com/childe/healer/broker.go:68 +0x22
github.com/childe/healer.(*SimpleConsumer).Consume.func2(0xc015da2120, 0xc00c536330, 0xc005d4c2a0)
        /home/etl/gopath/src/github.com/childe/healer/simple_consumer.go:310 +0xa34
created by github.com/childe/healer.(*SimpleConsumer).Consume
        /home/etl/gopath/src/github.com/childe/healer/simple_consumer.go:238 +0x660

childe commented 5 years ago

@jasper-zhang Which version? I can't match the line numbers in the trace to any commit.

childe commented 5 years ago

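That said, I think I can see roughly what is going on. The bug was not completely fixed.

The problem sundy-li hit was that the leader broker could not be obtained right when the simple consumer was created. The problem you hit occurs during consumption: the error "This server is not the leader for that topic-partition" comes back, the attempt to reset the leader then fails and sets the leader to nil, and the code ends up calling nil.Close().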

childe commented 5 years ago

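I've been meaning to write a complete test suite for a long time: bring up a clean Kafka cluster and test all the different scenarios. That way I could change the code with more peace of mind. I still haven't gotten it written .. :(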

childe commented 5 years ago

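50dcbe896c9f3426a12c53dafab92d5b114ec0dd: this nil pointer bug should be fixed now. I also added a metadata refresh. It looks like a new node was added to the cluster on your side, and a broker could not be created from the new leader's nodeID; that case should be handled now.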
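
To illustrate the refresh-on-unknown-node idea, here is a rough sketch under stated assumptions. It does not reproduce the commit above: `cluster`, `addrFor`, and the `fetch` callback are hypothetical names, and the address is a placeholder. When a nodeID is missing from the cached broker list, the metadata is fetched once more before giving up.

```go
package main

import "fmt"

// cluster is a hypothetical cache of nodeID -> broker address.
type cluster struct {
	brokers map[int32]string
	// fetch is a hypothetical stand-in for a metadata request to the cluster.
	fetch func() (map[int32]string, error)
}

// addrFor returns the address for nodeID. If the node is not in the cache
// (for example because it joined the cluster after the last metadata fetch),
// the cache is refreshed once and the lookup is retried.
func (c *cluster) addrFor(nodeID int32) (string, error) {
	if addr, ok := c.brokers[nodeID]; ok {
		return addr, nil
	}
	fresh, err := c.fetch()
	if err != nil {
		return "", fmt.Errorf("refresh metadata: %w", err)
	}
	c.brokers = fresh
	addr, ok := c.brokers[nodeID]
	if !ok {
		return "", fmt.Errorf("could not get broker info with nodeID[%d]", nodeID)
	}
	return addr, nil
}

func main() {
	c := &cluster{
		brokers: map[int32]string{1: "10.0.0.1:9092"}, // placeholder address
		fetch: func() (map[int32]string, error) {
			// Pretend node 3 has joined since the cache was built.
			return map[int32]string{1: "10.0.0.1:9092", 3: "10.0.0.3:9092"}, nil
		},
	}
	addr, err := c.addrFor(3)
	fmt.Println(addr, err)
}
```

The caller still has to handle the error return, since the node may be absent even from fresh metadata.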