aregm / nff-go

NFF-Go - Network Function Framework for Go (formerly YANFF)
BSD 3-Clause "New" or "Revised" License

panic in rte_eth_tx_burst - how to manage thread safety? #725

Open mikebromwich opened 3 years ago

mikebromwich commented 3 years ago

Hi,

I am using nff-go with the netvsc DPDK driver (for Hyper-V) with two ports. In order to respond to ARP and ICMP requests, I am using DealARPICMP.

It appears that if an ARP response (which is sent in handleARPICMPRequests using answerPacket.SendPacket) coincides with an outgoing packet being sent by the flow graph, the result is a panic (SIGSEGV) in rte_eth_tx_burst.

I've read various articles (e.g. http://mails.dpdk.org/archives/dev/2014-January/001077.html) stating that rte_eth_tx_burst is not thread-safe when it is called concurrently on the same port and queue. The Intel documentation also says:

'If multiple threads are to use the same hardware queue on the same NIC port, then locking, or some other form of mutual exclusion, is necessary.'

How can I avoid this crash and coordinate the calls to rte_eth_tx_burst between nff_go_send and directSend?

I can synchronize the calls to directSend by using my own implementation of DealARPICMP - but seemingly can't avoid collisions with nff_go_send.
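For reference, the kind of locking I can apply on the directSend side looks roughly like the sketch below. TxFunc, LockedSender and demo are hypothetical names used only for illustration (they are not nff-go APIs), and the fake transmit function stands in for a call such as answerPacket.SendPacket. The lock only protects callers that go through the wrapper, which is exactly why it does not help against nff_go_send.

```go
package main

import (
	"fmt"
	"sync"
)

// TxFunc stands in for a transmit call that is not thread-safe per port.
// Hypothetical placeholder, not an nff-go API.
type TxFunc func(port uint16, frame []byte) bool

// LockedSender serializes all calls to the wrapped TxFunc on a per-port basis.
type LockedSender struct {
	locks []sync.Mutex // one lock per port, indexed by port number
	send  TxFunc
}

// NewLockedSender creates a sender guarding the given number of ports.
func NewLockedSender(ports int, send TxFunc) *LockedSender {
	return &LockedSender{locks: make([]sync.Mutex, ports), send: send}
}

// Send holds the port's lock for the duration of the transmit call.
func (s *LockedSender) Send(port uint16, frame []byte) bool {
	s.locks[port].Lock()
	defer s.locks[port].Unlock()
	return s.send(port, frame)
}

// demo fires concurrent sends at port 0 and returns how many frames the
// fake transmit function saw.
func demo(workers, perWorker int) int {
	sent := 0 // plain int: only safe because Send holds the port lock
	s := NewLockedSender(1, func(port uint16, frame []byte) bool {
		sent++
		return true
	})
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				s.Send(0, []byte{0xff})
			}
		}()
	}
	wg.Wait()
	return sent
}

func main() {
	fmt.Println(demo(4, 1000))
}
```

Any sender that bypasses LockedSender.Send and reaches the same port and queue directly can still collide with these calls.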

Thanks,

Mike

Edited to add relevant stack trace:

[signal SIGSEGV: segmentation violation code=0x1 addr=0xc pc=0xa25660]

runtime stack:
runtime.throw(0xc19c64, 0x2a)
    /usr/local/go/src/runtime/panic.go:1117 +0x72
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:718 +0x2e5

goroutine 37 [syscall, locked to thread]:
runtime.cgocall(0x863a30, 0xc000317928, 0xc000317938)
    /usr/local/go/src/runtime/cgocall.go:154 +0x5b fp=0xc0003178f8 sp=0xc0003178c0 pc=0x4dfd9b
github.com/intel-go/nff-go/internal/low._Cfunc_directSend(0x12d0a9fc0, 0x12d0a0000, 0x0)
    _cgo_gotypes.go:572 +0x45 fp=0xc000317928 sp=0xc0003178f8 pc=0x7e19a5
github.com/intel-go/nff-go/internal/low.DirectSend.func1(0x12d0a9fc0, 0x0, 0xc00031a170)
    /home/mike/upf/nff-go/internal/low/low.go:95 +0x57 fp=0xc000317958 sp=0xc000317928 pc=0x7e4ed7
github.com/intel-go/nff-go/internal/low.DirectSend(0x12d0a9fc0, 0x9ed806524e5d0000, 0x6f4ea8c0dd6193f3)
    /home/mike/upf/nff-go/internal/low/low.go:95 +0x35 fp=0xc000317980 sp=0xc000317958 pc=0x7e2b15
github.com/intel-go/nff-go/packet.(*Packet).SendPacket(...)
    /home/mike/upf/nff-go/packet/packet.go:848
main.handleARP(0x1170b34ce, 0xc00021e108, 0x1e00a10)
    /home/mike/upf/main.go:114 +0x237 fp=0xc0003179f8 sp=0xc000317980 pc=0x85d757
main.handleCorePacket(0x1170b3440, 0xc87c90, 0xc00021e108, 0x3c0000003f)
    /home/mike/upf/main.go:194 +0x115 fp=0xc000317a20 sp=0xc0003179f8 pc=0x85dd75
github.com/intel-go/nff-go/flow.separate(0x1170b3440, 0xc000226310, 0xc87c90, 0xc00021e108, 0x3)
    /home/mike/upf/nff-go/flow/flow.go:1796 +0x48 fp=0xc000317a50 sp=0xc000317a20 pc=0x7f1408
github.com/intel-go/nff-go/flow.segmentProcess(0xb7b720, 0xc0002045a0, 0xc000184140, 0x11, 0x11, 0xc0001a0120, 0xc0001a0180, 0xc0001a8600, 0xc000310000, 0x3, ...)
    /home/mike/upf/nff-go/flow/flow.go:1466 +0x4d9 fp=0xc000317ef0 sp=0xc000317a50 pc=0x7f01f9
github.com/intel-go/nff-go/flow.(*instance).startNewClone.func1(0xc000228780, 0x5, 0xc00018e900)
    /home/mike/upf/nff-go/flow/scheduler.go:289 +0x25e fp=0xc000317fc8 sp=0xc000317ef0 pc=0x7f77be
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc000317fd0 sp=0xc000317fc8 pc=0x5485e1
created by github.com/intel-go/nff-go/flow.(*instance).startNewClone
    /home/mike/upf/nff-go/flow/scheduler.go:283 +0x2c5

mikebromwich commented 3 years ago

I've temporarily worked around this by creating two Generator flow functions and connecting them to the ICMP and ARP processing via channels. However, this ties up two more cores and requires duplicating the ARP code from the framework, so I'd appreciate any better solutions anybody can suggest.

Thanks,

Mike