containerd / go-cni

A generic CNI library to provide APIs for CNI plugin interactions
Apache License 2.0
146 stars 56 forks source link

cni: fix data-race on lazy init by ensureExec(). #82

Closed fuweid closed 2 years ago

fuweid commented 2 years ago

Before this patch, the package doesn't init CNIConfig.exec field. It is lazy init by CNIConfig.ensureExec() during AddNetworkList. After enhancement [1] , there are always more than two goroutines(one for loopback and other one for eth0) to call ensureExec(). Go runtime doesn't ensure that value-assignment is atomic, and then it is possible to use nil and panic, like [2].

Programs that modify data being simultaneously accessed by multiple goroutines must serialize such access.

from https://golang.org/ref/mem Advice section.

I think it should be fixed in github.com/containernetworking/cni library but we need to delivery containerd release 1.6 version in time. I would like to init CNIConfig like what the ensureExec() [3] does instead of lazy init.


How to reproduce it:

package main

import "sync"

type Exec interface {
    FindInPath(v int)
}

type rawExec struct {
    v int
}

func (r *rawExec) FindInPath(v int) {
    r.v += v
}

type cniConfig struct {
    exec Exec
}

func (c *cniConfig) addNetwork(v int) {
    if c.exec == nil {
        c.exec = &rawExec{}
    }
    c.exec.FindInPath(v)
}

func main() {
    c := &cniConfig{}

    for i := 0; i < 100000; i++ {
        c.exec = nil // reset

        var wg sync.WaitGroup
        for j := 0; j <= 10; j++ {
            wg.Add(1)
            go func(k int) {
                defer wg.Done()

                c.addNetwork(k)
            }(j)
        }
        wg.Wait()
    }
}

And run it

➜  /tmp go version
go version go1.17.3 linux/amd64

➜  /tmp go run main.go
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x455a40]

goroutine 264932 [running]:
main.(*rawExec).FindInPath(0x0, 0x0)
    /tmp/main.go:14
main.(*cniConfig).addNetwork(...)
    /tmp/main.go:25
main.main.func1(0x0)
    /tmp/main.go:40 +0xb6
created by main.main
    /tmp/main.go:37 +0xb8
exit status 2
➜  /tmp

Root Cause:

The panic meesage is main.(*rawExec).FindInPath(0x0, 0x0) which means that the interface exec var has the underly type rawExec but its value is zero FindInPath(0x0, like what [2] mentioned.

// go tool compile -S ./main.go
//
"".(*cniConfig).addNetwork STEXT size=148 args=0x10 locals=0x20 funcid=0x0
    ...
    ...

    # (AX) is cniConfig address -> checkout c.exec == nil
        0x0014 00020 (./main.go:22)     CMPQ    (AX), $0
        0x0018 00024 (./main.go:22)     JNE     95

    # if c.exec == nil
    #
    # Since the go interface is struct iface
    #
    # https://github.com/golang/go/blob/release-branch.go1.17/src/runtime/runtime2.go#L202
    #
    # type iface struct {
        #   tab  *itab
        #   data unsafe.Pointer
    # }
    #
    # The value-assignment will be two instructions(one for tab and
    # the other one for data.
        0x001a 00026 (./main.go:21)     MOVQ    AX, "".c+40(SP)
        0x001f 00031 (./main.go:25)     MOVQ    BX, ""..autotmp_4+16(SP)
    #
    # AX will be rawExec type information
    #
    #       type."".rawExec SRODATA size=120
    #       ...
        #   rel 24+8 t=1 runtime.memequal64·f+0
        #   rel 32+8 t=1 runtime.gcbits.+0
        #   rel 40+4 t=5 type..namedata.*main.rawExec-+0
        #   rel 44+4 t=5 type.*"".rawExec+0
        #   rel 48+8 t=1 type..importpath."".+0
        #   rel 56+8 t=1 type."".rawExec+96
        #   rel 80+4 t=5 type..importpath."".+0
        #   rel 96+8 t=1 type..namedata.v-+0
        #   rel 104+8 t=1 type.int+0
    #
        0x0024 00036 (./main.go:23)     LEAQ    type."".rawExec(SB), AX
        0x002b 00043 (./main.go:23)     PCDATA  $1, $0
        0x002b 00043 (./main.go:23)     CALL    runtime.newobject(SB)
    #
    # CX will be itab and get FindInPath TEXT address.
    #
    #   go.itab.*"".rawExec,"".Exec SRODATA dupok size=32
    #       ...
        #   rel 0+8 t=1 type."".Exec+0
        #   rel 8+8 t=1 type.*"".rawExec+0
        #   rel 24+8 t=-32767 "".(*rawExec).FindInPath+0
    #
        0x0030 00048 (./main.go:23)     LEAQ    go.itab.*"".rawExec,"".Exec(SB), CX
        0x0037 00055 (./main.go:23)     MOVQ    "".c+40(SP), DX
        0x003c 00060 (./main.go:23)     MOVQ    CX, (DX)
        0x003f 00063 (./main.go:23)     PCDATA  $0, $-2
        0x003f 00063 (./main.go:23)     CMPL    runtime.writeBarrier(SB), $0
        0x0046 00070 (./main.go:23)     JNE     78
        0x0048 00072 (./main.go:23)     MOVQ    AX, 8(DX)
        0x004c 00076 (./main.go:23)     JMP     87
        0x004e 00078 (./main.go:23)     LEAQ    8(DX), DI
        0x0052 00082 (./main.go:23)     CALL    runtime.gcWriteBarrier(SB)
        0x0057 00087 (./main.go:25)     PCDATA  $0, $-1
        0x0057 00087 (./main.go:25)     MOVQ    DX, AX
        0x005a 00090 (./main.go:25)     MOVQ    ""..autotmp_4+16(SP), BX

    # if c.exec != nil and try to call FindInPath
    #
    # (AX) is cniConfig
    #
        0x005f 00095 (./main.go:25)     MOVQ    (AX), CX
    #
    # 8(AX) is rawExec address (receiver)
    #
        0x0062 00098 (./main.go:25)     MOVQ    8(AX), AX
    #
    # 24(CX) is the FindInPath TEXT address.
    #
        0x0066 00102 (./main.go:25)     MOVQ    24(CX), CX
        0x006a 00106 (./main.go:25)     PCDATA  $1, $1
        0x006a 00106 (./main.go:25)     CALL    CX
        0x006c 00108 (./main.go:26)     MOVQ    24(SP), BP
        0x0071 00113 (./main.go:26)     ADDQ    $32, SP
        0x0075 00117 (./main.go:26)     RET
    ...
    ...

Since there are multiple goroutines to check c.exec == nil, if the lazy-init only assigns iface.tab and the FindInPath TEXT address, interface's value is still nil and the FindInPath will use the interface's value as method receiver. It will panic if try to read the value from receiver.

Reference:

[1] https://github.com/containerd/go-cni/pull/76 [2] https://github.com/moby/buildkit/pull/2589#issuecomment-1032012405 [3] https://github.com/containernetworking/cni/blob/1694fd7b57e0176a98a12823a5ffc03337fdc152/libcni/api.go#L182

Fixes: #76

Signed-off-by: Wei Fu fuweid89@gmail.com

codecov-commenter commented 2 years ago

Codecov Report

Merging #82 (0e59b3d) into main (06e386d) will increase coverage by 0.84%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #82      +/-   ##
==========================================
+ Coverage   44.24%   45.08%   +0.84%     
==========================================
  Files           9        9              
  Lines         391      397       +6     
==========================================
+ Hits          173      179       +6     
  Misses        184      184              
  Partials       34       34              
Impacted Files Coverage Δ
cni.go 72.72% <100.00%> (+1.42%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 06e386d...0e59b3d. Read the comment docs.

mikebrow commented 2 years ago

Thanks @fuweid great catch