FDio / govpp

Go toolset for the VPP.
Apache License 2.0
199 stars 82 forks source link

Investigate intermittent failure of TestTraceEnabled #108

Closed ondrej-fabry closed 1 year ago

ondrej-fabry commented 1 year ago

The test case TestTraceEnabled seems to be failing at random occurrences.

Here is a log output from recent failure that occurred in CI Workflow:

2023-02-20T20:59:37.3608367Z === RUN   TestTraceEnabled
2023-02-20T20:59:37.3609424Z     trace_test.go:46: 
2023-02-20T20:59:37.3610020Z         Expected
2023-02-20T20:59:37.3610822Z             <[]*api.Record | len:7, cap:8>: [
2023-02-20T20:59:37.3611331Z                 {
2023-02-20T20:59:37.3612307Z                     Message: <*interfaces.CreateLoopback | 0xc00225222a>{
2023-02-20T20:59:37.3613298Z                         MacAddress: [0, 0, 0, 0, 0, 0],
2023-02-20T20:59:37.3613754Z                     },
2023-02-20T20:59:37.3614748Z                     Timestamp: 2023-02-20T20:59:37.355402739Z,
2023-02-20T20:59:37.3615331Z                     IsReceived: false,
2023-02-20T20:59:37.3615852Z                     ChannelID: 1,
2023-02-20T20:59:37.3616265Z                 },
2023-02-20T20:59:37.3616619Z                 {
2023-02-20T20:59:37.3617622Z                     Message: <*interfaces.CreateLoopbackReply | 0xc002252298>{Retval: 0, SwIfIndex: 0},
2023-02-20T20:59:37.3618529Z                     Timestamp: 2023-02-20T20:59:37.35542454Z,
2023-02-20T20:59:37.3619089Z                     IsReceived: true,
2023-02-20T20:59:37.3619599Z                     ChannelID: 1,
2023-02-20T20:59:37.3620004Z                 },
2023-02-20T20:59:37.3620379Z                 {
2023-02-20T20:59:37.3621083Z                     Message: <*memif.MemifCreate | 0xc00224f200>{
2023-02-20T20:59:37.3621654Z                         Role: 0,
2023-02-20T20:59:37.3622160Z                         Mode: 0,
2023-02-20T20:59:37.3622694Z                         RxQueues: 0,
2023-02-20T20:59:37.3623227Z                         TxQueues: 0,
2023-02-20T20:59:37.3623704Z                         ID: 0,
2023-02-20T20:59:37.3624226Z                         SocketID: 0,
2023-02-20T20:59:37.3624901Z                         RingSize: 0,
2023-02-20T20:59:37.3625461Z                         BufferSize: 0,
2023-02-20T20:59:37.3626050Z                         NoZeroCopy: false,
2023-02-20T20:59:37.3626722Z                         HwAddr: [0, 0, 0, 0, 0, 0],
2023-02-20T20:59:37.3628285Z                         Secret: "",
2023-02-20T20:59:37.3628768Z                     },
2023-02-20T20:59:37.3629619Z                     Timestamp: 2023-02-20T20:59:37.355516946Z,
2023-02-20T20:59:37.3630204Z                     IsReceived: false,
2023-02-20T20:59:37.3630740Z                     ChannelID: 1,
2023-02-20T20:59:37.3631145Z                 },
2023-02-20T20:59:37.3631521Z                 {
2023-02-20T20:59:37.3632439Z                     Message: <*memif.MemifCreateReply | 0xc0022522c8>{Retval: 0, SwIfIndex: 0},
2023-02-20T20:59:37.3633317Z                     Timestamp: 2023-02-20T20:59:37.355526447Z,
2023-02-20T20:59:37.3633888Z                     IsReceived: true,
2023-02-20T20:59:37.3634395Z                     ChannelID: 1,
2023-02-20T20:59:37.3634778Z                 },
2023-02-20T20:59:37.3635156Z                 {
2023-02-20T20:59:37.3641241Z                     Message: <*l2.BridgeDomainAddDel | 0xc001c56f00>{BdID: 0, Flood: false, UuFlood: false, Forward: false, Learn: false, ArpTerm: false, ArpUfwd: false, MacAge: 0, BdTag: "", IsAdd: false},
2023-02-20T20:59:37.3642204Z                     Timestamp: 2023-02-20T20:59:37.355608052Z,
2023-02-20T20:59:37.3643344Z                     IsReceived: false,
2023-02-20T20:59:37.3643806Z                     ChannelID: 1,
2023-02-20T20:59:37.3644106Z                 },
2023-02-20T20:59:37.3644392Z                 {
2023-02-20T20:59:37.3645178Z                     Message: <*l2.BridgeDomainAddDelReply | 0xc00225230c>{Retval: 0},
2023-02-20T20:59:37.3645972Z                     Timestamp: 2023-02-20T20:59:37.355629753Z,
2023-02-20T20:59:37.3646441Z                     IsReceived: true,
2023-02-20T20:59:37.3646861Z                     ChannelID: 1,
2023-02-20T20:59:37.3647167Z                 },
2023-02-20T20:59:37.3647450Z                 {
2023-02-20T20:59:37.3648179Z                     Message: <*ip.IPTableAddDelReply | 0xc00225233c>{Retval: 0},
2023-02-20T20:59:37.3648928Z                     Timestamp: 2023-02-20T20:59:37.35574826Z,
2023-02-20T20:59:37.3649402Z                     IsReceived: true,
2023-02-20T20:59:37.3649818Z                     ChannelID: 1,
2023-02-20T20:59:37.3650108Z                 },
2023-02-20T20:59:37.3650378Z             ]
2023-02-20T20:59:37.3650679Z         to have length 8
2023-02-20T20:59:37.3651028Z --- FAIL: TestTraceEnabled (0.00s)
2023-02-20T20:59:37.3651445Z FAIL core.TestTraceEnabled (0.00s)
2023-02-20T20:59:37.3656156Z PASS core.TestMultiRequestTraceEnabled (0.00s)
2023-02-20T20:59:37.3656938Z PASS core.TestTraceDisabled (0.00s)
2023-02-20T20:59:37.3678066Z PASS core.TestTracePerChannel (0.00s)
2023-02-20T20:59:37.3757584Z PASS core.TestTraceClear (0.00s)
2023-02-20T20:59:37.3845354Z FAIL core
2023-02-20T20:59:37.3864616Z EMPTY core/genericpool
2023-02-20T20:59:37.3903496Z EMPTY examples/api-trace
2023-02-20T20:59:37.3904257Z EMPTY examples/binapi-types
2023-02-20T20:59:37.3904922Z EMPTY examples/multi-vpp
2023-02-20T20:59:37.3905468Z EMPTY examples/perf-bench
2023-02-20T20:59:37.3906066Z EMPTY examples/rpc-service
2023-02-20T20:59:37.3906646Z EMPTY examples/simple-client
2023-02-20T20:59:37.3907253Z EMPTY examples/stats-client
2023-02-20T20:59:37.3907797Z EMPTY examples/stream-client
2023-02-20T20:59:37.3908346Z EMPTY proxy
2023-02-20T20:59:37.3908855Z EMPTY test/vpptesting
2023-02-20T20:59:37.3909472Z EMPTY test/vpptesting/vpplauncher
2023-02-20T20:59:37.3909966Z EMPTY version
2023-02-20T20:59:37.4423300Z 
2023-02-20T20:59:37.4423989Z === Skipped
2023-02-20T20:59:37.4424657Z === SKIP: core TestReceiveReplyAfterTimeoutMultiRequest (0.00s)
2023-02-20T20:59:37.4425487Z     channel_test.go:435: 
2023-02-20T20:59:37.4425663Z 
2023-02-20T20:59:37.4425825Z === Failed
2023-02-20T20:59:37.4426222Z === FAIL: core TestTraceEnabled (0.00s)
2023-02-20T20:59:37.4426574Z     trace_test.go:46: 
2023-02-20T20:59:37.4426868Z         Expected
2023-02-20T20:59:37.4427301Z             <[]*api.Record | len:7, cap:8>: [
2023-02-20T20:59:37.4427622Z                 {
2023-02-20T20:59:37.4428322Z                     Message: <*interfaces.CreateLoopback | 0xc00225222a>{
2023-02-20T20:59:37.4428993Z                         MacAddress: [0, 0, 0, 0, 0, 0],
2023-02-20T20:59:37.4429358Z                     },
2023-02-20T20:59:37.4430046Z                     Timestamp: 2023-02-20T20:59:37.355402739Z,
2023-02-20T20:59:37.4430526Z                     IsReceived: false,
2023-02-20T20:59:37.4430952Z                     ChannelID: 1,
2023-02-20T20:59:37.4431249Z                 },
2023-02-20T20:59:37.4431541Z                 {
2023-02-20T20:59:37.4432440Z                     Message: <*interfaces.CreateLoopbackReply | 0xc002252298>{Retval: 0, SwIfIndex: 0},
2023-02-20T20:59:37.4433241Z                     Timestamp: 2023-02-20T20:59:37.35542454Z,
2023-02-20T20:59:37.4433715Z                     IsReceived: true,
2023-02-20T20:59:37.4434137Z                     ChannelID: 1,
2023-02-20T20:59:37.4434435Z                 },
2023-02-20T20:59:37.4434722Z                 {
2023-02-20T20:59:37.4435356Z                     Message: <*memif.MemifCreate | 0xc00224f200>{
2023-02-20T20:59:37.4435836Z                         Role: 0,
2023-02-20T20:59:37.4437305Z                         Mode: 0,
2023-02-20T20:59:37.4437780Z                         RxQueues: 0,
2023-02-20T20:59:37.4438251Z                         TxQueues: 0,
2023-02-20T20:59:37.4438710Z                         ID: 0,
2023-02-20T20:59:37.4439162Z                         SocketID: 0,
2023-02-20T20:59:37.4439613Z                         RingSize: 0,
2023-02-20T20:59:37.4440068Z                         BufferSize: 0,
2023-02-20T20:59:37.4440604Z                         NoZeroCopy: false,
2023-02-20T20:59:37.4441174Z                         HwAddr: [0, 0, 0, 0, 0, 0],
2023-02-20T20:59:37.4441620Z                         Secret: "",
2023-02-20T20:59:37.4441960Z                     },
2023-02-20T20:59:37.4442677Z                     Timestamp: 2023-02-20T20:59:37.355516946Z,
2023-02-20T20:59:37.4443149Z                     IsReceived: false,
2023-02-20T20:59:37.4443584Z                     ChannelID: 1,
2023-02-20T20:59:37.4443899Z                 },
2023-02-20T20:59:37.4444177Z                 {
2023-02-20T20:59:37.4445178Z                     Message: <*memif.MemifCreateReply | 0xc0022522c8>{Retval: 0, SwIfIndex: 0},
2023-02-20T20:59:37.4445986Z                     Timestamp: 2023-02-20T20:59:37.355526447Z,
2023-02-20T20:59:37.4446463Z                     IsReceived: true,
2023-02-20T20:59:37.4446878Z                     ChannelID: 1,
2023-02-20T20:59:37.4447193Z                 },
2023-02-20T20:59:37.4447483Z                 {
2023-02-20T20:59:37.4449035Z                     Message: <*l2.BridgeDomainAddDel | 0xc001c56f00>{BdID: 0, Flood: false, UuFlood: false, Forward: false, Learn: false, ArpTerm: false, ArpUfwd: false, MacAge: 0, BdTag: "", IsAdd: false},
2023-02-20T20:59:37.4449958Z                     Timestamp: 2023-02-20T20:59:37.355608052Z,
2023-02-20T20:59:37.4450431Z                     IsReceived: false,
2023-02-20T20:59:37.4450862Z                     ChannelID: 1,
2023-02-20T20:59:37.4451159Z                 },
2023-02-20T20:59:37.4451442Z                 {
2023-02-20T20:59:37.4452215Z                     Message: <*l2.BridgeDomainAddDelReply | 0xc00225230c>{Retval: 0},
2023-02-20T20:59:37.4452980Z                     Timestamp: 2023-02-20T20:59:37.355629753Z,
2023-02-20T20:59:37.4453451Z                     IsReceived: true,
2023-02-20T20:59:37.4453876Z                     ChannelID: 1,
2023-02-20T20:59:37.4454171Z                 },
2023-02-20T20:59:37.4454455Z                 {
2023-02-20T20:59:37.4455290Z                     Message: <*ip.IPTableAddDelReply | 0xc00225233c>{Retval: 0},
2023-02-20T20:59:37.4456044Z                     Timestamp: 2023-02-20T20:59:37.35574826Z,
2023-02-20T20:59:37.4456520Z                     IsReceived: true,
2023-02-20T20:59:37.4456943Z                     ChannelID: 1,
2023-02-20T20:59:37.4457242Z                 },
2023-02-20T20:59:37.4457512Z             ]
2023-02-20T20:59:37.4457827Z         to have length 8
2023-02-20T20:59:37.4457998Z 
2023-02-20T20:59:37.4458167Z DONE 90 tests, 1 skipped, 1 failure in 37.514s
2023-02-20T20:59:37.4467380Z ##[error]Process completed with exit code 1.
ondrej-fabry commented 1 year ago

Another random failure popped up, the failing test is in this case TestTraceClear:

=== Failed
=== FAIL: core TestTraceClear (0.00s)
    trace_test.go:263: 
        Expected
            <[]*api.Record | len:3, cap:3>: [
                {
                    Message: <*interfaces.CreateLoopback | 0xc0020b470a>{
                        MacAddress: [0, 0, 0, 0, 0, 0],
                    },
                    Timestamp: 2023-02-23T09:45:44.068699612Z,
                    IsReceived: false,
                    ChannelID: 1,
                },
                {
                    Message: <*interfaces.CreateLoopbackReply | 0xc0020b4758>{Retval: 0, SwIfIndex: 0},
                    Timestamp: 2023-02-23T09:45:44.068726512Z,
                    IsReceived: true,
                    ChannelID: 1,
                },
                {
                    Message: <*memif.MemifCreateReply | 0xc0020b4788>{Retval: 0, SwIfIndex: 0},
                    Timestamp: 2023-02-23T09:45:44.068879412Z,
                    IsReceived: true,
                    ChannelID: 1,
                },
            ]
        to have length 4

DONE 90 tests, 1 skipped, 1 failure in 48.670s
Error: Process completed with exit code 1.
VladoLavor commented 1 year ago

@ondrej-fabry the problem here is that trace records are processed from several goroutines, and if any of them stumble and the GetRecord is called fast enough, some records might not make it to the list. The test then fails, because it expects exact numbers.

This could happen to any trace test. It's enough to run it like 100k times in a loop and it eventually fails.

I reimplemented the trace here #116 to mitigate the race and fasten up the API tracing process. If you don't like the change, it can be also fixed with a single 1ms sleep line.