NetSys / bess

BESS: Berkeley Extensible Software Switch

Possible packet corruption in VPort #906

Open ezrasilvera opened 5 years ago

ezrasilvera commented 5 years ago

Reproduce flow:

  1. Create two containers
  2. Create two VPorts in the containers and connect them to each other
  3. Run some traffic between the containers
  4. Kill bessd
  5. Delete the containers
  6. Start bessd
  7. Repeat steps (1) - (3)

==> Failure in napi_alloc_skb because total_len is very large or negative
(e.g. bess - sn_host_do_rx_batch():351 skb alloc (-64022B) failed). This happens consistently (also with the latest code).

Possible issue:

I'm not sure, but it seems that length=-64022 is the result of do_tx_batch overwriting the buffer. It sets the length to 1514 (0x05EA), and because rx_desc.total_len is 32 bits while tx_desc.total_len is only 16 bits, we get a length of 0xFFFF05EA (-64022). It seems the same buffer is placed in, or used by, two rings simultaneously.
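
The arithmetic can be checked with a small sketch. It is not BESS code: it assumes a little-endian host, assumes both total_len fields start at the same offset in the shared buffer, and assumes the upper two bytes held stale 0xFFFF before the 16-bit write. The function name is hypothetical.

```python
import struct


def read_rx_total_len(tx_len: int, stale: bytes = b"\xff\xff\xff\xff") -> int:
    """Write a 16-bit length over a 32-bit field, then read it back as int32.

    Assumption: `stale` stands in for whatever was in the 32-bit field
    before the write; 0xFFFFFFFF is chosen to match the reported value.
    """
    buf = bytearray(stale)
    # TX side: 16-bit unsigned total_len, little-endian, touches only 2 bytes.
    struct.pack_into("<H", buf, 0, tx_len)
    # RX side: the same 4 bytes interpreted as a signed 32-bit total_len.
    (rx_len,) = struct.unpack_from("<i", buf, 0)
    return rx_len


print(read_rx_total_len(1514))       # -64022
print(hex(read_rx_total_len(1514) & 0xFFFFFFFF))  # 0xffff05ea
```

With the upper bytes zeroed instead (stale=b"\x00\x00\x00\x00"), the same read returns 1514, so the negative length depends on what the field held before the short write.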

fabricioufmt commented 5 years ago

Hi,

I have the same problem when I run iperf, even if I do not kill bessd.

It happens only with VPort; PMDPort is not affected.

ezrasilvera commented 5 years ago

@fabricioufmt You are correct. I'm also getting the same issue with just iperf. It's just that by restarting bessd I can reproduce it consistently on each run. BTW, I also added signatures to the descriptors of all packets when they enter the different queues (in user space and in the kernel driver), and I could definitely see that my signatures are overwritten while the packet is supposed to be in the queue ...
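
The signature check described above can be illustrated roughly as follows. This is only a sketch of the general technique: the descriptor layout, the MAGIC value, and the function names are hypothetical and do not come from the BESS sources.

```python
import struct

MAGIC = 0x5EEDC0DE   # hypothetical signature value
DESC_FMT = "<IiI"    # hypothetical layout: signature, total_len, queue id


def stamp(desc: bytearray, total_len: int, queue_id: int) -> None:
    """Write the signature and metadata when the packet enters a queue."""
    struct.pack_into(DESC_FMT, desc, 0, MAGIC, total_len, queue_id)


def check(desc: bytearray) -> bool:
    """Return True if the signature is still intact on dequeue."""
    (sig,) = struct.unpack_from("<I", desc, 0)
    return sig == MAGIC


desc = bytearray(struct.calcsize(DESC_FMT))
stamp(desc, 1514, 0)
intact = check(desc)           # nothing has touched the descriptor yet

# Simulate a second writer that believes it owns the same buffer and
# stores a 16-bit length at offset 0 while the packet is still queued.
struct.pack_into("<H", desc, 0, 1514)
corrupted = not check(desc)    # the signature no longer matches

print(intact, corrupted)
```

A mismatch on dequeue shows that something wrote to the descriptor while it was still owned by the queue, which is the symptom reported here.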

fabricioufmt commented 5 years ago

Also, after several runs, I got a kernel panic with a memory leak message.

[Attached image: VPort]

ezrasilvera commented 5 years ago

Correct. This is all due to the same issue: the packet is overwritten. As a result, for example, the segment pointer picks up a garbage value, which causes either a crash on the VPort side or a kernel oops on the driver side.

fabricioufmt commented 5 years ago

Any news?

ezrasilvera commented 5 years ago

I gave up on this. We will switch to pmd-tap or kni ...

fabricioufmt commented 5 years ago

But, are you using pmd-tap or kni with containers?