Open rware opened 6 years ago
Could you confirm whether this is still an issue? I tried to reproduce it locally (a couple of times last year, and again today), but I have not been able to.
I vaguely remember seeing a similar issue, where the cause was that the kernel module and the BESS daemon were out of sync; the problem went away once I rebuilt the module from scratch.
Yes and no. I'm not sure what was going on in my original issue, but I did notice that I get a segfault and BESS dies when I set the queue size to something other than a power of 2. Instead of getting an error saying that the queue size must be a power of 2, I get a segfault and the BESS daemon dies. I've reinstalled BESS on a new machine with a fresh install of Ubuntu, so I don't think things being out of sync is the issue.
Here's an example config I used to reproduce this error:
import os
bess.add_worker(wid=1, core=1)
bess.add_tc('rl', policy='rate_limit', wid=1, resource='bit', limit={'bit': int(1e6)})
os.system('docker pull ubuntu > /dev/null')
os.system('docker run -i -d --net=none --name=vport_test ubuntu /bin/bash > /dev/null')
# Alice lives outside the container, wanting to talk to Bob in the container
v_bob = VPort(ifname='eth_bob', docker='vport_test', ip_addrs=['10.255.99.2/24'])
v_alice = VPort(ifname='eth_alice', ip_addrs=['10.255.99.1/24'])
PortInc(port=v_alice) -> q::Queue(size=10) -> PortOut(port=v_bob)
PortInc(port=v_bob) -> PortOut(port=v_alice)
q.attach_task(parent='rl')
And here's the output when I run:
*** Error: Unhandled exception in the configuration script (most recent call last)
File "/opt/bess/bessctl/conf/test.bess", line 13, in <module>
PortInc(port=v_alice) -> q::Queue(size=10) -> PortOut(port=v_bob)
File "/opt/bess/bessctl/commands.py", line 132, in __bess_module__
return make_modules([module_names])[0]
File "/opt/bess/bessctl/commands.py", line 112, in make_modules
obj = mclass_obj(*args, name=module, **kwargs)
File "/opt/bess/bessctl/../pybess/module.py", line 58, in __init__
self.choose_arg(None, kwargs))
File "/opt/bess/bessctl/../pybess/bess.py", line 425, in create_module
return self._request('CreateModule', request)
File "/opt/bess/bessctl/../pybess/bess.py", line 258, in _request
raise self.Error(code, errmsg, query=name, query_arg=req_dict)
*** Error: RPC failed to localhost:10514 - <_Rendezvous of RPC that terminated with (StatusCode.UNKNOWN, Stream removed)>
From /tmp/bessd_crash.log (Wed Jan 10 15:31:44 2018):
A critical error has occured. Aborting...
Signal: 11 (Segmentation fault), si_code: 2 (SEGV_ACCERR: invalid permissions for mapped object)
pid: 23368, tid: 23387, address: 0x7f9ff00003f8, IP: 0x7f9ff00003f8
Backtrace (recent calls first) ---
(0): [0x7f9ff00003f8]
Command failed: run test
And a "pgrep bessd" shows the BESS daemon isn't running anymore.
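Until the daemon rejects bad sizes with a proper error, the size could be checked on the config side before it reaches Queue. This is only a minimal sketch, assuming (as observed above) that Queue only works with power-of-2 sizes; the helper names `is_power_of_two` and `next_power_of_two` are hypothetical and not part of BESS.

```python
def is_power_of_two(n):
    # A positive power of two has exactly one bit set.
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n):
    # Smallest power of two that is >= n (hypothetical helper, not a BESS API).
    if n < 1:
        raise ValueError('queue size must be at least 1')
    return 1 << (n - 1).bit_length()


requested = 10  # the size used in the failing config above
size = requested if is_power_of_two(requested) else next_power_of_two(requested)
print(size)  # 16
```

In the config above, the resulting `size` would then be passed as `Queue(size=size)` instead of `Queue(size=10)`.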
Hi, I'm facing the same problem when I try the examples posted at https://github.com/NetSys/bess/wiki/Hooking-up-BESS-Ports#connecting-vms-and-containers-with-bess-vports. The logs are shown below. Could anyone help me solve this problem?
Log file created at: 2018/12/19 14:48:04
Running on machine:
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F1219 14:48:04.838188 9890 debug.cc:405] A critical error has occured. Aborting...
Signal: 11 (Segmentation fault), si_code: 1 (SEGV_MAPERR: address not mapped to object)
pid: 9869, tid: 9890, address: 0x140, IP: 0x563a9e5eea4e
Backtrace (recent calls first) ---
(0): /home/???/bess/core/bessd(_ZN5VPort11RecvPacketsEhPPN4bess6PacketEi+0x19e) [0x563a9e5eea4e]
VPort::RecvPackets(unsigned char, bess::Packet**, int) at /home/???/bess/core/drivers/vport.cc:620
617: pkt = pkts[i] = bess::Packet::from_paddr(paddr[i]);
618:
619: tx_desc = pkt->scratchpad<struct sn_tx_desc *>();
-> 620: len = tx_desc->total_len;
621:
622: pkt->set_data_off(SNBUF_HEADROOM);
623: pkt->set_total_len(len);
(1): /home/???/bess/core/bessd(_ZN7PortInc7RunTaskEP7ContextPN4bess11PacketBatchEPv+0xeb) [0x563a9e67fe1b]
PortInc::RunTask(Context*, bess::PacketBatch*, void*) at /home/???/bess/core/modules/port_inc.cc:121
-> 121: batch->set_cnt(p->RecvPackets(qid, batch->pkts(), burst));
(2): /home/???/bess/core/bessd(_ZNK4TaskclEP7Context+0x53) [0x563a9e495f23]
Task::operator()(Context*) const at /home/???/bess/core/task.cc:53
-> 53: struct task_result result = module_->RunTask(ctx, &init_batch, arg_);
(3): /home/???/bess/core/bessd(_ZN4bess16DefaultScheduler12ScheduleLoopEv+0x1cc) [0x563a9e4d178c]
bess::DefaultScheduler::ScheduleLoop() at /home/???/bess/core/scheduler.h:272
-> 272: auto ret = (*ctx->task)(ctx);
(inlined by) bess::DefaultScheduler::ScheduleLoop() at /home/???/bess/core/scheduler.h:250
-> 250: ScheduleOnce(&ctx);
(4): /home/???/bess/core/bessd(_ZN6Worker3RunEPv+0x20d) [0x563a9e4ceb7d]
Worker::Run(void*) at /home/???/bess/core/worker.cc:316
-> 316: scheduler_->ScheduleLoop();
(5): /home/???/bess/core/bessd(_Z10run_workerPv+0x7c) [0x563a9e4cee4c]
run_worker(void*) at /home/???/bess/core/worker.cc:330
-> 330: return current_worker.Run(_arg);
(6): /home/???/bess/core/bessd(+0xc82a6e) [0x563a9eda6a6e]
execute_native_thread_routine at thread.o:?
(7): /lib/x86_64-linux-gnu/libpthread.so.0(+0x76da) [0x7f8709f686da]
start_thread at /build/glibc-OTsEL5/glibc-2.27/nptl/pthread_create.c:463
(file/line not available)
(8): /lib/x86_64-linux-gnu/libc.so.6(clone+0x3e) [0x7f87094d788e]
clone at /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(file/line not available)
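The two crash logs in this thread differ in a way that may matter for triage: the first faults with SEGV_ACCERR at an address equal to the instruction pointer, while this one faults with SEGV_MAPERR at the very low address 0x140. Below is a rough sketch for pulling those fields out of /tmp/bessd_crash.log; the function name `summarize_crash`, the regular expressions, and the 4096-byte "near null" threshold are illustrative assumptions, not part of BESS.

```python
import re

# Patterns matched against the "Signal:" and "pid:" lines of a bessd crash log.
SIGNAL_RE = re.compile(r"Signal: (\d+) \(([^)]*)\), si_code: (\d+) \((\w+)")
ADDR_RE = re.compile(r"address: (0x[0-9a-fA-F]+), IP: (0x[0-9a-fA-F]+)")


def summarize_crash(text):
    # Returns None if the expected lines are not present.
    sig = SIGNAL_RE.search(text)
    addr = ADDR_RE.search(text)
    if not sig or not addr:
        return None
    fault = int(addr.group(1), 16)
    ip = int(addr.group(2), 16)
    return {
        "signal": int(sig.group(1)),
        "si_code": sig.group(4),
        # Fault address == IP: the fault happened while fetching the instruction.
        "fault_at_ip": fault == ip,
        # Assumed threshold: a fault in the first page hints at a null-based pointer.
        "near_null": fault < 4096,
    }


first = summarize_crash(
    "Signal: 11 (Segmentation fault), si_code: 2 (SEGV_ACCERR: invalid permissions for mapped object)\n"
    "pid: 23368, tid: 23387, address: 0x7f9ff00003f8, IP: 0x7f9ff00003f8\n"
)
second = summarize_crash(
    "Signal: 11 (Segmentation fault), si_code: 1 (SEGV_MAPERR: address not mapped to object)\n"
    "pid: 9869, tid: 9890, address: 0x140, IP: 0x563a9e5eea4e\n"
)
print(first)
print(second)
```

If this reading is right, the two reports may be different bugs that share a symptom: the first looks like a jump to a bad code address, the second like a read through a near-null `tx_desc` pointer in `VPort::RecvPackets`.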
I've got the following BESS config file (where $NIC_PCI is a NIC port connected to DPDK):
I'm running the config file as follows:
/opt/bess/bessctl/bessctl daemon start -- run [BESS_CONF_FILE_NAME] [ENV_VARS]
After running this BESS config file ~5-10 times, every subsequent run makes the BESS daemon die with the following stack trace in /tmp/bessd_crash.log:
At first the run command worked fine, but ~30s later the daemon would die with the segfault error above.
Recently, it started dying as soon as I try to run this config file, with this error: