Closed by GoogleCodeExporter 9 years ago
Run drone directly and attach the console log here. Also provide output of the
following commands -
ifconfig ethX (for all 5 ports)
cat /proc/net/dev
Original comment by pstav...@gmail.com
on 2 Nov 2012 at 3:45
Sorry, I was busy. Attached you will find the requested output for case 1 (eth4
affected).
Original comment by loox...@googlemail.com
on 26 Nov 2012 at 1:01
ok...I just did some quick "debugging" without actually understanding the
mechanics, but I think I know where the problem is:
LinuxPort::StatsMonitor::netlinkStats()
The buffer size for the parsed netlink messages is determined by peeking into
the first message. Over here, that message contains information for the
interfaces lo, eth0, eth3, resulting in a buffer size of 3020 bytes
(996 + 2*1012). Inside the _retry loop, the message for these interfaces is
retrieved and parsed, and the while(NLMSG_OK(nlm, (uint)len)) loop finishes
since len is 0. The next call to recvmsg returns information for interfaces
eth1, eth2, eth4. However, this message is BIGGER than the first one
(3036 = 3*1012), so the buffer is not big enough and len is too small. After
parsing eth1 and eth2, the last record for eth4 has a size of 1012 bytes, but
len is only 996 bytes (3020 - 2*1012), so the NLMSG_OK condition in the while
loop is false and eth4 is not processed.
So it looks like the problem occurs because the netlink messages for the two
groups of ports differ in size, and the smaller one is used to size the buffer.
I hope it got at least a little bit clear what I meant, forgive me, daring to
debug your code without actually understanding it (: just want to help (:
Original comment by loox...@googlemail.com
on 26 Nov 2012 at 5:19
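The arithmetic in the comment above can be replayed with a small simulation. This is only a sketch, not the Ostinato code: Record and parse() are hypothetical names, and the record sizes (996 and 1012 bytes) are the ones quoted in the comment. The loop mimics the NLMSG_OK check by accepting a record only if it fits entirely in the bytes that are left.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one per-interface netlink record.
struct Record {
    std::string ifname;
    std::size_t size;  // bytes the record occupies in the datagram
};

// Walks the records the way a while (NLMSG_OK(nlm, len)) loop does.
// bufSize is the receive buffer size, so at most bufSize bytes of the
// datagram are available to the parser (a truncated receive).
std::vector<std::string> parse(const std::vector<Record> &datagram,
                               std::size_t bufSize)
{
    std::size_t total = 0;
    for (const Record &r : datagram)
        total += r.size;
    std::size_t len = total < bufSize ? total : bufSize;

    std::vector<std::string> seen;
    for (const Record &r : datagram) {
        if (r.size > len)
            break;  // NLMSG_OK-style check fails: record does not fit
        seen.push_back(r.ifname);
        len -= r.size;
    }
    return seen;
}
```

With a 3020-byte buffer, the first datagram (lo, eth0, eth3) is parsed completely, while the second (eth1, eth2, eth4) stops after eth2 because only 996 bytes remain for a 1012-byte record. With a buffer of 3036 bytes or more, all three records of the second datagram are accepted.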
a solution might be to move:
count = 0;
_retry:
right above:
// Find required size of buffer and resize accordingly
while (1)
so that for every new netlink msg the buffer size gets adjusted according to
the peek into that message. But that's just an idea.
Original comment by loox...@googlemail.com
on 27 Nov 2012 at 10:45
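The difference between sizing the buffer once and sizing it before every receive can be sketched as follows. This is a simulation under stated assumptions, not the actual fix: FakeSocket and both functions are hypothetical, and the socket is modelled as a queue of datagram sizes. peekSize() stands in for a peek that reports the full length of the next datagram (on Linux this is commonly done with recv() and MSG_PEEK | MSG_TRUNC; the thread does not show which call Ostinato uses).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical stand-in for a netlink socket: a queue of datagram sizes.
class FakeSocket {
public:
    explicit FakeSocket(std::deque<std::size_t> sizes)
        : queue_(std::move(sizes)) {}

    bool empty() const { return queue_.empty(); }

    // Like a peek: reports the full size of the next datagram, which
    // stays queued.
    std::size_t peekSize() const { return queue_.front(); }

    // Like a plain receive: removes the datagram and returns the number
    // of bytes copied; anything beyond bufSize is discarded.
    std::size_t receive(std::size_t bufSize) {
        std::size_t full = queue_.front();
        queue_.pop_front();
        return full < bufSize ? full : bufSize;
    }

private:
    std::deque<std::size_t> queue_;
};

// Sizes the buffer once, from the first datagram only.
std::size_t bytesLostSizingOnce(FakeSocket sock) {
    std::size_t lost = 0;
    if (sock.empty())
        return lost;
    const std::size_t bufSize = sock.peekSize();
    while (!sock.empty()) {
        std::size_t full = sock.peekSize();
        lost += full - sock.receive(bufSize);
    }
    return lost;
}

// Peeks before every receive and grows the buffer when needed.
std::size_t bytesLostSizingPerMessage(FakeSocket sock) {
    std::size_t lost = 0;
    std::size_t bufSize = 0;
    while (!sock.empty()) {
        std::size_t full = sock.peekSize();
        if (full > bufSize)
            bufSize = full;  // resize for this datagram
        lost += full - sock.receive(bufSize);
    }
    return lost;
}
```

For the two datagrams described earlier in the thread (3020 and 3036 bytes), sizing once drops the last 16 bytes of the second datagram, which is enough to make its final record fail the length check. Sizing per message drops nothing.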
@looxrat: I guess you hit the nail on the head. Thanks for debugging. For a
multipart netlink message, peeking won't help because you can't peek more than
the first message. So we will possibly have to send and receive twice: once to
get the buffer size and subsequently to create the port list with the actual
data.
Till those changes are made, I recommend the following quick hack -
In LinuxPort::StatsMonitor::netlinkStats(), change the following default buffer
size from 1024 to 8192 or 16384 -
buf.fill('\0', 1024);
Let me know if that fixes the issue for now.
Original comment by pstav...@gmail.com
on 27 Nov 2012 at 3:54
Yes, increasing the initial buffer size works perfectly, even with 19 interfaces
(the 5 physical ones, the loopback and a bunch of vlans).
Regarding the peeking, I have to admit I don't quite understand it yet, forgive
me (: I take it that the moment you do the recvmsg() inside the _retry loop the
message is removed. So if the while(1) loop (which is currently outside the
_retry loop) were moved into the _retry loop and executed before every receive,
wouldn't that always determine the size of the next message to be received?
Just curious! Anyway, good work, really nice program (:
Original comment by loox...@googlemail.com
on 28 Nov 2012 at 8:23
Issue 94 has been merged into this issue.
Original comment by pstav...@gmail.com
on 13 Jan 2013 at 5:11
@looxrat: on revisiting the code, I see that your suggested fix about moving
the _retry loop will work. Will fix shortly.
Original comment by pstav...@gmail.com
on 13 Jan 2013 at 5:13
revision a95d85838d53 fixes this issue
Original comment by pstav...@gmail.com
on 16 Jan 2013 at 4:36
I had the same issue with ver 0.5.1 on Ubuntu. 2 out of 8 ports are shown as
"unknown" and cannot pass traffic.
Previous discussion here said quick fix is to increase buffer size...
In LinuxPort::StatsMonitor::netlinkStats(), change the following default buffer
size from 1024 to 8192 or 16384 -
buf.fill('\0', 1024);
Can someone tell me where I can change the buf.fill setting? Which file? Or do
I have to wait for a newer Ostinato version?
Thanks.
Original comment by weylwa...@gmail.com
on 30 Sep 2013 at 6:55
@weylwang7: The file is server/linuxport.cpp
The buf.fill change was only a workaround. The actual fix is already committed.
You can get the latest code from the repository and build it. See the Wiki for
more details on how to do that.
Original comment by pstav...@gmail.com
on 2 Oct 2013 at 3:24
Original issue reported on code.google.com by
loox...@googlemail.com
on 1 Nov 2012 at 10:19