Inaccurate bytecounts - Githubissues

dougburks commented 11 years ago

Hello,

We're running prads as follows:

prads -i eth0 -c $conf -u sguil -g sguil -L
/nsm/sensor_data/$SENSOR/sancp/ -f /nsm/sensor_data/$SENSOR/pads.fifo
-b 'ip or (vlan and ip)'

In Sguil, if I query the sancp table and look at the connections, the source and destination bytes are way too big.

Example:

I do the following: curl testmyids.com

For that connection, Sguil shows the following: S Bytes: 109704 D Bytes: 43697

If I look at the pcap in Wireshark, it's only 1032 bytes total.

Where did 109704 and 43697 come from?

Thanks!

comotion commented 11 years ago

looking at the code, the s/d bytes are computed on the basis of all bytes sent and received including tcp headers. AFAICT the wireshark counts could be just data bytes. Maybe that's what's causing the discrepancy?

dougburks commented 11 years ago

Even if we just look at the lower number, it's 40 times as big as what wireshark is reporting. I don't think tcp headers would be responsible for a 40x increase.

Can you try duplicating my test with "curl testmyids.com"? You'll see that the traffic is nowhere near what prads is reporting.

Thanks!

On Wednesday, November 21, 2012, Kacper Why wrote:

looking at the code, the s/d bytes are computed on the basis of all bytes sent and received including tcp headers. AFAICT the wireshark counts could be just data bytes. Maybe that's what's causing the discrepancy?

— Reply to this email directly or view it on GitHubhttps://github.com/gamelinux/prads/issues/30#issuecomment-10609258.

Doug Burks http://securityonion.blogspot.com

dougburks commented 11 years ago

Good morning PRADS developers!

Have you had a chance to try and duplicate my issue?

Thanks!

dougburks commented 11 years ago

pi->packet_bytes is using pi->ip4->ip_len:

src/prads.c:    pi->packet_bytes = (pi->ip4->ip_len - (IP_HL(pi->ip4) * 4));

Elsewhere in the code, pi->ip4->ip_len is referenced using its ntohs value. For example:

src/sig_tcp.c:            opt_ptr = (uint8_t *) pi->ip4 + ntohs(pi->ip4->ip_len); // fixed from htons
src/sig_tcp.c:            e.size = (ftype == TF_ACK) ? 0 : ntohs(pi->ip4->ip_len);

Should prads.c be using ntohs?

dougburks commented 11 years ago

I added ntohs to prads.c and the numbers seem to make more sense.

Default version:

1354539036000000001|2012-12-03 12:50:36|2012-12-03 12:50:37|1|6|2886759668|37361|3651154719|80|6|108680|5|52893|27|27

Patched with ntohs:

1354538979000000001|2012-12-03 12:49:39|2012-12-03 12:49:39|0|6|2886759668|37360|3651154719|80|6|305|5|363|27|27

Thoughts?

comotion commented 11 years ago

This along with #31 and #33 should maybe close this issue, but needs testing. I've run it through my usual suspects but I'd feel better with moar testing .@dougburks can you test @comotion/master branch, ie pull request #33 and see if things are honky dory?

dougburks commented 11 years ago

Was hoping to test this, but can't until Issue #34 is fixed.

dougburks commented 11 years ago

curl testmyids.com
uid=0(root) gid=0(root) groups=0(root)

Old prads (with ntohs patch): src pkts: 6 src bytes: 309 dst pkts: 4 dst bytes: 347

prads 0.3.2-rc3 src pkts: 6 src bytes: 309 dst pkts: 4 dst bytes: 347

(no difference in this case between 0.3.2-rc3 and previous version with ntohs patch)

wireshark: src pkts: 6 src bytes: 537 dst pkts: 4 dst bytes: 495

The prads numbers are much closer to the Wireshark numbers since adding ntohs, but they still don't seem 100% correct. I believe Wireshark is reporting all bytes including IP and TCP header, but PRADS doesn't include IP header? That would be 20 bytes each for 10 packets or 200 bytes total difference. So if Wireshark is showing 1032 bytes total, I would expect PRADS to show 1032-200=832 bytes. But PRADS is reporting 656 bytes total.

What am I missing?

Thanks!

gamelinux commented 11 years ago

Does any of the packets have tcp options etc? Are all the IP headers 20 bytes? When making cxtracker and prads, the aim was to replace sancp when it comes to how it outputs session (in regards to be used in sguil), which would be better to compare prads to, than wireshark.

dougburks commented 11 years ago

All IP headers in this stream are 20 bytes.

The first two packets have TCP options: Packet 1: 24 bytes of TCP options Packet 2: 8 bytes of TCP options

The remaining eight packets have no TCP options so they are standard 20-byte TCP headers.

Alec Waters and myself have tried many methods but have been unable to understand the numbers. Can you definitively tell us what the prads bytecount is supposed to be measuring? Is it supposed to include IP headers? Is it supposed to include TCP headers?

Thanks!

gamelinux commented 11 years ago

First off, are there any retransmissions also?

The main line for what it counts (for IPv4):

pi->packet_bytes = (ntohs(pi->ip4->ip_len) - (IP_HL(pi->ip4) * 4));

and then:

cxt->x_total_bytes += pi->packet_bytes;

Translated, that is: For each packet in a session, the byte counter is incremented (respectively to if it is the source or destination sending the packet) with the "ip_len - IP_HL", meaning the total length of the IP packet, minus the IP Header length.

So that will include for example the TCP header with its TCP options, and the TCP payload.

In cxtracker, It used to be the length of the payload of UDP or TCP. But when we moved to Indexing when doing full pcap, we use the length of the entire packet as received from libpcap (pcap_pkthdr->len) so it will match directly to where in the raw pcap file you will find the start of a session etc.

Hope this clarifies the issue.

If this is not how you would like it to be, please tell us what standard you would like it to obey. I would guess TCP/UDP payload byte count would be the most accurate, but as prads dont take in account TCP fragmentation and stream-reassembly, this will never be correct :)

Hacky Xmas btw!

comotion commented 11 years ago

I guess we could do raw bytes if this is "more correct"

dougburks commented 11 years ago

I'd be OK with the current standard if I understood exactly what it was and how to reconcile the numerical differences between prads and other tools like Wireshark and Argus. If I can't reconcile the differences, then I'd be concerned that there might be a bug lurking somewhere. I sent a pcap of "curl testmyids.com" (the example used in this thread) to Edward to see if he could help me understand how to reconcile.

When you say "raw bytes", you mean include all of the IP header, TCP/UDP header, and payload, right? I think that might be more user-friendly and intuitive for most analysts, especially when they start comparing to other tools like Wireshark and Argus.

So would it just be changing this:

pi->packet_bytes = (ntohs(pi->ip4->ip_len) - (IP_HL(pi->ip4) * 4));

to this?

pi->packet_bytes = (ntohs(pi->ip4->ip_len);

Thanks!

taosecurity commented 11 years ago

I love seeing PRADS in Sguil and Security Onion, but I share Doug's concerns about inconsistencies compared with tools like Argus and Wireshark. I would really like to see PRADS adopt the same byte counting options that a tool like Argus provides. When there is yet a third version, that doesn't seem very easy to understand, it makes life difficult for analysts. Thank you.

comotion commented 11 years ago

@taosecurity and @dougburks please test it out and let us know

dougburks commented 11 years ago

With the latest patch, were we expecting the numbers to match Argus and Wireshark? I just tested with "pi->packet_bytes = ntohs(pi->ip4->ip_len);" and the numbers are closer, but still don't match. Using the same testmyids.com pcap that I used previously in this issue (and sent to Edward via email), I get the following results: tcpreplay reports it's sending 1032 bytes argus reports 1032 bytes wireshark reports 1032 bytes prads reports 856 bytes

So now it looks like the difference is the ethernet frame header. Would it make sense to include the ethernet frame header in prads so that it matches Argus/Wireshark/tcpreplay?

Thanks!

dougburks commented 11 years ago

As another data point, Bro and NetworkMiner are reporting 856 bytes just like prads. So I'm OK with keeping "pi->packet_bytes = ntohs(pi->ip4->ip_len);" if everybody else is. Thanks!

comotion commented 11 years ago

That's the thing... I've gone with your initial suggestion because our original behavior is arbitrary and so is the byte counting behavior of other tools. Including the ethernet header doesn't make much sense to me: if you are sniffing two sides of a conversation between two different lan segments you will have different byte counts due to encapsulation, vlan differences etc.

however, I don't have particularily strong feelings about this and am happy to put it in line with whatever seems reasonable. We'll keep it like this - bro/networkminer-style and perhaps introduce a new option if enough people want the other kind of counting.

gamelinux / prads

Inaccurate bytecounts #30