corelight / pycommunityid

A Python implementation of the Community ID flow hashing standard
BSD 3-Clause "New" or "Revised" License

Community ID generated using pycommunityid doesn't match the one generated using Suricata #8

Closed AlyaGomaa closed 1 year ago

AlyaGomaa commented 1 year ago

Issue

I have a pcap. When I run Suricata on it, it produces flows with Community IDs (cids). When I run Zeek on it and generate the cid of each Zeek flow using the pycommunityid library, some flows don't get the same cid that Suricata produced.


Steps to reproduce

Here's the pcap I used: https://github.com/stratosphereips/StratosphereLinuxIPS/blob/develop/dataset/test7-malicious.pcap

I ran Suricata on it using the following command: suricata -r test7-malicious.pcap

I ran Zeek on it using the following command: zeek -C -r test7-malicious.pcap

For each line in the Zeek conn.log output, I ran the following script to get the cid of each flow:

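import communityid

def get_community_id(self, flow):  # hypothetical method name wrapping the original snippet
    # self.community_id is assumed to be a communityid.CommunityID() instance
    proto = flow.proto.lower()
    # Map the Zeek protocol name to the matching FlowTuple constructor
    cases = {
        'tcp': communityid.FlowTuple.make_tcp,
        'udp': communityid.FlowTuple.make_udp,
        'icmp': communityid.FlowTuple.make_icmp,
    }
    try:
        # For ICMP, Zeek's conn.log stores the ICMP type and code in the port fields
        tpl = cases[proto](flow.saddr, flow.daddr, flow.sport, flow.dport)
        return self.community_id.calc(tpl)
    except KeyError:
        # Unsupported protocol: no Community ID
        return ''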

Now, for example, this flow produced by Suricata:

{"timestamp": "2018-03-09T22:49:16.520001+0200", "flow_id": 1898491295854895, "event_type": "flow", "src_ip": "fe80:0000:0000:0000:00d2:4591:568e:c3d1", "src_port": 5353, "dest_ip": "ff02:0000:0000:0000:0000:0000:0000:00fb", "dest_port": 5353, "proto": "UDP", "app_proto": "failed", "flow": {"pkts_toserver": 13, "pkts_toclient": 0, "bytes_toserver": 5188, "bytes_toclient": 0, "start": "2018-03-09T22:49:16.553263+0200", "end": "2018-03-09T22:50:26.234272+0200", "age": 70, "state": "new", "reason": "timeout", "alerted": false}, "community_id": "1:JpepHprmBz0RFdlLGhEMO4jAPvA="}

is the same as this flow produced by zeek:

conn.log:{"ts":1520628556.553263,"uid":"CJwrIjmGopvQP6Gx1","id.orig_h":"fe80::d2:4591:568e:c3d1","id.orig_p":5353,"id.resp_h":"ff02::fb","id.resp_p":5353,"proto":"udp","service":"dns","duration":14.121544122695923,"orig_bytes":1892,"resp_bytes":0,"conn_state":"S0","local_orig":false,"local_resp":false,"missed_bytes":0,"history":"D","orig_pkts":7,"orig_ip_bytes":2228,"resp_pkts":0,"resp_ip_bytes":0,"orig_l2_addr":"68:5b:35:b1:55:93","resp_l2_addr":"33:33:00:00:00:fb"}

However, pycommunityid gives me this cid: 1:Ij3wBn8AhEgwlNMz41h3vXi0yL8=, which doesn't match the one produced by Suricata for the same flow.
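
A side note on matching the two records: Suricata prints IPv6 addresses fully expanded, while Zeek prints them in compressed form. A minimal sketch, using only Python's standard ipaddress module, to confirm that both log lines refer to the same endpoints (the helper name same_endpoint is made up for illustration):

```python
import ipaddress

def same_endpoint(a: str, b: str) -> bool:
    """Compare two IP address strings by value, not by textual form."""
    return ipaddress.ip_address(a) == ipaddress.ip_address(b)

# Addresses copied from the Suricata and Zeek records above
suricata_src = "fe80:0000:0000:0000:00d2:4591:568e:c3d1"
suricata_dst = "ff02:0000:0000:0000:0000:0000:0000:00fb"
zeek_src = "fe80::d2:4591:568e:c3d1"
zeek_dst = "ff02::fb"

print(same_endpoint(suricata_src, zeek_src))
print(same_endpoint(suricata_dst, zeek_dst))
```

So the different address formatting alone shouldn't explain the differing cids, as long as the addresses are parsed into bytes before hashing.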


update

When I tried generating the cid using Zeek's Corelight plugin, Corelight/CommunityID, I got the same cid as with the pycommunityid library:

{"ts":1520628556.553263,"uid":"C0ADPg3q0T5H6xlzdb","id.orig_h":"fe80::d2:4591:568e:c3d1","id.orig_p":5353,"id.resp_h":"ff02::fb","id.resp_p":5353,"proto":"udp","service":"dns","duration":14.121544122695923,"orig_bytes":1892,"resp_bytes":0,"conn_state":"S0","local_orig":false,"local_resp":false,"missed_bytes":0,"history":"D","orig_pkts":7,"orig_ip_bytes":2228,"resp_pkts":0,"resp_ip_bytes":0,"orig_l2_addr":"68:5b:35:b1:55:93","resp_l2_addr":"33:33:00:00:00:fb","community_id":"1:Ij3wBn8AhEgwlNMz41h3vXi0yL8="}

I guess this means that Suricata is the one doing something wrong, and not pycommunityid?

awelzel commented 1 year ago

@AlyaGomaa , very nice catch!

I guess this means that Suricata is the one doing something wrong, and not pycommunityid?

It looks like it, yes. I moved it over to the Suricata project: https://redmine.openinfosecfoundation.org/issues/6276 (issue) and https://github.com/OISF/suricata/pull/9399/files (pull request).

(I don't have permissions to close the issue - maybe you could close it yourself?)

AlyaGomaa commented 1 year ago

Hey thanks for your help! will close.

ckreibich commented 1 year ago

Thanks again for this from me too — very helpful. Your finding made me realize an omission in the test data over in https://github.com/corelight/community-id-spec — the test traces have only individual flows (for example just one for IPv6 traffic), so they don't fully cover the endpoint-flipping logic if you only report per-flow Community IDs. I remember briefly comparing the Zeek and Suricata implementations with those traces and, seeing that they matched, moving on. But that was very incomplete...
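
For readers following along, the endpoint-flipping step mentioned above can be sketched roughly as follows. This is a minimal standard-library illustration of the version 1 hash for port-based protocols (TCP/UDP), based on my reading of the spec; it is not the pycommunityid or Suricata code, it skips the ICMP type/code handling, and the function name community_id_v1 is made up for this example.

```python
import base64
import hashlib
import ipaddress
import struct

UDP = 17  # IANA protocol number for UDP

def community_id_v1(saddr, daddr, sport, dport, proto, seed=0):
    """Sketch of a Community ID v1 for a TCP/UDP flow (assumed layout, see lead-in)."""
    src = ipaddress.ip_address(saddr).packed
    dst = ipaddress.ip_address(daddr).packed
    # Endpoint flipping: put the smaller (address, port) pair first so that
    # both directions of a flow hash to the same value. Addresses are compared
    # as network-order bytes, ports only break ties between equal addresses.
    if (src, sport) > (dst, dport):
        src, dst = dst, src
        sport, dport = dport, sport
    # seed (2 bytes) | src | dst | proto (1 byte) | zero pad (1 byte) | ports (2 bytes each)
    data = struct.pack("!H", seed) + src + dst
    data += struct.pack("!BBHH", proto, 0, sport, dport)
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")

# The IPv6 mDNS flow from this issue, hashed in both directions
fwd = community_id_v1("fe80::d2:4591:568e:c3d1", "ff02::fb", 5353, 5353, UDP)
rev = community_id_v1("ff02::fb", "fe80::d2:4591:568e:c3d1", 5353, 5353, UDP)
print(fwd)
print(fwd == rev)
```

The printed ID can be compared against the values reported earlier in this thread. Hashing the same flow in both directions, as done here, exercises the flipping logic in a way that a single per-flow ID in one direction does not.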

AlyaGomaa commented 1 year ago

hey @ckreibich we've all been there, glad i could help!