bitkeks / python-netflow-v9-softflowd

PyPI "netflow" package. NetFlow v9 parser, collector and analyzer implemented in Python 3. Developed and tested with softflowd
https://bitkeks.eu/blog/2016/08/collecting-netflow-v9-on-openwrt.html
MIT License

FIRST_SWITCHED and LAST_SWITCHED keys are missing in parsed packet #26

Open aumisb opened 4 years ago

aumisb commented 4 years ago

I have softflowd (softflowd-1.0.0) running on my pfSense box with "Flow Tracking Level" set to Full and "Netflow Version" set to 9. When I capture packets with nfcapd and inspect them with nfdump, I see the expected results. An example flow record is shown below.

Flow Record: 
  Flags        =              0x06 FLOW, Unsampled
  label        =            <none>
  export sysid =                 1
  size         =                80
  first        =        1587416220 [2020-04-20 16:57:00]
  last         =        1587416220 [2020-04-20 16:57:00]
  msec_first   =               557
  msec_last    =               711
  src addr     =     HIDDEN_WAN_IP
  dst addr     =           1.1.1.1
  src port     =             12118
  dst port     =               853
  fwd status   =                 0
  tcp flags    =              0x1b ...AP.SF
  proto        =                 6 TCP
  (src)tos     =                 0
  (in)packets  =                12
  (in)bytes    =              1044
  input        =                 1
  output       =                 1
  ip router    =       192.168.1.1
  engine type  =                 0
  engine ID    =                 0
  received at  =     1587416521844 [2020-04-20 17:02:01.844]

However, when running the collector and analyzer with the same softflowd settings, I am getting an error:

$ python3 -m netflow.analyzer -f 1587416506.gz
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/siam/Projects/python-netflow/venv/lib/python3.8/site-packages/netflow/analyzer.py", line 261, in <module>
    for flow in sorted(flows, key=lambda x: x["FIRST_SWITCHED"]):
  File "/home/siam/Projects/python-netflow/venv/lib/python3.8/site-packages/netflow/analyzer.py", line 261, in <lambda>
    for flow in sorted(flows, key=lambda x: x["FIRST_SWITCHED"]):
KeyError: 'FIRST_SWITCHED'

Inspecting an element of the flows list in analyzer.py shows that the collected flows are missing both keys (see below). The value stored under UNKNOWN_FIELD_TYPE may belong to either FIRST_SWITCHED or LAST_SWITCHED.

{'INPUT_SNMP': 1, 'IN_BYTES': 1480, 'IN_PKTS': 9, 'IPV4_DST_ADDR': '199.197.246.60', 'IPV4_SRC_ADDR': 'WAN_IP', 'IP_PROTOCOL_VERSION': 4, 'L4_DST_PORT': 443, 'L4_SRC_PORT': 28453, 'NF_F_FLOW_CREATE_TIME_MSEC': 1587416629854, 'OUTPUT_SNMP': 1, 'PROTOCOL': 6, 'SRC_TOS': 0, 'TCP_FLAGS': 26, 'UNKNOWN_FIELD_TYPE': 1587416630141}

Since nfcapd is capturing the FIRST_SWITCHED and LAST_SWITCHED fields and this library isn't, could there be an issue with parsing somewhere? I have not debugged with a raw hex dump, but can if you want me to.
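For reference, the KeyError in the traceback could be sidestepped with a more tolerant sort key. This is only a sketch, not the analyzer's actual code: `flow_start` and its candidate list are hypothetical names, and the sample flows are made up. Note that FIRST_SWITCHED is relative to the exporter's uptime, while NF_F_FLOW_CREATE_TIME_MSEC in the record above appears to be a Unix timestamp in milliseconds, so the two should not be mixed within one capture.

```python
def flow_start(flow, candidates=("FIRST_SWITCHED", "NF_F_FLOW_CREATE_TIME_MSEC")):
    """Return the first available start-time value of a flow dict, or 0.

    Hypothetical helper: the candidate keys are taken from the records
    shown in this issue, not from the library's API.
    """
    for key in candidates:
        if key in flow:
            return flow[key]
    return 0  # flows without any known start key sort first


# Made-up flows illustrating the three cases.
flows = [
    {"IN_PKTS": 9, "NF_F_FLOW_CREATE_TIME_MSEC": 1587416629854},
    {"IN_PKTS": 2},
    {"IN_PKTS": 5, "NF_F_FLOW_CREATE_TIME_MSEC": 1587416620000},
]

# Sorting no longer raises KeyError when FIRST_SWITCHED is absent.
ordered = sorted(flows, key=flow_start)
```

This only hides the symptom; the missing field names still need to be resolved in the parser.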

bitkeks commented 4 years ago

Hello @aumisb, thanks for the bug report! This looks similar to #17, where FIRST_SWITCHED and LAST_SWITCHED also caused errors. I guess we'll have to remove the fields.

> I have not debugged with a raw hex dump, but can if you want me to.

This would greatly help the debugging! It should be fairly simple to check which field types (integers) carry these values but are not resolved to *_SWITCHED. If you could find out what the keys for first and last are in your example above, that would really help!
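One way to find those integers from a raw dump is to read the template flowset directly. The sketch below follows the NetFlow v9 layout from RFC 3954 (20-byte header, template flowsets with ID 0, then type/length pairs per field). `template_field_types` is a hypothetical helper, not part of this library, and the sample packet is hand-built with illustrative field type numbers rather than taken from the capture above.

```python
import struct


def template_field_types(packet: bytes):
    """Return {template_id: [(field_type, field_length), ...]} for one
    NetFlow v9 export packet. Data and options flowsets are skipped."""
    (version,) = struct.unpack_from("!H", packet, 0)
    if version != 9:
        raise ValueError(f"not a NetFlow v9 packet (version {version})")

    templates = {}
    offset = 20  # the v9 packet header is 20 bytes long
    while offset + 4 <= len(packet):
        flowset_id, length = struct.unpack_from("!HH", packet, offset)
        if length < 4:
            break  # malformed flowset, stop instead of looping forever
        if flowset_id == 0:  # template flowset
            pos, end = offset + 4, min(offset + length, len(packet))
            while pos + 4 <= end:
                template_id, field_count = struct.unpack_from("!HH", packet, pos)
                pos += 4
                # Template IDs start at 256; anything lower is padding.
                if template_id < 256 or pos + 4 * field_count > end:
                    break
                fields = []
                for _ in range(field_count):
                    fields.append(struct.unpack_from("!HH", packet, pos))
                    pos += 4
                templates[template_id] = fields
        offset += length
    return templates


# Hand-built sample: header plus one template (ID 1024) with three fields.
header = struct.pack("!HHIIII", 9, 1, 0, 0, 0, 0)
body = struct.pack("!HH", 1024, 3) + struct.pack("!HHHHHH", 1, 4, 152, 8, 153, 8)
sample = header + struct.pack("!HH", 0, 4 + len(body)) + body

for template_id, fields in template_field_types(sample).items():
    print(template_id, fields)
```

Any field type in the output that the parser has no name for is a candidate for the value currently ending up under UNKNOWN_FIELD_TYPE.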

As a side note, I just noticed that the way UNKNOWN_FIELD_TYPE is used is wrong. As soon as more than one field type in a record is not recognized, the shared fallback key UNKNOWN_FIELD_TYPE is overwritten and the previous value is dropped. This should be fixed.
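A possible fix, sketched under assumptions: include the numeric field type in the fallback key so that each unknown field keeps its own entry. `FIELD_NAMES`, `field_key` and `build_record` are hypothetical names for illustration and do not mirror the library's internals; the four listed type numbers follow RFC 3954, and 152 and 153 stand in for arbitrary unrecognized types.

```python
# Hypothetical excerpt of a field-type lookup table (numbers per RFC 3954).
FIELD_NAMES = {
    1: "IN_BYTES",
    2: "IN_PKTS",
    21: "LAST_SWITCHED",
    22: "FIRST_SWITCHED",
}


def field_key(field_type: int) -> str:
    """Resolve a field type to its name, or to a unique fallback key."""
    return FIELD_NAMES.get(field_type, f"UNKNOWN_FIELD_TYPE_{field_type}")


def build_record(pairs):
    """Build a flow dict from (field_type, value) pairs."""
    record = {}
    for field_type, value in pairs:
        record[field_key(field_type)] = value
    return record


# Two unrecognized field types now end up under two distinct keys.
record = build_record([(1, 1480), (2, 9), (152, 1587416629854), (153, 1587416630141)])
print(record)
```

A side benefit is that the key itself then tells you which field type is missing from the lookup table.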