NYU-HSRN-Network-Data-Science-Group / AutoZeekWatch

An online, deployable machine learning network intrusion detection system for Zeek.
MIT License

Error catching and imputing missing data #6

Closed (diego-lopez8 closed this 4 months ago)

diego-lopez8 commented 4 months ago

We want to use the history and duration fields, but some flows do not generate them. For example, ICMP pings (ping google.com) do not generate a history field.

Here is a stack trace:

2024-02-07 23:41:02,435 - root - INFO - Using logdir: /opt/homebrew/var/logs (train.py)
2024-02-07 23:41:02,435 - root - INFO - Checking /opt/homebrew/var/logs/2024-02-07 (train.py)
2024-02-07 23:41:02,435 - root - INFO - Opening file /opt/homebrew/var/logs/2024-02-07/conn.22:00:00-23:00:00.log.gz (train.py)
Traceback (most recent call last):
  File "/Users/diego/Projects/NIDS/NIDS/train.py", line 99, in <module>
    main()
  File "/Users/diego/Projects/NIDS/NIDS/train.py", line 89, in main
    np_arr = preprocess_json(json_data_file)
  File "/Users/diego/Projects/NIDS/NIDS/utils.py", line 45, in preprocess_json
    data_list.append([log_entry[feature] for feature in features])
  File "/Users/diego/Projects/NIDS/NIDS/utils.py", line 45, in <listcomp>
    data_list.append([log_entry[feature] for feature in features])
KeyError: 'duration'

and here are the problematic flows, taken from the line variable just before the failure:

No duration

{'ts': 1707361205.666825, 'uid': 'C8tF8t4HmFbh3WH7Z2', 'id.orig_h': '192.168.0.219', 'id.orig_p': 60268, 'id.resp_h': '10.32.254.10', 'id.resp_p': 3000, 'proto': 'tcp', 'conn_state': 'S0', 'local_orig': True, 'local_resp': True, 'missed_bytes': 0, 'history': 'S', 'orig_pkts': 1, 'orig_ip_bytes': 64, 'resp_pkts': 0, 'resp_ip_bytes': 0}

No history

{'ts': 1707360984.491481, 'uid': 'CJx8Y3FGSd4OUQ4K5', 'id.orig_h': 'fe80::18e0:7402:e452:9837', 'id.orig_p': 135, 'id.resp_h': 'fe80::1880:cf13:2945:39f', 'id.resp_p': 136, 'proto': 'icmp', 'duration': 160.99706101417542, 'orig_bytes': 96, 'resp_bytes': 64, 'conn_state': 'OTH', 'local_orig': False, 'local_resp': False, 'missed_bytes': 0, 'orig_pkts': 4, 'orig_ip_bytes': 288, 'resp_pkts': 4, 'resp_ip_bytes': 256}

Please implement a way to handle this: either catch the KeyError raised when a field is absent from the parsed JSON entry and add the missing field to that entry, or add logic that fills in these fields afterwards.

https://github.com/zoe70416/NIDS/blob/c26a772667d97931d8b6d56acf73745e5db78daa/NIDS/utils.py#L38
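A minimal sketch of the second option, filling in absent fields with defaults instead of indexing the entry directly. The helper name extract_features and the fill values in DEFAULTS are assumptions for illustration, not the project's code; which default is appropriate for duration and history is a modelling decision.

```python
# Hypothetical fill values: 0.0 for a missing duration, "" for a missing history.
DEFAULTS = {"duration": 0.0, "history": ""}


def extract_features(log_entry, features, defaults=DEFAULTS):
    """Return the feature values in order, filling absent keys from defaults."""
    row = []
    for feature in features:
        if feature in log_entry:
            row.append(log_entry[feature])
        elif feature in defaults:
            row.append(defaults[feature])
        else:
            # No default configured: fail loudly rather than guess a value.
            raise KeyError(f"{feature!r} missing and no default configured")
    return row


# Trimmed versions of the two flows quoted above.
no_duration = {"proto": "tcp", "history": "S", "orig_pkts": 1}
no_history = {"proto": "icmp", "duration": 160.99706101417542, "orig_pkts": 4}

features = ["duration", "history", "orig_pkts"]
rows = [extract_features(entry, features) for entry in (no_duration, no_history)]
print(rows)
```

Fields without a configured default still raise KeyError, so an unexpected gap in the logs is not silently turned into a made-up value.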

olive-jy-song commented 4 months ago

(hsrn-nids) jiayuansong@soukagens-Air-8 NIDS % python train.py --log-dir /usr/local/logs
2024-02-09 15:36:19,179 - root - INFO - Using logdir: /usr/local/logs (train.py)
2024-02-09 15:36:19,179 - root - INFO - Checking /usr/local/logs/2024-02-09 (train.py)
2024-02-09 15:36:19,180 - root - INFO - Opening file /usr/local/logs/2024-02-09/conn.15:08:52-15:08:54.log.gz (train.py)
Traceback (most recent call last):
  File "/Users/jiayuansong/Desktop/NYUDS/HSRN/repo/NIDS/NIDS/train.py", line 99, in <module>
    main()
  File "/Users/jiayuansong/Desktop/NYUDS/HSRN/repo/NIDS/NIDS/train.py", line 89, in main
    np_arr = preprocess_json(json_data_file)
  File "/Users/jiayuansong/Desktop/NYUDS/HSRN/repo/NIDS/NIDS/utils.py", line 45, in preprocess_json
    log_entry = json.loads(line.strip())
  File "/opt/anaconda3/envs/hsrn-nids/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/opt/anaconda3/envs/hsrn-nids/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/anaconda3/envs/hsrn-nids/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
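json.loads raises "Expecting value: line 1 column 1 (char 0)" when the string is empty or does not start with a JSON value. The trace does not show the offending line, so the cause here is a guess: a blank line, or a log written in Zeek's default TSV format (whose header lines start with #) rather than JSON, would both produce it. A minimal sketch of a tolerant line reader, where iter_json_lines is a hypothetical helper name and not part of the project:

```python
import json


def iter_json_lines(lines):
    """Yield (line_number, entry) for each line that parses as a JSON object.

    Blank lines, '#'-prefixed header lines, and anything else that is not a
    JSON object are skipped instead of aborting the whole file.
    """
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        try:
            entry = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict):
            yield lineno, entry


sample = [
    "",
    "#fields\tts\tuid",
    '{"ts": 1.0, "proto": "tcp"}',
    "not json",
    '{"ts": 2.0}',
]
parsed = list(iter_json_lines(sample))
print(parsed)
```

Skipping silently hides how many lines were dropped; logging the skipped line numbers would make a wholly non-JSON file easy to spot.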

diego-lopez8 commented 4 months ago

completed by #15