fox-it / dissect.target

The Dissect module tying all other Dissect modules together. It provides a programming API and command line tools which allow easy access to various data sources inside disk images or file collections (a.k.a. targets).
GNU Affero General Public License v3.0

Inconsistent field types #723

Open JSCU-CNI opened 3 months ago

JSCU-CNI commented 3 months ago

We are working on sanitizing and unifying our Elasticsearch mappings for dissect flow records before open sourcing them. In doing so we found the following record descriptors in dissect.target which have conflicting type definitions.

Currently we force these conflicting type mappings to Elasticsearch wildcard fields to ensure the fields are still indexed (we do not use dynamic mode). However, this prevents analysts from using type-specific queries (e.g. x < y on fields which are expected to be an integer or float).
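
The wildcard fallback described above could look roughly like the sketch below. It is a minimal illustration, not the actual mapping code: the `TYPE_MAP` table and the `build_properties` helper are hypothetical, the flow.record-to-Elasticsearch type pairs are assumptions, and the `ts` field in the sample input is made up.

```python
from collections import defaultdict

# Hypothetical translation table from flow.record field types to
# Elasticsearch field types; the pairs are assumptions for illustration.
TYPE_MAP = {
    "string": "keyword",
    "varint": "long",
    "uint32": "long",
    "datetime": "date",
    "net.ipaddress": "ip",
}


def build_properties(descriptors):
    """Build an Elasticsearch `properties` dict from record descriptors.

    `descriptors` maps a descriptor name to a list of (type, name) tuples.
    A field whose translated types disagree falls back to `wildcard`.
    """
    seen = defaultdict(set)
    for fields in descriptors.values():
        for field_type, field_name in fields:
            # Types missing from TYPE_MAP also end up as wildcard.
            seen[field_name].add(TYPE_MAP.get(field_type, "wildcard"))

    properties = {}
    for field_name, es_types in sorted(seen.items()):
        if len(es_types) == 1:
            properties[field_name] = {"type": next(iter(es_types))}
        else:
            # Conflicting definitions: still indexed, but not range-queryable.
            properties[field_name] = {"type": "wildcard"}
    return properties


descriptors = {
    "JournalRecord": [("varint", "priority"), ("datetime", "ts")],
    "TaskRecord": [("string", "priority")],
}
properties = build_properties(descriptors)
```

Here `priority` ends up as a wildcard field because `varint` and `string` translate to different Elasticsearch types, while `varint` and `uint32` would not conflict under this table since both translate to `long`.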

We propose to map record types with the same name and conflicting types to a sane common field type where possible. Would such a PR be welcome, or do you propose a different solution?

The conflicting fields we found are summarized below:

1. different mappings for 'priority':
   dissect/target/plugins/os/unix/linux/debian/dpkg.py  DpkgPackageStatusRecord         ("string", "priority")
   dissect/target/plugins/os/unix/log/journal.py        JournalRecord               ("varint", "priority")
   dissect/target/plugins/os/windows/activitiescache.py ActivitiesCacheRecord           ("uint32", "priority")
   dissect/target/plugins/os/windows/tasks.py       TaskRecord              ("string", "priority")

1. different mappings for 'id':
   dissect/target/plugins/apps/browser/browser.py       GENERIC_DOWNLOAD_RECORD_FIELDS      ("varint", "id")
   dissect/target/plugins/apps/browser/browser.py       GENERIC_EXTENSION_RECORD_FIELDS     ("string", "id")
   dissect/target/plugins/apps/browser/browser.py       GENERIC_HISTORY_RECORD_FIELDS       ("string", "id")
   dissect/target/plugins/apps/browser/browser.py       GENERIC_PASSWORD_RECORD_FIELDS      ("varint", "id")
   dissect/target/plugins/os/windows/activitiescache.py ActivitiesCacheRecord           ("bytes", "id")
   dissect/target/plugins/os/windows/locale.py      WindowsKeyboardRecord           ("string", "id")
   dissect/target/plugins/os/windows/notifications.py   AppDBTileRecord             ("varint", "id")
   dissect/target/plugins/os/windows/notifications.py   AppDBToastRecord            ("varint", "id")
   dissect/target/plugins/os/windows/notifications.py   WpnDatabaseNotificationRecord       ("varint", "id")
   dissect/target/plugins/os/windows/notifications.py   WpnDatabaseNotificationHandlerRecord    ("varint", "id")

1. 'value' is string everywhere, except:
   dissect/target/plugins/os/windows/regf/regf.py       RegistryValueRecord         ("dynamic", "value")
   dissect/target/plugins/os/windows/regf/trusteddocs.py    TrustedDocumentsRecord          ("bytes", "value")

1. 'filesize' is filesize everywhere, except:
   dissect/target/plugins/os/windows/log/amcache.py COMMON_ELEMENTS             ("string", "filesize")

1. 'local_ip' is net.ipaddress everywhere, except:
   dissect/target/plugins/os/unix/linux/sockets.py      NetSocketRecord             ("string", "local_ip")

1. 'remote_ip' is net.ipaddress everywhere, except:
   dissect/target/plugins/os/unix/linux/sockets.py      NetSocketRecord             ("string", "remote_ip")

1. 'pid' is integer (varint/uint32) everywhere, except:
   dissect/target/plugins/os/windows/regf/usb.py        UsbRegistryRecord           ("string", "pid")

1. different mappings for 'scan_id':
   dissect/target/plugins/apps/av/symantec.py       SEPLogRecord                ("varint", "scan_id")
   dissect/target/plugins/os/windows/defender.py        DefenderQuarantineRecord        ("bytes", "scan_id")
   dissect/target/plugins/os/windows/defender.py        DefenderFileQuarantineRecord        ("bytes", "scan_id")

1. different mappings for 'quarantine_id':
   dissect/target/plugins/apps/av/symantec.py       SEPLogRecord                ("varint", "quarantine_id")
   dissect/target/plugins/os/windows/defender.py        DefenderQuarantineRecord        ("bytes", "quarantine_id")
   dissect/target/plugins/os/windows/defender.py        DefenderFileQuarantineRecord        ("bytes", "quarantine_id")

1. 'start_time' is datetime everywhere, except:
   dissect/target/plugins/os/windows/sru.py     VfuRecord               ("varint", "start_time")

1. 'end_time' is datetime everywhere, except:
   dissect/target/plugins/os/windows/sru.py     VfuRecord               ("varint", "end_time")
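
A summary like the one above can be produced by grouping field definitions by name and reporting every name declared with more than one type. The sketch below assumes the descriptors have already been collected into a plain dict of (type, name) tuples; `find_conflicts` is a hypothetical helper, and how the descriptors are gathered from the plugins is not shown.

```python
from collections import defaultdict


def find_conflicts(descriptors):
    """Return {field_name: {field_type: [descriptor names]}} for every
    field name that is declared with more than one type."""
    by_name = defaultdict(lambda: defaultdict(list))
    for descriptor_name, fields in descriptors.items():
        for field_type, field_name in fields:
            by_name[field_name][field_type].append(descriptor_name)
    return {
        name: {t: sorted(names) for t, names in types.items()}
        for name, types in by_name.items()
        if len(types) > 1
    }


# Field definitions copied from the list above (other fields omitted).
descriptors = {
    "SEPLogRecord": [("varint", "scan_id"), ("varint", "quarantine_id")],
    "DefenderQuarantineRecord": [("bytes", "scan_id"), ("bytes", "quarantine_id")],
    "VfuRecord": [("varint", "start_time"), ("varint", "end_time")],
}

conflicts = find_conflicts(descriptors)
for name, types in sorted(conflicts.items()):
    print(f"different mappings for '{name}':")
    for field_type, names in sorted(types.items()):
        print(f'   ("{field_type}", "{name}")  {", ".join(names)}')
```

With only these three descriptors as input, `start_time` and `end_time` are not reported, because no other descriptor in the sample declares them with a different type.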

cecinestpasunepipe commented 3 months ago

Thank you for bringing this matter to our attention. You are right, these field types are inconsistent. We will review this issue with our team and provide you with an update following our discussions.

JSCU-CNI commented 3 months ago

Thanks. On a side note, what is up with @DissectBot editing comments lately? :sweat_smile:

Miauwkeru commented 3 months ago

> We propose to map record types with the same name and conflicting types to a sane common field type where possible. Would such a PR be welcome, or do you propose a different solution?

Fields with the same name don't necessarily have the same type, as a field name in one plugin can have a completely different meaning from the same name in another plugin. A single type per field name therefore cannot be enforced across the whole of Dissect. If a field's type incorrectly represents its data, it should be changed.

Feel free to fix any actual inconsistencies, but beware that types should not be forced onto field names. We realize this may leave some fields to be post-processed on your end in Elasticsearch.
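
One possible shape for such post-processing, purely as an illustration: rename a conflicting field to include its type before indexing, so each variant gets its own Elasticsearch mapping. The helper `qualify_conflicting_fields`, the `<name>_<type>` naming scheme, and the sample `message` and `task_name` fields are all hypothetical and not part of dissect.target.

```python
def qualify_conflicting_fields(record, field_types, conflicting):
    """Return a copy of `record` (a dict) in which every field listed in
    `conflicting` is renamed to `<name>_<type>`.

    `field_types` maps each field name of this record to its declared type.
    """
    out = {}
    for name, value in record.items():
        if name in conflicting:
            # Dots in type names (e.g. net.ipaddress) would create
            # nested objects in Elasticsearch, so replace them.
            suffix = field_types[name].replace(".", "_")
            out[f"{name}_{suffix}"] = value
        else:
            out[name] = value
    return out


conflicting = {"priority", "id"}

journal = qualify_conflicting_fields(
    {"priority": 3, "message": "started"},
    {"priority": "varint", "message": "string"},
    conflicting,
)
task = qualify_conflicting_fields(
    {"priority": "7", "task_name": "backup"},
    {"priority": "string", "task_name": "string"},
    conflicting,
)
```

This keeps the numeric `priority` of one record type range-queryable without forcing the string `priority` of another record type into the same mapping.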