File/Lookup processor - Githubissues

shellcromancer commented 1 year ago

Is your feature request related to a problem? Please describe. Today Substation can do enrichment with external data from HTTP responses, managed services like DynamoDB, or specific enrichment files like IP Databases, but not with generic files such as JSON, CSV, YAML, or other document types. Adding support for these could be useful for querying attributes with potentially less overhead than calling external services.

Describe the solution you'd like Add one or more new processors that support lookups against the standard document types mentioned above. A processor would look up keys in the document using input keys from the data being processed and write the result to a specified output key.
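To make the request concrete, here is a minimal sketch of that behavior in Go. It is not Substation's API: the `enrich` helper, the key names (`src_ip`, `src_ip_label`), and the sample lookup document are all hypothetical and only illustrate the "input key in, output key out" idea against a JSON document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// enrich looks up the value of record[inputKey] in table and, on a hit,
// stores the result under record[outputKey]. It reports whether a match
// was found. Hypothetical helper for illustration only.
func enrich(record, table map[string]any, inputKey, outputKey string) bool {
	k, ok := record[inputKey].(string)
	if !ok {
		return false // input key is missing or not a string
	}
	v, found := table[k]
	if !found {
		return false
	}
	record[outputKey] = v
	return true
}

func main() {
	// Stand-in for the contents of a generic JSON lookup file.
	doc := []byte(`{"10.0.0.1": "internal", "203.0.113.7": "known-scanner"}`)

	var table map[string]any
	if err := json.Unmarshal(doc, &table); err != nil {
		panic(err)
	}

	// Stand-in for one piece of data being processed.
	record := map[string]any{"src_ip": "203.0.113.7"}
	enrich(record, table, "src_ip", "src_ip_label")
	fmt.Println(record["src_ip_label"])
}
```

A miss leaves the record untouched, which seems like the least surprising default; a real processor might instead want a configurable fallback value.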

Describe alternatives you've considered We could continue to add narrow processors for specific documents and their use cases as needed, like IP Database for accessing DBs, but a lot of this effort could become redundant.

Additional context Similar features in other data transformation tools:

jshlbrd commented 1 year ago

This is a great idea for an enhancement -- combined with our ability to dynamically retrieve files, this basically turns the Internet into a source of threat intelligence.

Here are some notes for whoever decides to work on this:

- We should carefully consider the name of this processor. Format-based names like csv and yaml are reserved for conversion processors (like base64, gzip, etc.), and generic names like file may have other uses that could be more fitting for the name (e.g., a file processor could remotely retrieve a file and load the contents into a capsule). We also don't want to overload a single processor with too many options, which adds complexity.
- Lazy loading the file is necessary for performance (similar to the ipdb processor).
- Allowing the file to be kept in memory is necessary for performance (similar to the ipdb processor).
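For the last two notes, one possible shape in Go is a store that defers reading the file until the first lookup and then serves every later lookup from memory. This is only a sketch under assumptions: `lazyStore`, `newLazyStore`, and `jsonFileLoader` are hypothetical names, the file is assumed to be a flat JSON object of strings, and nothing here is taken from the ipdb processor's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sync"
)

// lazyStore defers loading until the first Get, then answers lookups
// from memory. Hypothetical type, not part of Substation.
type lazyStore struct {
	load  func() (map[string]string, error)
	once  sync.Once
	items map[string]string
	err   error
}

func newLazyStore(load func() (map[string]string, error)) *lazyStore {
	return &lazyStore{load: load}
}

// Get runs the loader on the first call only; later calls reuse the
// cached map (or the cached load error).
func (s *lazyStore) Get(key string) (string, bool, error) {
	s.once.Do(func() { s.items, s.err = s.load() })
	if s.err != nil {
		return "", false, s.err
	}
	v, ok := s.items[key]
	return v, ok, nil
}

// jsonFileLoader returns a loader that reads a flat JSON object from path.
func jsonFileLoader(path string) func() (map[string]string, error) {
	return func() (map[string]string, error) {
		b, err := os.ReadFile(path)
		if err != nil {
			return nil, err
		}
		items := map[string]string{}
		if err := json.Unmarshal(b, &items); err != nil {
			return nil, err
		}
		return items, nil
	}
}

func main() {
	// Write a small temporary lookup file for the demo.
	f, err := os.CreateTemp("", "lookup-*.json")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString(`{"203.0.113.7": "known-scanner"}`); err != nil {
		panic(err)
	}
	f.Close()

	store := newLazyStore(jsonFileLoader(f.Name())) // nothing is read yet
	for i := 0; i < 3; i++ {
		// The file is read during the first iteration only.
		v, ok, err := store.Get("203.0.113.7")
		fmt.Println(v, ok, err)
	}
}
```

`sync.Once` keeps the load safe when several goroutines hit the store at the same time. One trade-off of this sketch is that a failed load is cached as well, so a real implementation may want a retry or refresh path.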

jshlbrd commented 1 year ago

This is closed by #66 -- we've taken a different approach than other systems have and abstracted lookup activity into key-value stores (lookups are simply read-only key-value stores). The PR introduces CSV, JSON, and text KV stores, but future ones (like YAML) can be added as needed.
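To illustrate the idea of treating lookups as read-only key-value stores, here is a rough Go sketch. It is not the code from the PR: the `Getter` interface, the `fromJSON`, `fromCSV`, and `fromText` constructors, and the choice of what each store returns (a JSON value, a CSV row keyed by one column, or `true` for a line that is present in a text file) are all assumptions made for this example.

```go
package main

import (
	"bufio"
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strings"
)

// Getter is a read-only key-value store: it exposes lookups, no writes.
// Hypothetical interface for illustration only.
type Getter interface {
	Get(key string) (any, bool)
}

// mapStore backs every store in this sketch with an in-memory map.
type mapStore map[string]any

func (m mapStore) Get(key string) (any, bool) {
	v, ok := m[key]
	return v, ok
}

// fromJSON treats the top-level keys of a JSON object as store keys.
func fromJSON(data string) (Getter, error) {
	m := mapStore{}
	if err := json.Unmarshal([]byte(data), &m); err != nil {
		return nil, err
	}
	return m, nil
}

// fromCSV indexes rows by the value in keyColumn; each stored value is
// the whole row as a column-name-to-field map. A header row is required.
func fromCSV(data, keyColumn string) (Getter, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) == 0 {
		return nil, fmt.Errorf("csv: missing header row")
	}
	header := rows[0]
	keyIdx := -1
	for i, name := range header {
		if name == keyColumn {
			keyIdx = i
		}
	}
	if keyIdx < 0 {
		return nil, fmt.Errorf("csv: no column %q", keyColumn)
	}
	m := mapStore{}
	for _, row := range rows[1:] {
		rec := map[string]string{}
		for i, field := range row {
			rec[header[i]] = field
		}
		m[row[keyIdx]] = rec
	}
	return m, nil
}

// fromText treats each non-empty line as a key whose value is true.
func fromText(data string) Getter {
	m := mapStore{}
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		if line := strings.TrimSpace(sc.Text()); line != "" {
			m[line] = true
		}
	}
	return m
}

func main() {
	c, err := fromCSV("domain,category\nevil.example,malware\n", "domain")
	if err != nil {
		panic(err)
	}
	txt := fromText("evil.example\nbad.example\n")

	// The caller only sees Getter, whatever the file format was.
	for _, s := range []Getter{c, txt} {
		v, ok := s.Get("evil.example")
		fmt.Println(v, ok)
	}
}
```

Because callers depend only on the interface, adding another format (such as YAML) would mean adding one more constructor rather than a new processor.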

brexhq / substation

File/Lookup processor #65