brexhq / substation

Substation is a toolkit for routing, normalizing, and enriching security event and audit logs.
https://substation.readme.io
MIT License
322 stars, 16 forks

feat(transform): Add FormatFromZip Transform #221

Closed: jshlbrd closed 2 months ago

jshlbrd commented 2 months ago

Description

Motivation and Context

This adds basic support for unarchiving Zip files (mentioned in #219). Most data processing systems don't work on archive files, so this doesn't add a complementary FormatToZip transform (that would require much more design work).
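For readers unfamiliar with what unarchiving involves, here is a minimal sketch using only Go's standard `archive/zip` package. It is not the code from this PR: `unzipAll` and `zipFiles` are hypothetical helpers invented for illustration, and the sketch assumes the whole archive fits in memory.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// unzipAll reads every regular file from an in-memory Zip archive and returns
// the contents keyed by the name stored in the archive.
// Hypothetical helper for illustration; not Substation's API.
func unzipAll(archive []byte) (map[string][]byte, error) {
	r, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, fmt.Errorf("open zip: %w", err)
	}

	files := make(map[string][]byte)
	for _, f := range r.File {
		if f.FileInfo().IsDir() {
			continue // directory entries carry no data
		}

		rc, err := f.Open()
		if err != nil {
			return nil, fmt.Errorf("open %s: %w", f.Name, err)
		}
		b, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, fmt.Errorf("read %s: %w", f.Name, err)
		}
		files[f.Name] = b
	}

	return files, nil
}

// zipFiles builds a small archive in memory so the sketch can be exercised.
// Also hypothetical; the PR deliberately does not add a FormatToZip transform.
func zipFiles(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for name, body := range files {
		fw, err := w.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := fw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```

Each extracted file would then be handed to downstream transforms as its own message. Nested archives are not unpacked here, which matches the concern about recursive unarchiving raised later in this description.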

The more important addition in this PR is support for non-text files -- in pre-v1.0 this behavior was configurable using an environment variable, but now it's dynamic based on media (file) type. This could go in two directions in the future:

I'm inclined to keep the existing text support as-is (with decompression) and lean into adding more transforms -- the use cases for reading binary files are limited (most users are working with text files) and recursively unarchiving / decompressing files may become a challenge over time.
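To illustrate the "dynamic based on media (file) type" behavior described above, here is a rough sketch of content sniffing in Go. It uses the standard library's `http.DetectContentType` as a stand-in. Whether Substation detects media types this way is an assumption, and `handling` with its return strings is hypothetical.

```go
package main

import (
	"net/http"
	"strings"
)

// handling describes how a hypothetical reader might treat a payload based on
// its sniffed media type. The categories are illustrative only.
func handling(data []byte) string {
	// DetectContentType inspects at most the first 512 bytes and always
	// returns a valid MIME type, falling back to application/octet-stream.
	mt := http.DetectContentType(data)

	switch {
	case strings.HasPrefix(mt, "text/"):
		return "text: split into lines"
	case mt == "application/zip":
		return "zip: pass whole to an unarchive transform"
	default:
		return "binary: pass whole"
	}
}
```

With this approach, a newline-delimited log file is handled as text, while a payload that starts with the Zip signature (`PK\x03\x04`) is passed along intact, so a transform such as FormatFromZip can unpack it.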

How Has This Been Tested?

Types of changes

Checklist: