artefactual-labs / auditmatica

Audit Archivematica user activities via Nginx access logs
GNU Affero General Public License v3.0

Add auditmatica code #1

tw4l closed 3 years ago

tw4l commented 3 years ago

Connected to https://github.com/archivematica/Issues/issues/1341

This PR adds a Python auditmatica package that, as its primary function, generates Common Event Format (CEF) event logs from an Archivematica nginx access log. The main functions that do this work are auditmatica.access_log.parse_access_log_line (which parses an nginx access log line into a formatted dictionary), auditmatica.access_log.add_event_info (which compares the parsed line to Archivematica and Storage Service event mappings in auditmatica.access_log.events and annotates the parsed line with event information) and auditmatica.cef.write_cef_event (which writes the CEF event string).

These are in turn used by a CLI, written using Click, that provides a simple user interface: auditmatica write-cef [options] [LOG]. The resulting CEF logs can be written to stdout (default), a file, or syslog.
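As a rough picture of the command layout and the three output destinations, here is a standard-library sketch. It deliberately swaps in `argparse` for Click so that it runs without third-party dependencies, and the `--output` and `--outfile` flags, the `auditmatica-sketch` program name, and the pass-through "conversion" are all hypothetical; the real option names are documented in the README.

```python
import argparse
import sys
from typing import Iterable, List, Optional


def emit(events: Iterable[str], output: str = "stdout",
         path: Optional[str] = None) -> None:
    """Send each event string to stdout, a file, or the local syslog."""
    if output == "stdout":
        for event in events:
            sys.stdout.write(event + "\n")
    elif output == "file":
        if path is None:
            raise ValueError("a file path is required when output is 'file'")
        with open(path, "a", encoding="utf-8") as handle:
            for event in events:
                handle.write(event + "\n")
    elif output == "syslog":
        import syslog  # Unix-only module, imported lazily for portability
        for event in events:
            syslog.syslog(syslog.LOG_INFO, event)
    else:
        raise ValueError(f"unknown output: {output}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="auditmatica-sketch")
    subcommands = parser.add_subparsers(dest="command")
    write = subcommands.add_parser("write-cef", help="write CEF events")
    # LOG is optional; stdin is used when it is omitted.
    write.add_argument("log", nargs="?", type=argparse.FileType("r"),
                       default=sys.stdin)
    write.add_argument("--output", choices=["stdout", "file", "syslog"],
                       default="stdout")  # hypothetical flag
    write.add_argument("--outfile")  # hypothetical flag
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command is None:
        # With no subcommand, show the help menu listing the subcommands.
        parser.print_help()
        return 0
    # Placeholder: a real tool would convert each line to CEF here.
    events = (line.rstrip("\n") for line in args.log)
    emit(events, args.output, args.outfile)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Keeping `emit` separate from argument parsing means the destination logic can be tested without invoking the command line.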

Storage Service log lines are distinguished from those of Archivematica through use of the --ss-base-url option.
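One way such a check could work is sketched below. It is an assumption, not a description of the package: it supposes each parsed line exposes a full URL for the request, and the helper name `is_storage_service` and the example hostnames are hypothetical.

```python
from urllib.parse import urlsplit


def is_storage_service(url: str, ss_base_url: str) -> bool:
    """Return True if url sits on the same host:port and under the base path."""
    request = urlsplit(url)
    base = urlsplit(ss_base_url)
    same_host = request.netloc.lower() == base.netloc.lower()
    base_path = base.path.rstrip("/")
    under_base = (request.path == base_path
                  or request.path.startswith(base_path + "/"))
    return same_host and under_base


if __name__ == "__main__":
    base = "http://am.example.org:8000"  # hypothetical Storage Service URL
    print(is_storage_service("http://am.example.org:8000/api/v2/file/", base))
    print(is_storage_service("http://am.example.org/transfer/", base))
```

Comparing the parsed host and path, rather than doing a plain string prefix match, avoids treating `http://am.example.org:80001/` as a match for a base URL ending in `:8000`.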

I also added a second CLI subcommand, auditmatica overview [options] [LOG], which received far less effort. It is intended as a demonstration of how the package could be used to gain insight into the use of an Archivematica instance.

Edit: Adding that the Archivematica and Storage Service event mappings are incomplete. I tried to focus on the events that seemed the most important from a security perspective (e.g. accessing stored data, adding or editing users, and editing configuration details). This Google doc provides some context on which events are (and are not yet) included: https://docs.google.com/document/d/1ufo9rlH7Gff9hvWcBYrxo0UDZ0wOmS9GlbD6U3Ll1AI/edit?usp=sharing. Right now that's only accessible to Artefactual staff. I'd be interested in your thoughts @ross-spencer on whether a distilled version of that document should live in this repo (maybe as a CSV?) so we can track progress moving forward.

tw4l commented 3 years ago

Hi @ross-spencer - I believe this is now ready for re-review! I have addressed your inline comments, reworked the README, added an IMPLEMENTATION.md (nice suggestion!), and added a Makefile with commands to help publish the package to PyPI. Among the changes, the overview command now outputs JSON, which I'm really enjoying.

A few specific comments I wanted to address:

Is there a preferred default argument that should be used without requiring a flag? I like the idea of that for just getting going - cat | auditmatica > output_happens_here

I see what you mean, but the downside would be that running just auditmatica would no longer display the help menu, with the list of available subcommands. So I'd rather leave it as-is if you're okay with it.

Cool that you've used type hinting in places too. I'd have been really interested to see the outcome if it is used more globally, is there a rationale for not using it more widely?

We talked about this a bit offline, but the inconsistent usage is because Python typing expects you to specify the types of keys and values for dicts, which would require a bit of an overhaul at this stage that I don't think would bring additional value besides making the type annotations happy. So I used type annotations for the simpler types like str and bool but not for dicts.
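To illustrate the trade-off, the sketch below shows what fuller dict annotations might involve. It is not taken from the package: `ParsedLine`, its fields, and both functions are hypothetical, and `TypedDict` (Python 3.8+) is just one possible way to describe a dict with a fixed set of keys.

```python
from typing import Dict, List, Optional, TypedDict


class ParsedLine(TypedDict):
    """Hypothetical shape of a parsed log line; every key needs a declared type."""
    ip: str
    user: str
    status: int
    event: Optional[str]


def is_failed_request(line: ParsedLine) -> bool:
    # Simple return types like bool are cheap to annotate.
    return line["status"] >= 400


def count_by_user(lines: List[ParsedLine]) -> Dict[str, int]:
    # A dict return value needs both its key and value types spelled out.
    counts: Dict[str, int] = {}
    for line in lines:
        counts[line["user"]] = counts.get(line["user"], 0) + 1
    return counts


if __name__ == "__main__":
    sample: List[ParsedLine] = [
        {"ip": "10.0.0.5", "user": "alice", "status": 200, "event": None},
        {"ip": "10.0.0.6", "user": "alice", "status": 403, "event": None},
    ]
    print(count_by_user(sample), is_failed_request(sample[1]))
```

Every dictionary that crosses a function boundary would need a declaration like `ParsedLine`, which is the kind of overhaul described above.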

tw4l commented 3 years ago

One more note - I'd like to publish this to PyPI but am just waiting for code review to be complete before doing so :)