cisagov / cyhy-system

Cyber Hygiene system and overall documentation/issue tracking
Creative Commons Zero v1.0 Universal

Store IP geolocation data in a dedicated database collection #123

Open mcdonnnj opened 4 months ago

💡 Summary

I believe it would be beneficial to store IP geolocation information in its own collection in the database.

Motivation and context

We currently rely on a MaxMind GeoIP2 database being available anywhere we use cisagov/cyhy-core to provide IP geolocation data as needed. This data is read when adding hosts (with cyhy-ip or cyhy-tool) or when manually updating host geolocation data (with cyhy-geoip). This causes two problems. The data is inconsistent, because it depends on how old the GeoIP2 database is for whoever runs the command. The data is also stale, because the geolocation recorded when an IP was added is not updated unless an update is specifically requested.

If we store IP geolocation data in its own collection, we can pull from that collection whenever the data is needed. Since the collection is centralized, we can update it on a regular cadence in line with the source updates (MaxMind updates the databases every Tuesday and Friday).

Implementation notes

A new collection schema will need to be added to cisagov/cyhy-core. A script or Lambda will need to be created to populate this collection. We will need to use the CSV version of the database, which groups addresses under CIDR network blocks. I believe we will need to use the maxmind/geoip2-csv-converter tool to convert the CSV into a format that can be queried.
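
To make the "usable format for querying" idea concrete, here is a minimal sketch of one possible approach: store each CIDR block as an integer start/end range so that a lookup becomes a simple range comparison. It uses only Python's standard `ipaddress` module and does not call geoip2-csv-converter. The field names (`network`, `start`, `end`, `location`), the helper names, and the sample locations are all hypothetical and not part of any existing cyhy-core schema. The sample networks are reserved documentation ranges, and the sketch only considers IPv4.

```python
import ipaddress


def block_to_document(cidr, location):
    """Turn one CIDR block into a range document (hypothetical schema)."""
    network = ipaddress.ip_network(cidr)
    return {
        "network": str(network),
        # Integer bounds allow a plain numeric range comparison on lookup.
        "start": int(network.network_address),
        "end": int(network.broadcast_address),
        "location": location,
    }


def find_block(documents, ip):
    """Return the first document whose range contains ip, or None."""
    value = int(ipaddress.ip_address(ip))
    for doc in documents:
        if doc["start"] <= value <= doc["end"]:
            return doc
    return None


# Placeholder data: documentation-only networks with made-up locations.
documents = [
    block_to_document("192.0.2.0/24", {"country": "XX"}),
    block_to_document("198.51.100.0/24", {"country": "YY"}),
]

match = find_block(documents, "192.0.2.55")
print(match["network"] if match else "no match")
```

In a real collection the linear scan would presumably be replaced by an indexed range query on the start and end fields, and the schema would need a decision on how to represent IPv6 blocks, since their integer bounds do not fit in 64 bits.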

Acceptance criteria

How do we know when this work is done?