Charcoal-SE / metasmoke

Web dashboard for SmokeDetector.
https://metasmoke.erwaysoftware.com
Creative Commons Zero v1.0 Universal

Public DB Dumps: Re-tool the dump system to index files on-server directly and serve them, reenable automatic indexer to find and display valid links. #773

Closed: teward closed this issue 4 years ago

teward commented 4 years ago

Using ancillary systems in the infrastructure, we now have a platform to make daily DB dumps and updates in a way that is secure, actually sanitizes the data, and transfers it back to Metasmoke for use.

Currently, the full DB backup infrastructure is configured to use @Undo1's S3 storage. With the migration to my server farm, we can now serve the dump content directly from the Metasmoke server, referencing the file on disk.

To that end, the public DB dumps system needs to be altered to pick up the public DB dump file (a .sql.gz, i.e. a gzipped SQL file) and serve it via MS directly, bypassing the need for S3 to handle the dumps.
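As a rough sketch of what "serve the file via MS directly" could look like, the snippet below picks the newest sanitized dump on disk so a Rails controller could hand it to `send_file`. The helper name and directory layout are assumptions for illustration, not metasmoke's actual code:

```ruby
# Glob pattern matching the naming convention described later in this
# issue (dump_metasmoke_clean-$TIMESTAMP.sql.gz).
DUMP_GLOB = 'dump_metasmoke_clean-*.sql.gz'.freeze

# Returns the path of the most recent dump in dir, or nil if none exist.
# The filename embeds a Unix epoch, so a plain lexical max is unsafe
# across digit lengths ("999" sorts after "1000"); compare the parsed
# timestamps instead.
def latest_dump(dir)
  Dir.glob(File.join(dir, DUMP_GLOB)).max_by do |path|
    path[/clean-(\d+)\.sql\.gz\z/, 1].to_i
  end
end
```

In a Rails action, the returned path would then be passed to `send_file latest, type: 'application/gzip', disposition: 'attachment'`, which streams the file without any S3 involvement.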

Initial assignees for this are @thesecretmaster, @ArtOfCode-, and @Undo1, as they are most familiar with the Ruby codebase and can alter the backup handlers where needed to make this work.

Cron job timing

The ancillary system's data backup and dump runs currently take between 8 and 11 minutes, including transfer of the file back to the Metasmoke host. That cron job currently runs at 00:00 UTC but can be rescheduled if needed. Therefore, by 00:15 - 00:20 UTC, Metasmoke will have a file it can serve and index.
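For concreteness, the schedule above corresponds to a crontab entry like the following on the ancillary host (the script path is a placeholder, not the real one):

```
# m h dom mon dow  command
  0 0  *   *   *   /path/to/dump-and-transfer-script   # 00:00 UTC; finishes in 8-11 min
```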

On Metasmoke Host: Docroot and Naming Convention

The current on-system docroot where the dumps are sent is: /var/railsapps/metasmoke/shared/dumps

The current naming convention for the dump files is: dump_metasmoke_clean-$TIMESTAMP.sql.gz (where $TIMESTAMP is the Unix epoch time of the DB dump run)
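The convention round-trips cleanly between filename and time, which is handy for both the indexer and any display code. A minimal sketch (helper names are mine; only the naming convention itself comes from this issue):

```ruby
# Build a dump filename from a time, per the convention above.
def dump_filename(time)
  "dump_metasmoke_clean-#{time.to_i}.sql.gz"
end

# Recover the dump's UTC time from a filename, or nil if the name
# doesn't match the convention.
def dump_time(filename)
  epoch = filename[/\Adump_metasmoke_clean-(\d+)\.sql\.gz\z/, 1]
  epoch && Time.at(epoch.to_i).utc
end
```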

Backup Persistence

All the dump files are also backed up on the ancillary system; however, that system is not directly reachable from Metasmoke (for security reasons related to the unsanitized data).

For data storage/size purposes, we should probably continue to keep ONLY one public MS dump available on MS per day.
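That one-dump retention policy could be enforced with a small pruning pass after each transfer, for example along these lines (a sketch under the naming convention above; the function name and the idea of pruning on the MS side are my assumptions):

```ruby
# Delete all but the newest dump in dir and return the kept path
# (nil if the directory holds no dumps). Sorting on the parsed epoch
# avoids lexical-ordering bugs across digit lengths.
def prune_dumps(dir)
  dumps = Dir.glob(File.join(dir, 'dump_metasmoke_clean-*.sql.gz'))
             .sort_by { |p| p[/clean-(\d+)\.sql\.gz\z/, 1].to_i }
  dumps[0...-1].each { |old| File.delete(old) }
  dumps.last
end
```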

teward commented 4 years ago

Also, for the record, there is a sanitized SQL dump up there now. The script on the ancillary box cleans up daily, so around 00:10 UTC there will be a new SQL dump there.

teward commented 4 years ago

This is mostly done via the ancillary system: it populates the link data that Metasmoke accesses, and provides the links for Metasmoke to parse via a YAML file.
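The issue doesn't show the YAML schema, so the reader below assumes a simple `dumps:` list of `url:` entries purely for illustration; the real file the ancillary box writes may be shaped differently:

```ruby
require 'yaml'

# Parse a link-index YAML document and return the dump URLs it lists.
# safe_load (rather than load) avoids instantiating arbitrary Ruby
# objects from a file written by another host.
def dump_links(yaml_text)
  data = YAML.safe_load(yaml_text) || {}
  Array(data['dumps']).map { |d| d['url'] }.compact
end
```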

Known minor issues/caveats:

None of those issues are blocking, however, and I can just work with them.