@balintk, related issue for the IULD8 project: https://bluespark.atlassian.net/browse/IULD8-403, where I'm proposing we start by adding a DrupalS3Backup.php Robo command file with two features:
1) Handle the drush-based sql-dump and sync it to S3 with awscli.
2) Handle clean-up of S3 to ensure we retain no more than 15 days of backups.
I've asked @citlacom to mention issue IULD8-403 in the commits to this repository so we can clearly see which project this was done for, similar to what Balint and Jose did for the SO project.
Later on, we can refactor / extend DrupalS3Backup.php to leverage ifsnop/mysqldump-php so we can sanitize the backups in a parallel backup step (we'll need both pristine database dumps and sanitized ones).
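For context on that follow-up, ifsnop/mysqldump-php exposes a per-row transform hook that makes this kind of sanitization straightforward. A minimal sketch, assuming placeholder connection details and Drupal 8's `users_field_data` table (none of this is the actual Spark code):

```php
<?php

require 'vendor/autoload.php';

use Ifsnop\Mysqldump\Mysqldump;

// Placeholder DSN and credentials; Spark would read these from the site config.
$dump = new Mysqldump('mysql:host=localhost;dbname=drupal', 'dbuser', 'dbpass', [
    'compress' => Mysqldump::GZIP,
]);

// Rewrite sensitive columns row by row while the dump is written.
$dump->setTransformTableRowHook(function ($tableName, array $row) {
    if ($tableName === 'users_field_data' && !empty($row['mail'])) {
        $row['mail'] = 'user+' . $row['uid'] . '@example.com';
    }
    return $row;
});

$dump->start('backup-sanitized.sql.gz');
```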
I've implemented the first step here by adding commands to take the sql-dump and to create a tarball of the files folder.
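A minimal sketch of those two steps as Robo tasks (filenames and paths are illustrative, not the exact Spark implementation):

```php
<?php

// Inside a Robo command method.
$stamp = date('Y-m-d--H-i-s');

// 1) Gzipped database dump via drush; --gzip appends .gz to the result file.
$this->taskExec('drush sql-dump')
    ->option('gzip')
    ->option('result-file', "/tmp/myproject-prod-{$stamp}.sql")
    ->run();

// 2) Tarball of the files folder.
$this->taskExec('tar')
    ->args('-czf', "/tmp/myproject-prod-{$stamp}.tgz", 'web/sites/default/files')
    ->run();
```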
I've implemented the second step here by adding the aws-php-sdk dependency and custom functionality to upload the db dump and files tarball to S3.
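For reference, the upload boils down to an S3Client putObject call. The bucket name, key prefix, and filenames below are placeholders, and the Expires value matches the --keep behavior described further down:

```php
<?php

require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Region and profile fall back to the same defaults the command options use.
$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1',
    'profile' => 'default',
]);

// Upload a dump; Expires records when the object should age out.
$s3->putObject([
    'Bucket'     => 'bsp-myproject',
    'Key'        => 'backup/myproject-prod-2018-05-25--12-00-00.sql.gz',
    'SourceFile' => '/tmp/myproject-prod-2018-05-25--12-00-00.sql.gz',
    'Expires'    => strtotime('+15 days'),
]);
```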
I've implemented the third step here to clean up old dumps. I've added a new option called --keep to specify how long backup files are kept around on S3. The default value is '15 days' for GDPR compliance, but it can be overridden in .spark.yml.
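A sketch of how such a cleanup can work with the SDK's paginator. Note this version compares each object's LastModified stamp, whereas the task list at the end of this issue suggests comparing the date embedded in the filename; bucket and prefix are again placeholders:

```php
<?php

require 'vendor/autoload.php';

use Aws\S3\S3Client;

// Anything older than the --keep window ('15 days' by default) is removed.
$keep = '15 days';
$cutoff = strtotime('-' . $keep);

$s3 = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);

$objects = $s3->getPaginator('ListObjectsV2', [
    'Bucket' => 'bsp-myproject',
    'Prefix' => 'backup/',
]);

foreach ($objects->search('Contents[]') as $object) {
    if ($object['LastModified']->getTimestamp() < $cutoff) {
        $s3->deleteObject([
            'Bucket' => 'bsp-myproject',
            'Key'    => $object['Key'],
        ]);
    }
}
```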
The code adds the following command:
composer spark drupal:backup <options>
Options include:
--bucket (Required) The S3 bucket destination for the backup.
E.g. 'bsp-myproject'
--region The AWS region to connect to. If left blank, the
default value is 'us-east-1'. For a list of available
regions, see http://bit.ly/s3-regions.
--profile The AWS profile to use for connection credentials.
Default value is 'default'. The AWS SDK will first
try to load credentials from environment variables
(http://bit.ly/aws-php-creds). If not found, and if
this option is left blank, the SDK then looks for
the default credentials in the `~/.aws/credentials`
file. Finally, if you specify a custom profile
value, the SDK loads credentials from that profile.
See http://bit.ly/aws-creds-file for formatting info.
--keep A string representing a relative amount of time to
keep backups. The string must be parsable by PHP
`strtotime`. The default value is '15 days', which
is the recommendation for GDPR compliance. Files
found in the backup folder on S3 that are older than
this time will be removed. New files uploaded to S3
will have an Expires value set to now plus the
specified time. WARNING: be very careful modifying
the value of this option as it can and will delete
existing backups.
--truncate A comma-separated list of tables whose data should be
truncated in the db dump (structure only). This maps to
the drush sql-dump --structure-tables-list option; see
the sketch after this list.
--skip A comma-separated list of tables to exclude entirely
from the db dump.
--files A string or array of paths/to/files/or/folders to
include in the tarball. Paths should be relative
to the project root directory and not to the webroot.
--exclude A string or array of file or folder names to
exclude from the tarball.
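To make the drush mapping concrete, here is a sketch of how --truncate and --skip could translate into the drush call (option values and the result file are illustrative, not the actual Spark code):

```php
<?php

// Example option values, mirroring the documented defaults.
$opts = [
    'truncate' => 'cache,cache_*,sessions,watchdog',
    'skip' => 'migrate_*',
];

// Inside a Robo command: --structure-tables-list dumps structure only
// for the listed tables; --skip-tables-list omits them entirely.
$this->taskExec('drush sql-dump')
    ->option('gzip')
    ->option('structure-tables-list', $opts['truncate'])
    ->option('skip-tables-list', $opts['skip'])
    ->option('result-file', '/tmp/myproject.sql')
    ->run();
```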
These command options can be specified inside the .spark.yml file, like so:

    command:
      drupal:
        backup:
          options:
            bucket: bsp-myproject
            keep: 15 days
            truncate: cache,cache_*,sessions,watchdog
            skip: migrate_*
            files:
              - web/sites/default/files
              - private
            exclude:
              - css
              - js
              - styles
              - xmlsitemap
              - backup_migrate
              - ctools
              - php
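For a sense of how these settings line up with the command, here is a sketch of a Robo command signature declaring the same options. The class and method names are hypothetical (the real file is DrupalS3Backup.php), and the merge behavior described in the comment mirrors Robo's standard robo.yml config convention:

```php
<?php

use Robo\Tasks;

class RoboFile extends Tasks
{
    /**
     * Back up the Drupal database and files to S3.
     *
     * @command drupal:backup
     */
    public function drupalBackup($opts = [
        'bucket' => null,
        'region' => 'us-east-1',
        'profile' => 'default',
        'keep' => '15 days',
        'truncate' => 'cache,cache_*,sessions,watchdog',
        'skip' => '',
        'files' => [],
        'exclude' => [],
    ]) {
        // Values under command.drupal.backup.options in .spark.yml are
        // merged over these defaults by the config loader.
    }
}
```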
Assigning to @balintk for final review, and moving the remaining tasks (db sanitization) to a follow-up issue so this one can be closed for now. Thanks!
Extracted all the remaining work—see referenced issues. This one is good to close, thanks for the great work here, @jameswilson.
Create a `drupal:backup` task:

- `drush sql-dump`; this will be generalized and refactored later to support more platform types, for now we're hardcoding `drupal8` support.
- Use `[spark:name]-[env:ENVIRONMENT]-YYYY-MM-DD--HH-MM-SS.sql.gz` for database backups.
- Use `[spark:name]-[env:ENVIRONMENT]-YYYY-MM-DD--HH-MM-SS.tgz` for filesystem backups.
- Include the timestamp (`HH-MM-SS`) in the filename to avoid same-day overwrites.
- Include the `ENVIRONMENT` environment variable in the filename to ensure backups for multiple environments do not conflict if sent to the same S3 bucket. The `ENVIRONMENT` variable may be stored in the `.env` file in the project root, which is ignored by git and available to Robo via vlucas/phpdotenv. On Platform.sh, environment variables can be specified using platform-cli.
- Include the `name` variable in the filename to ensure backups for multiple projects do not conflict if sent to the same S3 bucket. The `name` variable should be stored in the application's [.spark.yaml](https://github.com/BluesparkLabs/spark-example/blob/master/.spark.yml) file.
- Keep the `.env` file in the project root, ignored by git, and use vlucas/phpdotenv to load it into Spark.
- Read AWS credentials from `~/.aws/credentials` and config from `~/.aws/config`.
- When cleaning up old backups, compare the `YYYY-MM-DD` part of the filename and ignore the `HH-MM-SS` timestamp.
- Replace the `drush sql-dump` commands with machbarmacher/gdpr-dump and the GDPR Drupal module to perform sanitization on the fly. The `gdpr-replacements` parameter that denotes which db tables and columns to sanitize is to be stored in `.spark.yml`, then converted to the required JSON format in the spark command task so it can be passed to the `gdpr-dump` command (see the sketch below).
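A hedged sketch of that YAML-to-JSON hand-off. Parsing .spark.yml with symfony/yaml and the exact gdpr-dump invocation are assumptions here; the point is only the re-encoding step:

```php
<?php

require 'vendor/autoload.php';

use Symfony\Component\Yaml\Yaml;

// Read the replacements map from .spark.yml (structure assumed to sit
// alongside the other command options shown above).
$config = Yaml::parseFile('.spark.yml');
$replacements = $config['command']['drupal']['backup']['options']['gdpr-replacements'] ?? [];

// Inside a Robo command: pass the map to gdpr-dump as JSON.
$this->taskExec('gdpr-dump')
    ->option('gdpr-replacements', json_encode($replacements))
    ->arg('drupal')
    ->run();
```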