User Story
In order to recover from a failed SOLR cloud, data.gov admins want a warm standby SOLR cloud that they can switch to so the site stays up.
Acceptance Criteria
[ ] GIVEN a standby SOLR cloud environment is in place \
WHEN the in-use SOLR cloud is not available \
THEN a data.gov admin can switch to the warm standby (to keep the site working) \
AND a task can be run to seed any data missing from the standby SOLR instance
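One way the swap and the seeding task in the criterion above might look, sketched in Python that only builds cf CLI command strings. Assumptions: the app name `catalog-web` and the standby URL are hypothetical placeholders; the app is assumed to read its SOLR location from the `CKAN_SOLR_URL` environment variable (CKAN can take `solr_url` from that variable, but data.gov may wire SOLR differently, e.g. through a service binding); and the seeding task is assumed to be CKAN's `ckan search-index rebuild --only-missing`, which indexes only datasets not already in the index.

```python
import shlex


def swap_and_seed_commands(app: str, standby_solr_url: str) -> list[str]:
    """Return cf CLI commands for one possible swap-and-seed runbook.

    Hypothetical: app name, standby URL, and the use of CKAN_SOLR_URL
    are placeholders/assumptions, not data.gov's confirmed setup.
    """
    seed = "ckan search-index rebuild --only-missing"
    return [
        # Point the app at the warm standby instead of the failed cluster.
        f"cf set-env {shlex.quote(app)} CKAN_SOLR_URL {shlex.quote(standby_solr_url)}",
        # Restage so the changed environment variable takes effect.
        f"cf restage {shlex.quote(app)}",
        # One-off task that indexes only datasets missing from the standby.
        f"cf run-task {shlex.quote(app)} --command {shlex.quote(seed)} --name solr-seed-missing",
    ]


if __name__ == "__main__":
    for cmd in swap_and_seed_commands(
        "catalog-web", "https://solr-standby.example.internal/solr/ckan"
    ):
        print(cmd)
```

Printing the commands rather than executing them keeps the runbook reviewable before an admin runs it during an outage.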
Background
The SOLR sync process on a full production instance takes 4 days. We're not sure how long a backup takes to restore, but it is probably some fraction of that (even at 10% of the sync time, we're talking about 10 hours to recover). Since an outage that long isn't acceptable, we want a SOLR Cloud instance ready to take over if SOLR crashes. Ideally, we would use SOLR's native backup process on the live cluster and regularly restore those backups into the standby (this uses S3; see documentation here).
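A minimal sketch of that backup/restore cycle using the SolrCloud Collections API `BACKUP` and `RESTORE` actions. It builds the request URLs (and has a small helper to send one); a scheduled job would send the backup request to the live cluster and then the restore request to the standby. Assumptions: hosts, collection name, backup name, and location are placeholders, and `s3` is assumed to be the name of an S3 backup repository declared in each cluster's `solr.xml`. Whether `RESTORE` may target an already existing collection depends on the Solr version, so check the docs for the version in use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def collections_url(base_url, action, collection, backup_name, location,
                    repository="s3", async_id=None):
    """Build a Collections API URL for a BACKUP or RESTORE request."""
    params = {
        "action": action,
        "collection": collection,
        "name": backup_name,
        "location": location,
        "repository": repository,  # assumed S3 repository name from solr.xml
    }
    if async_id:
        # Run asynchronously; poll with action=REQUESTSTATUS&requestid=<async_id>.
        params["async"] = async_id
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def backup_url(live_base, collection, backup_name, location, **kw):
    """BACKUP request aimed at the live cluster."""
    return collections_url(live_base, "BACKUP", collection, backup_name, location, **kw)


def restore_url(standby_base, collection, backup_name, location, **kw):
    """RESTORE request aimed at the standby cluster."""
    return collections_url(standby_base, "RESTORE", collection, backup_name, location, **kw)


def send(url, timeout=60):
    """Issue the request and return Solr's JSON response (network call)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Running both steps with `async` ids lets the job poll for completion instead of holding an HTTP connection open for a long backup or restore.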
We are also considering a different approach, where a separate SOLR cluster is spun up and synced every 1-4 days, ready to take over if necessary.
Security Considerations (required)
None; data does not leave the secure environment.
Sketch
Alternate Sketch
ckan-solr-standby-sync
cloud.gov app, with no running jobs
only-missing SOLR sync, Monday through Thursday
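A sketch of the alternate design: a `ckan-solr-standby-sync` app on cloud.gov that exists only to launch a one-off "only missing" sync task, Monday through Thursday. Assumptions: the task runs `ckan search-index rebuild --only-missing` with the app's SOLR configuration pointing at the standby, the task-name format is hypothetical, and some external scheduler calls this once a day.

```python
import shlex
from datetime import date

SYNC_APP = "ckan-solr-standby-sync"  # app name from the sketch
SYNC_DAYS = {0, 1, 2, 3}  # Monday..Thursday (date.weekday(): Monday == 0)


def should_run_sync(day: date) -> bool:
    """True on the days the sketch schedules a sync (Mon-Thu)."""
    return day.weekday() in SYNC_DAYS


def sync_task_command(day: date, app: str = SYNC_APP) -> str:
    """cf CLI command for that day's only-missing sync task (hypothetical name format)."""
    seed = "ckan search-index rebuild --only-missing"
    return (
        f"cf run-task {shlex.quote(app)} --command {shlex.quote(seed)} "
        f"--name standby-sync-{day.isoformat()}"
    )


if __name__ == "__main__":
    today = date.today()
    if should_run_sync(today):
        print(sync_task_command(today))
    else:
        print("No standby sync scheduled today.")
```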