ktun95 opened 2 months ago
Here’s the updated markdown file with additional explanations of the Cluster Restoration process where it was lacking:
Assumptions
- The `velero` CLI is installed.

Steps to Restore Cluster Resources
1. Connect to the cluster.
This step ensures that you have the necessary access to the Kubernetes cluster where you will be restoring the resources. The command below exports the kubeconfig for the cluster, allowing you to interact with it using `kubectl`.
- Command: `kops export kubeconfig --admin`
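Before moving on, it can help to confirm that the exported kubeconfig actually works. The snippet below is a small sketch of that check; it only probes for `kubectl` and cluster reachability, and degrades gracefully when either is missing:

```shell
# Sanity check after `kops export kubeconfig --admin`:
# verify that kubectl is present and the API server is reachable.
if command -v kubectl >/dev/null 2>&1; then
  # cluster-info exits non-zero when the API server cannot be reached
  kubectl cluster-info >/dev/null 2>&1 && status=ok || status=unreachable
else
  status=kubectl-missing
fi
echo "cluster access: $status"
```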
2. Install Velero server components
Installing Velero creates a `BackupStorageLocation` resource in the cluster, which is essential for managing backups and restores.
- Command: `velero install \
--namespace <NAMESPACE> \
--provider gcp \
--bucket <BUCKET_NAME> \
--secret-file <PATH_TO_SERVICE_ACCOUNT_JSON>`
- Example: `velero install \
--namespace staging \
--provider gcp --bucket staging-velero-k8s-backup \
--secret-file ./credentials/service-account.json`
The expectation is that after installation, Velero will synchronize (https://velero.io/docs/v1.11/how-velero-works/#object-storage-sync) with our object storage and create the necessary resources (Backup objects, BackupController, etc.) in the cluster, which can be used to restore the cluster. See:
"Velero resources are synchronized with the backup files in object storage." (https://velero.io/docs/v1.11/migration-case/)
Note: Per the documentation, the `--plugins` flag is only optional when no backup storage location is being configured at install time:
"An exception to this is that when the user is not configuring a backup storage location or a snapshot storage location at the time of install, this flag is optional." (https://velero.io/docs/v1.11/overview-plugins/)
Since this install does configure a backup storage location (our Google Cloud bucket), that exception does not apply, so the GCP object-store plugin most likely still needs to be supplied via `--plugins`.
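The install command can be wrapped in a small script that fails fast when a required input is missing. This is a hypothetical sketch: `NAMESPACE`, `BUCKET`, and `CREDS` default to the staging example values above, and nothing is executed if the credentials file or the `velero` CLI is absent.

```shell
# Hypothetical wrapper around `velero install`; the defaults below are
# the staging example values and should be overridden per environment.
NAMESPACE="${NAMESPACE:-staging}"
BUCKET="${BUCKET:-staging-velero-k8s-backup}"
CREDS="${CREDS:-./credentials/service-account.json}"

if [ ! -f "$CREDS" ]; then
  echo "missing service-account file: $CREDS" >&2
elif ! command -v velero >/dev/null 2>&1; then
  echo "velero CLI not found; skipping install" >&2
else
  velero install \
    --namespace "$NAMESPACE" \
    --provider gcp \
    --bucket "$BUCKET" \
    --secret-file "$CREDS"
fi
```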
3. Verify that the Velero Backup is available
Before proceeding with the restoration, it’s crucial to ensure that the backup you intend to restore is available and in a good state. This command will provide details about the specified backup, including its status and any associated errors.
- Command: `velero backup describe <BACKUP-NAME>`
- Example: `velero backup describe recurring-20240903120937`
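Beyond eyeballing the `describe` output, the check can be scripted by reading the backup's `status.phase` from `velero backup get -o json` (assuming that output flag is available in your Velero version). The JSON fallback below is a hypothetical stand-in used only when the CLI or cluster is unreachable, so the parsing logic stays visible:

```shell
# Sketch: confirm the backup's phase is "Completed" before restoring.
BACKUP_NAME="recurring-20240903120937"
if command -v velero >/dev/null 2>&1 \
   && backup_json=$(velero backup get "$BACKUP_NAME" -o json 2>/dev/null); then
  :  # got real JSON from the cluster
else
  # hypothetical sample output, for illustration only
  backup_json='{"status":{"phase":"Completed"}}'
fi
phase=$(printf '%s' "$backup_json" |
  python3 -c 'import json,sys; print(json.load(sys.stdin)["status"]["phase"])')
echo "backup phase: $phase"
```

A phase other than `Completed` (e.g. `PartiallyFailed` or `Failed`) is a signal to investigate before restoring.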
4. Update BackupStorageLocation to Read-only
This step is important to prevent any changes to the backup storage location during the restoration process. By setting the `accessMode` to `ReadOnly`, you ensure that no new backup objects can be created, and no existing ones deleted, while the restoration is in progress.
- Command: `kubectl patch backupstoragelocation <STORAGE LOCATION NAME> \
--namespace <NAMESPACE> \
--type merge \
--patch '{"spec":{"accessMode":"ReadOnly"}}'`
- Example: `kubectl patch backupstoragelocation staging-velero-k8s-backup \
--namespace staging \
--type merge \
--patch '{"spec":{"accessMode":"ReadOnly"}}'`
"This prevents backup objects from being created or deleted in the backup storage location during the restore process" (https://velero.io/docs/v1.4/disaster-case/)
5. Restore cluster resources
This command initiates the restoration process from the specified backup. It will restore all the resources that were backed up, including deployments, services, and other Kubernetes objects. Monitor the restoration process to ensure that it completes successfully.
- Command: `velero restore create --from-backup <SCHEDULE NAME>-<TIMESTAMP>`
- Example: `velero restore create --from-backup recurring-20240903120937`
Note: After executing this command, you can check the status of the restoration using:
- Command: `velero restore describe <RESTORE_NAME>`
This will provide information about the restoration progress and any issues that may have occurred.
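Rather than re-running `describe` by hand, the wait can be scripted by polling the restore's `status.phase` (again assuming `velero restore get -o json` is available in your version). `RESTORE_NAME` is a hypothetical auto-generated name; when the CLI or cluster is unreachable, the helper returns a canned `Completed` so the loop in this sketch terminates:

```shell
# Sketch: poll the restore until it leaves the InProgress phase.
RESTORE_NAME="recurring-20240903120937-20240903121500"  # hypothetical

get_phase() {
  if command -v velero >/dev/null 2>&1 \
     && out=$(velero restore get "$RESTORE_NAME" -o json 2>/dev/null); then
    printf '%s' "$out" |
      python3 -c 'import json,sys; print(json.load(sys.stdin)["status"]["phase"])'
  else
    echo "Completed"  # stand-in so this example terminates offline
  fi
}

phase="InProgress"
while [ "$phase" = "InProgress" ]; do
  phase=$(get_phase)
  [ "$phase" = "InProgress" ] && sleep 10
done
echo "final restore phase: $phase"
```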
6. Revert BackupStorageLocation to Read-Write mode
If the restoration was successful, it’s important to re-enable read-write mode for the backup storage location. This allows new backups to be created again, ensuring that your cluster remains protected going forward.
- Command: `kubectl patch backupstoragelocation <STORAGE LOCATION NAME> \
--namespace <NAMESPACE> \
--type merge \
--patch '{"spec":{"accessMode":"ReadWrite"}}'`
- Example: `kubectl patch backupstoragelocation staging-velero-k8s-backup \
--namespace staging \
--type merge \
--patch '{"spec":{"accessMode":"ReadWrite"}}'`
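As a final step, the revert patch can be guarded so it only runs against a reachable cluster; otherwise it just reports what it would apply. A sketch using the staging example names:

```shell
# Sketch: restore ReadWrite mode once the restore has completed.
BSL="staging-velero-k8s-backup"
NAMESPACE="staging"
PATCH='{"spec":{"accessMode":"ReadWrite"}}'
if command -v kubectl >/dev/null 2>&1 \
   && kubectl get backupstoragelocation "$BSL" -n "$NAMESPACE" >/dev/null 2>&1; then
  kubectl patch backupstoragelocation "$BSL" \
    --namespace "$NAMESPACE" --type merge --patch "$PATCH"
else
  echo "cluster not reachable; would apply patch: $PATCH"
fi
```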
This version includes additional explanations for each step, clarifying the purpose and importance of each action in the cluster restoration process.