dictybase-docker / cluster-management

BSD 2-Clause "Simplified" License

Restoring Cluster #157

Open ktun95 opened 2 months ago

ktun95 commented 2 months ago

Assumptions

  1. We already have scheduled backups created using velero
  2. An empty cluster is running
  3. The velero CLI is installed locally
  4. Using GCP
  5. Using the same version of Kubernetes as previous cluster

Steps to Restore Cluster Resources

1. Connect to the cluster.

    -Command: `kops export kubeconfig --admin`
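Before moving on, it may be worth confirming that the exported kubeconfig really points at the new, empty cluster. The following is a minimal sketch, assuming kubectl is on the PATH; `check_cluster_access` is a hypothetical helper name, not part of kops or kubectl.

```shell
# check_cluster_access: hypothetical helper for illustration only.
# Succeeds only if the API server answers and at least one node is listed.
check_cluster_access() {
    # Fails fast when the kubeconfig is missing or points at an unreachable cluster.
    kubectl cluster-info >/dev/null || return 1
    # grep -q . succeeds only if kubectl printed at least one non-empty line.
    kubectl get nodes --no-headers | grep -q .
}
```

Usage: `check_cluster_access && echo "cluster reachable"`.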

2. Install Velero server components

Installing Velero will create a BackupStorageLocation resource in the cluster.

    -Command: `velero install \
        --namespace <NAMESPACE> \
        --provider gcp \
        --bucket <BUCKET_NAME> \
        --secret-file <PATH_TO_SERVICE_ACCOUNT_JSON>`

    -Example: `velero install \
        --namespace staging \
        --provider gcp --bucket staging-velero-k8s-backup \
        --secret-file ./credentials/service-account.json`

The expectation is that after installation, Velero will synchronize (https://velero.io/docs/v1.11/how-velero-works/#object-storage-sync) with our object storage and create Backup resources in the cluster for the backup files it finds in the bucket. Those Backup resources can then be used to restore the cluster. See:

"Velero resources are synchronized with the backup files in object storage." (https://velero.io/docs/v1.11/migration-case/)

NOTE: The install command above probably also needs a plugin specified with the --plugins flag. According to the velero documentation:

"An exception to this is that when the user is not configuring a backup storage location or a snapshot storage location at the time of install, this flag is optional." (https://velero.io/docs/v1.11/overview-plugins/)

That exception only covers installs that do not configure a storage location. Since we pass --provider and --bucket, we are configuring one (our Google Cloud Bucket), so a provider plugin image should be specified as well, e.g. `--plugins velero/velero-plugin-for-gcp:<VERSION>`.
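The object storage sync runs periodically, so the Backup resources may not appear immediately after installation. As a rough sketch, one could poll until at least one synced backup is visible. It assumes kubectl is on the PATH and that the Velero CRDs use the `backups.velero.io` resource name; `wait_for_backups` and its arguments are hypothetical names chosen for illustration.

```shell
# wait_for_backups: hypothetical helper for illustration only.
# Arguments: Velero namespace, max attempts (default 30), seconds between attempts (default 10).
# Returns 0 as soon as at least one Backup resource exists, 1 if none appear in time.
wait_for_backups() {
    namespace=$1
    attempts=${2:-30}
    interval=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Synced backups show up as Backup resources in the Velero namespace.
        if kubectl get backups.velero.io --namespace "$namespace" --no-headers 2>/dev/null | grep -q .; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}
```

Usage: `wait_for_backups staging || echo "no backups synced yet"`. If nothing ever appears, the bucket name, the credentials file, or a missing provider plugin are the first things to check.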

3. Verify that the Velero Backup is available

    -Command: `velero backup describe <BACKUP-NAME>`
    -Example: `velero backup describe recurring-20240903120937`

4. Update BackupStorageLocation to Read-only

    -Command: `kubectl patch backupstoragelocation <STORAGE LOCATION NAME> \
            --namespace <NAMESPACE> \
            --type merge \
            --patch '{"spec":{"accessMode":"ReadOnly"}}'`

    -Example: `kubectl patch backupstoragelocation staging-velero-k8s-backup \
            --namespace staging \
            --type merge \
            --patch '{"spec":{"accessMode":"ReadOnly"}}'`

"this prevents backup objects from being created or deleted in the backup storage location during the restore process" (https://velero.io/docs/v1.4/disaster-case/)

5. Restore cluster resources

    -Command: `velero restore create --from-backup <SCHEDULE NAME>-<TIMESTAMP>`
    -Example: `velero restore create --from-backup recurring-20240903120937`

6. Revert BackupStorageLocation to Read-Write mode

If the restoration was successful, we should re-enable read-write mode so that new backups can be created.

    -Command: `kubectl patch backupstoragelocation <STORAGE LOCATION NAME> \
            --namespace <NAMESPACE> \
            --type merge \
            --patch '{"spec":{"accessMode":"ReadWrite"}}'`

    -Example: `kubectl patch backupstoragelocation staging-velero-k8s-backup \
            --namespace staging \
            --type merge \
            --patch '{"spec":{"accessMode":"ReadWrite"}}'`

ktun95 commented 2 months ago

Additional Explanation by ChatGPT



Assumptions

  1. We already have scheduled backups created using velero.
  2. An empty cluster is running.
  3. The velero CLI is installed locally.
  4. Using GCP.
  5. Using the same version of Kubernetes as the previous cluster.

Steps to Restore Cluster Resources

1. Connect to the cluster.
This step ensures that you have the necessary access to the Kubernetes cluster where you will be restoring the resources. The command provided will export the kubeconfig for the cluster, allowing you to interact with it using kubectl.

- Command: `kops export kubeconfig --admin`

2. Install Velero server components
Installing Velero will create a BackupStorageLocation resource in the cluster, which is essential for managing backups and restores.

- Command: `velero install \
    --namespace <NAMESPACE> \
    --provider gcp \
    --bucket <BUCKET_NAME> \
    --secret-file <PATH_TO_SERVICE_ACCOUNT_JSON>`

- Example: `velero install \
    --namespace staging \
    --provider gcp --bucket staging-velero-k8s-backup \
    --secret-file ./credentials/service-account.json`

The expectation is that after installation, Velero will synchronize with our object storage and create Backup resources in the cluster for the backup files it finds in the bucket. Those Backup resources can then be used to restore the cluster. See:

"Velero resources are synchronized with the backup files in object storage." (https://velero.io/docs/v1.11/migration-case/)

Note: The --plugins flag is most likely required here, because the install command configures a backup storage location. According to the Velero documentation:

"An exception to this is that when the user is not configuring a backup storage location or a snapshot storage location at the time of install, this flag is optional." (https://velero.io/docs/v1.11/overview-plugins/)

That exception only applies when no storage location is configured at install time. Since --provider and --bucket are passed above, a provider plugin image should be specified as well, e.g. `--plugins velero/velero-plugin-for-gcp:<VERSION>`.

3. Verify that the Velero Backup is available
Before proceeding with the restoration, it’s crucial to ensure that the backup you intend to restore is available and in a good state. This command will provide details about the specified backup, including its status and any associated errors.

- Command: `velero backup describe <BACKUP-NAME>`
- Example: `velero backup describe recurring-20240903120937`
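To turn "in a good state" into something scriptable, the backup's phase can be read directly from the Backup resource. This is only a sketch: it assumes kubectl is on the PATH and that a usable backup reports the phase `Completed` in `status.phase`. `backup_is_completed` is a hypothetical helper name.

```shell
# backup_is_completed: hypothetical helper for illustration only.
# Arguments: Velero namespace, backup name.
# Succeeds only if the Backup resource exists and its phase is "Completed".
backup_is_completed() {
    phase=$(kubectl get backups.velero.io "$2" --namespace "$1" \
        -o jsonpath='{.status.phase}') || return 1
    [ "$phase" = "Completed" ]
}
```

Usage: `backup_is_completed staging recurring-20240903120937 || echo "do not restore from this backup"`. A phase such as `PartiallyFailed` deserves a closer look with `velero backup describe` before deciding whether to restore from it.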

4. Update BackupStorageLocation to Read-only
This step is important to prevent any changes to the backup storage location during the restoration process. By setting the accessMode to ReadOnly, you ensure that no new backup objects can be created or existing ones deleted while the restoration is in progress.
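The patch needs the actual name of the BackupStorageLocation resource, which is not necessarily the bucket name; as far as I know, `velero install` usually names it `default`. A small sketch for looking up the name and the current access mode, assuming kubectl is on the PATH. Both function names are hypothetical.

```shell
# list_bsl_names: hypothetical helper for illustration only.
# Prints the name of every BackupStorageLocation in the given Velero namespace, one per line.
list_bsl_names() {
    kubectl get backupstoragelocations.velero.io --namespace "$1" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# bsl_access_mode: hypothetical helper for illustration only.
# Arguments: Velero namespace, storage location name.
# Prints spec.accessMode; empty output is assumed to mean the field was never set.
bsl_access_mode() {
    kubectl get backupstoragelocations.velero.io "$2" --namespace "$1" \
        -o jsonpath='{.spec.accessMode}'
}
```

Usage: run `list_bsl_names staging` first, then use the printed name as <STORAGE LOCATION NAME> in the patch command. Running `bsl_access_mode staging <STORAGE LOCATION NAME>` after the patch should print `ReadOnly`.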

- Command: `kubectl patch backupstoragelocation <STORAGE LOCATION NAME> \
        --namespace <NAMESPACE> \
        --type merge \
        --patch '{"spec":{"accessMode":"ReadOnly"}}'`

- Example: `kubectl patch backupstoragelocation staging-velero-k8s-backup \
        --namespace staging \
        --type merge \
        --patch '{"spec":{"accessMode":"ReadOnly"}}'`

"This prevents backup objects from being created or deleted in the backup storage location during the restore process" (https://velero.io/docs/v1.4/disaster-case/)

5. Restore cluster resources
This command initiates the restoration process from the specified backup. It will restore all the resources that were backed up, including deployments, services, and other Kubernetes objects. Monitor the restoration process to ensure that it completes successfully.

- Command: `velero restore create --from-backup <SCHEDULE NAME>-<TIMESTAMP>`
- Example: `velero restore create --from-backup recurring-20240903120937`

Note: After executing this command, you can check the status of the restoration using:

- Command: `velero restore describe <RESTORE_NAME>`

This will provide information about the restoration progress and any issues that may have occurred.
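The create and check steps can also be combined. The sketch below names the restore explicitly, blocks until it finishes using the `--wait` flag of `velero restore create`, and then reads the phase from the Restore resource. It assumes Velero runs in a non-default namespace (as in step 2), so `--namespace` is passed explicitly; `restore_and_check` is a hypothetical helper name.

```shell
# restore_and_check: hypothetical helper for illustration only.
# Arguments: Velero namespace, backup name, restore name.
# Succeeds only if the restore finishes with phase "Completed".
restore_and_check() {
    # --wait keeps the command in the foreground until the restore has finished.
    velero restore create "$3" --from-backup "$2" --namespace "$1" --wait || return 1
    # Check the phase explicitly rather than relying on the CLI exit code alone.
    phase=$(kubectl get restores.velero.io "$3" --namespace "$1" \
        -o jsonpath='{.status.phase}') || return 1
    [ "$phase" = "Completed" ]
}
```

Usage: `restore_and_check staging recurring-20240903120937 restore-20240903`. The restore name here is made up. On failure, `velero restore describe` and `velero restore logs` for that restore name show what went wrong.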

6. Revert BackupStorageLocation to Read-Write mode
If the restoration was successful, it’s important to re-enable read-write mode for the backup storage location. This allows new backups to be created again, ensuring that your cluster remains protected going forward.

- Command: `kubectl patch backupstoragelocation <STORAGE LOCATION NAME> \
        --namespace <NAMESPACE> \
        --type merge \
        --patch '{"spec":{"accessMode":"ReadWrite"}}'`

- Example: `kubectl patch backupstoragelocation staging-velero-k8s-backup \
        --namespace staging \
        --type merge \
        --patch '{"spec":{"accessMode":"ReadWrite"}}'`
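Since steps 4 and 6 differ only in the access mode, the two patch commands could be wrapped in one small function, which also guards against typos in the mode. A minimal sketch, assuming kubectl is on the PATH; `set_bsl_access_mode` is a hypothetical helper name.

```shell
# set_bsl_access_mode: hypothetical helper for illustration only.
# Arguments: Velero namespace, storage location name, mode (ReadOnly or ReadWrite).
set_bsl_access_mode() {
    case "$3" in
        ReadOnly|ReadWrite) ;;
        *)
            echo "mode must be ReadOnly or ReadWrite, got: $3" >&2
            return 2
            ;;
    esac
    # Same merge patch as in the manual steps, with the mode filled in.
    kubectl patch backupstoragelocations.velero.io "$2" \
        --namespace "$1" \
        --type merge \
        --patch "{\"spec\":{\"accessMode\":\"$3\"}}"
}
```

Usage: `set_bsl_access_mode staging <STORAGE LOCATION NAME> ReadOnly` before the restore and `set_bsl_access_mode staging <STORAGE LOCATION NAME> ReadWrite` once it has succeeded.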
