If the machine is preemptible, it may be preempted (or go down for some other reason) at any time. If that happens at the exact moment a checkpoint is being saved, the original data can be corrupted, because the existing file is only partially overwritten. It is therefore reasonable to keep multiple backups locally. Since multiple backups take up disk space, it would be better to also support cloud storage, for example AWS S3.
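
To make the request concrete, here is a minimal sketch of how local saving could be made robust against preemption. It is not the project's current implementation. It is one possible approach, under two assumptions: a checkpoint can be serialized to bytes, and the checkpoint directory is on a single filesystem. The function name `save_checkpoint_atomically`, the `keep_last` parameter, and the `checkpoint-<step>.bin` naming scheme are all hypothetical. The new checkpoint is written to a temporary file and then renamed into place, so an interruption during the write leaves the earlier checkpoints untouched.

```python
import os
import tempfile
from pathlib import Path


def save_checkpoint_atomically(
    data: bytes, ckpt_dir: str, step: int, keep_last: int = 3
) -> Path:
    """Write a checkpoint without ever leaving a half-written final file.

    Hypothetical helper: the name, arguments, and file naming scheme are
    illustrative assumptions, not an existing API.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")

    directory = Path(ckpt_dir)
    directory.mkdir(parents=True, exist_ok=True)
    final_path = directory / f"checkpoint-{step:08d}.bin"

    # Create the temp file in the target directory so that the rename below
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the bytes to disk
        # os.replace swaps the file in as a single rename, so a reader sees
        # either no file at final_path or the complete new one.
        os.replace(tmp_name, final_path)
    except BaseException:
        # Clean up the partial temp file if the write or rename failed.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise

    # Keep only the newest `keep_last` checkpoints to bound disk usage.
    # Zero-padded step numbers make lexicographic order match step order.
    checkpoints = sorted(directory.glob("checkpoint-*.bin"))
    for old in checkpoints[:-keep_last]:
        old.unlink()

    return final_path
```

If the process is killed during the write, only the `.tmp-*.part` file is affected, and the previous `checkpoint-*.bin` files remain readable. Uploading the finished file to cloud storage such as AWS S3 could happen as a separate step after the rename; that part is deliberately left out of this sketch.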