PlanktoScope / forklift

Composable, reprovisionable, decentralized management of apps & configs on Raspberry Pis and other embedded Linux systems
Apache License 2.0

Interrupting a caching operation with a partial failure leaves the cache in an invalid state #267

Open ethanjli opened 3 weeks ago

ethanjli commented 3 weeks ago

Currently, if we interrupt a forklift plt cache-repo operation while a required repo is in the middle of being cached, the operation partially fails and the repo cache is left in a state that causes errors whenever the partially downloaded repo (or the cached repo mirror) is used afterwards. We might also have a similar problem with partial failures of file downloads.

If we encounter an error while trying to use the cached repo mirror (i.e. the bare git repo) and we have internet access, we should delete and re-download it. If we determine that a cached repo or file download is corrupted (e.g. because it fails the checksum checks suggested by #243), then we should re-clone it from our cached repo mirror.
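A rough sketch of the recovery flow described above, written in Python only for illustration. Every name in it (ensure_cached, recover_repo, and the validator/fetcher callbacks) is hypothetical and not part of forklift. It assumes the caller supplies a validity check (for example, the checksum checks from #243) and a fetch function that raises an error when there is no internet access.

```python
import shutil
from pathlib import Path
from typing import Callable

# Hypothetical callback types: a validity check and a (re-)fetch operation.
Validator = Callable[[Path], bool]
Fetcher = Callable[[Path], None]


def ensure_cached(path: Path, is_valid: Validator, fetch: Fetcher) -> bool:
    """Make sure `path` holds a valid cache entry; return True if it was re-fetched."""
    if path.exists():
        if is_valid(path):
            return False  # the cache entry is usable as-is
        # The entry is partial or corrupted: discard it before re-fetching.
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    path.parent.mkdir(parents=True, exist_ok=True)
    fetch(path)  # expected to raise if the source is unreachable
    if not is_valid(path):
        raise RuntimeError(f"cache entry {path} is still invalid after re-fetching")
    return True


def recover_repo(
    mirror: Path,
    checkout: Path,
    mirror_ok: Validator,
    download_mirror: Fetcher,
    checkout_ok: Validator,
    clone_from_mirror: Callable[[Path, Path], None],
) -> None:
    """Repair the cached mirror first, then the repo cloned from it."""
    # Step 1: a broken mirror is deleted and downloaded again.
    ensure_cached(mirror, mirror_ok, download_mirror)
    # Step 2: a broken cached repo is deleted and re-cloned from the local mirror.
    ensure_cached(checkout, checkout_ok, lambda dest: clone_from_mirror(mirror, dest))
```

Repairing the mirror before the cached repo means that the re-clone in the second step never reads from a mirror that is itself in a partially downloaded state.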

At a minimum, this recovery work should be done whenever we run the forklift plt cache-repo command (and the respective recovery work should be done for the other plt cache-* subcommands). It might also make sense to do it whenever we run forklift plt stage/apply. Then forklift plt switch/upgrade should inherit that recovery work when they cache the pallet's requirements.