balena-os / meta-balena

A collection of Yocto layers used to build balenaOS images
https://www.balena.io/os
968 stars 115 forks source link

config.json can be affected by power cuts #1983

Closed acostach closed 2 years ago

acostach commented 4 years ago

config.json is modified by multiple services and a power cut can render this file corrupt, although atomic operations are being used.

List of services and details available in this OS thread.

ethanpailes commented 4 years ago

@acostach As I understand it rename is not atomic on FAT file systems in the face of power failure (I'm using this thread as a reference). If the algorithms that Balena uses assume atomicity of rename, corruption will continue to happen. I think that some sort of user-space journaling would be needed to really prevent this issue. I think something along the lines of:

on boot:
  if exists("config.json.<hash>"):
    if hash("config.json.<hash>") != hash:
      rm("config.json.<hash>") # rollback transaction
    else:
      copy("config.json.<hash>", "config.json.tmp")
      rename("config.json.tmp", "config.json")
      fsync()
      if hash("config.json") == hash:
        rm("config.json.<hash>")

def modify(new_content):
  write(format!("config.json.{}", hash(new_content)), new_content)
  fsync()
  copy("config.json.<hash>", "config.json.tmp")
  rename("config.json.tmp", "config.json")
  fsync()
  if hash("config.json") == hash:
    rm("config.json.<hash>")

might be needed to correctly modify a file on a FAT16 file system.

acostach commented 4 years ago

cc @alexgg @mtoman @markcorbinuk

jellyfish-bot commented 4 years ago

[nazrhom] This issue has attached support thread https://jel.ly.fish/2d88fb06-4650-497d-b99c-300e09267d46

ethanpailes commented 4 years ago

If there is no hard requirement to use FAT16 as the boot partition fs, I think using ext3 or ext4 should make rename actually have the semantics that we want. It seems that uboot does support ext* filesystems, so I don't see any obvious reasons why FAT16 would be required, but I may be missing something.

This blog post has some info about the semantics of different linux operating systems: https://blogs.gnome.org/alexl/2009/03/16/ext4-vs-fsync-my-take/

ethanpailes commented 4 years ago

@alexgg, thanks for starting work on this! I saw that you started the fat-rename branch, and since journalling is always a huge pain I decided to give it a read. Sorry if I jumped the gun a little on this, but I find this stuff fun :). It looks pretty good, but I have a few concerns:

All these notes are in relation to the os-helpers-fs file.

It looks like L196 assumes that there will only ever be one md5sum file in place. We can't make this assumption because there might be multiple crashes before we get a chance to clean up the extra files. We need to loop over all the md5sum files (preferably, but not crucially in timestamp order) to be correct (just deleting them to roll back won't work because if a md5sum file exists, we might have already corrupted the config.json).

L203 looks wrong to me. Renames are not atomic on FAT, so if the power dies in the middle of the mv command we can easily end up in a state where both the target file and source file have only partially been copied over. The solution to this is to use a tmp file between the backup file and the target file like I did in the psudocode that I posted.

There is an extra sync on L239. If the backup file is left around or currupted, the on-boot journaling process (verify_integrity) should just fix things up for us so there is no need to be careful here.

alexgg commented 4 years ago

Hi @ethanpailes, thanks for taking the time to review this. As you can guess the branch is just a proof-of-concept which I worked on to prepare for the internal discussions around this topic. We will develop a centralized journalled write, but it may not be in as done in the branch above.

Anyway, it is good to discuss about the implementation. I was trying to find the right balance between complexity and functionality and I did make some assumptions that I think hold true but will need to be verified.

One assumption is that only the files that are written by the OS need to be protected, and according to our analysis only config.json is currently being modified. Extending the mechanism to cover all possible accesses seems over-engineering as in the long run we plan to move content that needs to be written to out of the boot partition and leave all the contents in the boot partition as read-only.

Another assumption has to do with the multiple crashes theory. The OS will always read and integrity check/recover any file before any process writes to it. This holds true for config.json which was the focus of the branch above, so we will not get multiple crashes before recovering.

I think you are right about the mv in https://github.com/balena-os/meta-balena/blob/66c382468ebbb23c1abd6d3db04eecb1e2340c1a/meta-balena-common/recipes-support/os-helpers/os-helpers/os-helpers-fs#L203, I will need to revisit it.

Glad to keep discussing the details.

ethanpailes commented 4 years ago

@alexgg Oh I see what you mean. So long as the recovery process is run to completion before any of the rest of the application gets to run or try to modify the config.json, we should be safe. :+1: That's a little subtle, so it might be worth adding a note about that invariant to the comment on integrity_check.

Glad to keep discussing the details.

Awesome! I have a bit of experience writing transactional systems, so hopefully I can be a useful resource.

alexgg commented 4 years ago

@majorz This is tracking what we discussed yesterday

jellyfish-bot commented 2 years ago

[alexgg] This issue has attached support thread https://jel.ly.fish/7a4ac553-fe30-4ae5-aba2-40f32cfb7611

jellyfish-bot commented 2 years ago

[acostach] This issue has attached support thread https://jel.ly.fish/094dc5d9-6e39-40fa-8e70-4bbe69f49cab

alexgg commented 2 years ago

From v2.101.0 onwards the OS using the fatrw CLI application to perform safe IO to vfat partitions.