ThinkParQ / beegfs

Public repository for the BeeGFS Parallel File System
https://www.beegfs.io

Progressive File Layout #16

Open obilaniu opened 1 month ago

obilaniu commented 1 month ago

A few weeks back at the Stammtisch, the idea of a Progressive File Layout (PFL) implementation (à la Lustre, but less complicated) was raised. The user who raised the topic hasn't filed a feature request here yet, so I am taking the initiative.

  1. Without PFL, a default striping must be chosen when the filesystem is configured, and any choice is a compromise.
    • If no striping is done (stripe=1), then a single large file downloaded onto the filesystem will unbalance the storage targets, and all accesses to it will be directed to one storage target. On the other hand, small files perform better.
    • If striping is done (stripe>1), then a stripe count and size might be found that ease the burden of any one large file on the filesystem, but this may penalize the small files on the same filesystem, because more targets must be contacted to piece each one together.
      • For example, the Canadian national clusters managed by the Digital Research Alliance of Canada configure their Lustre with a default PFL of 1x (no) striping for the range [0, 128MiB) and 2x1MiB striping for the range [128MiB, end), or something similar.
  2. Certain files have internal structure (such as a read-mostly header, followed by parallel-access data areas) that could benefit from different striping schemes.
  3. Currently, it is not possible to migrate a file in place from one striping scheme to another. A "deep" copy is required, which temporarily takes double the space. That space may not be available, which forces a more convoluted migration process.

What I proposed at the Stammtisch is a simplified variant of PFL with 2 (+1) zones. A user would be able to define two zones, each with an independent stripe count/size.
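To make the two-zone idea concrete, here is a minimal Python sketch of how a byte offset could be resolved under such a layout. It reuses the numbers from the Lustre example above (an unstriped zone up to 128MiB, then 2x1MiB striping). The `Zone` class, the `locate` function and the round-robin placement are hypothetical illustrations, not BeeGFS or Lustre APIs, and the returned stripe index is only a position within the zone's target list, not a real target ID.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MiB = 1024 * 1024


@dataclass(frozen=True)
class Zone:
    """One zone of a hypothetical two-zone layout (names are illustrative)."""
    start: int            # first byte offset covered (inclusive)
    end: Optional[int]    # first byte offset NOT covered; None means end of file
    stripe_count: int     # number of targets the zone is striped across
    chunk_size: int       # bytes written to one target before moving to the next


def locate(zones: List[Zone], offset: int) -> Tuple[int, int, int]:
    """Return (zone_index, stripe_index, offset_in_chunk) for a byte offset.

    Assumes simple round-robin placement of chunks inside each zone.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for i, zone in enumerate(zones):
        if offset >= zone.start and (zone.end is None or offset < zone.end):
            rel = offset - zone.start
            chunk_no = rel // zone.chunk_size
            return i, chunk_no % zone.stripe_count, rel % zone.chunk_size
    raise ValueError("offset is not covered by any zone")


# Zone 0: [0, 128MiB) unstriped. With stripe_count=1 the chunk size does not
# affect placement, so 1MiB is an arbitrary choice here.
# Zone 1: [128MiB, end) striped over 2 targets in 1MiB chunks.
layout = [
    Zone(start=0, end=128 * MiB, stripe_count=1, chunk_size=1 * MiB),
    Zone(start=128 * MiB, end=None, stripe_count=2, chunk_size=1 * MiB),
]
```

With this layout, every offset below 128MiB resolves to the single target of zone 0, so small files never touch a second target, while anything past 128MiB alternates between the two targets of zone 1.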

The additional (+1) zone would be a filesystem-internal zone, not visible to the user, whose purpose would be to guarantee that an in-place, server-side migration between arbitrary two-zone striping schemes can always be performed safely. That would be achieved by gradually rewriting the file from one scheme to the other, chunk by chunk, never fully duplicating the file, and atomically updating the file's "true" PFL with every chunk until it matches the target PFL.
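As a rough illustration of the chunk-by-chunk idea, the toy simulation below restripes an in-memory "file" from one stripe count to another. It is only a sketch under strong assumptions: targets are plain Python dicts, both schemes share one chunk size, each scheme is a single zone, and the "true" layout is reduced to a single `migrated` counter. It does not model how the internal (+1) zone would actually be used, and all function names are hypothetical.

```python
def make_file(data, chunk_size, stripe_count):
    """Split data into chunks and place them round-robin on stripe_count targets."""
    targets = [dict() for _ in range(stripe_count)]
    n_chunks = 0
    for off in range(0, len(data), chunk_size):
        targets[n_chunks % stripe_count][n_chunks] = data[off:off + chunk_size]
        n_chunks += 1
    return targets, n_chunks


def read_file(old, new, migrated, n_chunks):
    """Reassemble the file from the current "true" layout: chunks below
    `migrated` live in the new scheme, the rest are still in the old one."""
    parts = []
    for c in range(n_chunks):
        targets = new if c < migrated else old
        parts.append(targets[c % len(targets)][c])
    return b"".join(parts)


def migrate(data, chunk_size, old_count, new_count):
    """Restripe chunk by chunk; return the new targets and the peak number
    of chunks that were stored twice at any moment."""
    old, n_chunks = make_file(data, chunk_size, old_count)
    new = [dict() for _ in range(new_count)]
    migrated = 0
    peak_extra = 0
    for c in range(n_chunks):
        # 1. Copy one chunk to its placement in the new scheme.
        new[c % new_count][c] = old[c % old_count][c]
        stored = sum(len(t) for t in old) + sum(len(t) for t in new)
        peak_extra = max(peak_extra, stored - n_chunks)
        # 2. Commit: advance the "true" layout (stands in for an atomic
        #    metadata update on a real filesystem).
        migrated = c + 1
        # 3. Reclaim the old copy of the chunk.
        del old[c % old_count][c]
        # The file must be fully readable after every step.
        assert read_file(old, new, migrated, n_chunks) == data
    return new, peak_extra


data = bytes(range(256)) * 10          # 2560 bytes of sample content
new_targets, peak = migrate(data, chunk_size=100, old_count=1, new_count=2)
```

In this model the extra space needed never exceeds one chunk, and an interruption between any two steps leaves at most one orphaned chunk copy while the file stays readable through the committed layout.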

This would address the target-imbalance and performance issues and enable restriping without deep-copying, and two zones ought to cover most use cases.

iamjoemccormick commented 1 month ago

Hi @obilaniu,

Thank you for the detailed proposal and write-up of what we discussed at Stammtisch. I agree this would be a valuable addition to BeeGFS. While we don't have immediate plans to begin work on this, we'll use this issue to continue collecting feedback and ideas on how this could eventually be done.