btrfs / btrfs-todo

An issues-only repo to organize our TODO items

ENOSPC rework #53

Open josefbacik opened 6 months ago


We have a variety of issues here, and there are some things I want to get done in order to make progress. This list isn't exhaustive, but it is a fairly line-by-line account of the things that I know are currently bad and need to be worked on.

Problems

We could drastically increase our multi-threaded performance by addressing these issues, and reduce the corner cases where we can still end up with transaction aborts.

Overall design

I want to do a few things to move us in a better direction and address the above problems.

  1. Use per-CPU structures wherever possible. We hit this machinery pretty hard, so we need to reduce lock contention by relying on lockless strategies wherever we can.
  2. Always, always, always add our reservation to ->bytes_may_use, no matter what path we're in. This addresses the inaccuracy problem. I would change the delalloc metadata reservations to add to ->bytes_may_use immediately, as soon as we know our reservation needs to grow. This makes sure that new writes coming in don't use up space that isn't actually available because of existing delalloc requirements.
  3. Stop using block_rsv's. We currently call use_block_rsv() to make sure we have a reservation before calling into the allocator. I want to decouple our reservation system from our actual usage. With item 2 in place, our reservation will always be kept up to date with the current worst-case usage in the file system, so the entry points to this system (__reserve_bytes) become responsible for balancing our outstanding reservations against our actual usage.
  4. Allow overcommit up to the size of the disk. Currently we curtail the overcommit size to some fraction of the available space, to be pessimistic about the case where we use every reservation we have outstanding. But this doesn't and can't happen in practice: we are not going to allocate BTRFS_MAX_LEVEL * 3 blocks for every single delayed ref we have outstanding. Instead, use our flushing infrastructure to scale with our usage, and simply be more aggressive as we get closer to full. We will use the global_block_rsv as a real, on-disk reservation to make sure we can always get what we need to disk, and then fall back to the slow path that takes locks when we run out of free chunk space.
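
Item 1 relies on per-CPU counters to cut lock contention. The kernel's generic percpu_counter works along these lines; the Python below is only a user-space model of the batching idea, not btrfs code, and the class, its fields and the batch size are hypothetical. Each CPU accumulates updates locally and folds them into the shared total only when its local delta reaches the batch size, so most updates never touch shared state, at the cost of a cheap read that is only approximate.

```python
class PerCpuCounter:
    """Hypothetical user-space model of a batched per-CPU counter."""

    def __init__(self, ncpus: int, batch: int) -> None:
        self.batch = batch
        self.count = 0               # shared total (would need a lock in a real kernel)
        self.local = [0] * ncpus     # per-CPU deltas, updated without any lock
        self.shared_updates = 0      # how many times the shared total was touched

    def add(self, cpu: int, delta: int) -> None:
        v = self.local[cpu] + delta
        if abs(v) >= self.batch:
            # Slow path: fold this CPU's delta into the shared total.
            self.count += v
            self.local[cpu] = 0
            self.shared_updates += 1
        else:
            # Fast path: purely CPU-local update, no contention.
            self.local[cpu] = v

    def read_approx(self) -> int:
        # Cheap read; off by at most ncpus * (batch - 1).
        return self.count

    def sum_exact(self) -> int:
        # Expensive read: walk every CPU's local delta.
        return self.count + sum(self.local)


counter = PerCpuCounter(ncpus=4, batch=32)
for _ in range(1000):
    for cpu in range(4):
        counter.add(cpu, 1)
print(counter.sum_exact(), counter.read_approx(), counter.shared_updates)
# 4000 3968 124
```

Only 124 of the 4000 updates reach the shared total, and the cheap read lags the exact sum by 32. A reservation check built on such a counter has to tolerate that error, which is presumably why a slower, exact path is still needed as free space runs out.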
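
Items 2 and 3 together amount to a simple accounting rule: every reservation is charged to ->bytes_may_use through one entry point, and the allocator just allocates, with the books settled afterwards. The Python sketch below models that rule for a single space_info. Only bytes_may_use and the role of __reserve_bytes come from the issue; the class, the fields as written here, the method names and the ENOSPC exception are hypothetical simplifications, and flushing is left out entirely.

```python
class ENOSPC(Exception):
    """Hypothetical stand-in for a -ENOSPC return (no flushing attempted)."""


class SpaceInfoModel:
    """Hypothetical single-space_info model; only bytes_may_use and the
    role of __reserve_bytes come from the issue."""

    def __init__(self, total_bytes: int) -> None:
        self.total_bytes = total_bytes
        self.bytes_used = 0      # space actually allocated on disk
        self.bytes_may_use = 0   # outstanding worst-case reservations

    def reserve_bytes(self, nbytes: int) -> None:
        # Single entry point (the role the issue gives __reserve_bytes):
        # every reservation, delalloc metadata included, is charged to
        # bytes_may_use immediately, so it always reflects the worst case.
        if self.bytes_used + self.bytes_may_use + nbytes > self.total_bytes:
            raise ENOSPC(f"cannot reserve {nbytes} bytes")
        self.bytes_may_use += nbytes

    def settle(self, reserved: int, actual: int) -> None:
        # No block_rsv is consulted at allocation time: the allocator just
        # allocates, then the worst-case reservation is dropped and the
        # real usage is charged instead.
        self.bytes_may_use -= reserved
        self.bytes_used += actual


space = SpaceInfoModel(total_bytes=1000)
space.reserve_bytes(600)              # delalloc metadata, charged up front
try:
    space.reserve_bytes(500)          # a new write that would eat delalloc's space
except ENOSPC as err:
    print("rejected:", err)           # rejected: cannot reserve 500 bytes
space.settle(reserved=600, actual=150)  # delalloc done, far below its worst case
space.reserve_bytes(500)              # now fits: 150 + 500 <= 1000
```

Because the delalloc reservation is charged up front, the 500-byte request is rejected instead of silently consuming space that delalloc will need; once the delalloc work completes having really used only 150 bytes, the same request succeeds.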
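
Item 4 rests on the gap between the worst case and reality. The sketch below works through the issue's own BTRFS_MAX_LEVEL * 3 figure, then models the proposed overcommit limit and a flushing policy that ramps up as the disk fills. The values of BTRFS_MAX_LEVEL (8) and the 16 KiB nodesize are assumptions stated in the code; the functions, and especially the quadratic flush_fraction policy, are hypothetical illustrations rather than an actual btrfs heuristic.

```python
BTRFS_MAX_LEVEL = 8          # assumption: maximum btrfs tree height
NODESIZE = 16 * 1024         # assumption: default 16 KiB metadata node size


def worst_case_per_ref(nodesize: int = NODESIZE) -> int:
    # The issue's pessimistic figure: BTRFS_MAX_LEVEL * 3 blocks per delayed ref.
    return BTRFS_MAX_LEVEL * 3 * nodesize


def can_overcommit(used: int, may_use: int, nbytes: int, disk_size: int) -> bool:
    # Hypothetical rule: outstanding reservations may grow up to the size of
    # the disk, instead of being capped at a fraction of available space.
    return used + may_use + nbytes <= disk_size


def flush_fraction(used: int, may_use: int, disk_size: int) -> float:
    # Hypothetical policy: the share of outstanding reservations to reclaim
    # grows as the filesystem fills, reaching 1.0 once fully committed.
    fullness = min(1.0, (used + may_use) / disk_size)
    return fullness ** 2


print(worst_case_per_ref())                                # 393216 (384 KiB)
print(flush_fraction(used=0, may_use=25, disk_size=100))   # 0.0625
```

At 384 KiB per delayed ref, 10,000 outstanding refs would already demand about 3.66 GiB of worst-case reservation, which illustrates why reserving the full worst case for every ref is so pessimistic. Under this sketch's policy, a filesystem that is a quarter committed reclaims only about 6% of its outstanding reservations, while a fully committed one flushes everything.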

Steps to address the issues

Each of these checkboxes is a single patch series. Some of them can be done independently, but I've listed them in the order I would do them. The idea is that anybody should be able to pick these up and make progress without needing my level of intimate knowledge of the ENOSPC machinery.

Once those are done (in any order), the following items can be tackled, but they must be done in order.