Generally speaking we should never fail to allocate space from btrfs_reserve_extent() in cow_file_range(); if we do, it means the ENOSPC accounting is broken. However, we could fail to allocate memory elsewhere in this loop, and bugs do happen (that's how I noticed this problem).
The problem shows up with compression. We hand off a range of locked pages to the async compression threads, and if they choose not to compress, we call cow_file_range() with unlock == 0 for the entire range. Assume we are making a 128MiB allocation but the allocator can only satisfy 64MiB in the first pass of the loop: it will create the ordered extent and set up the pages, but not unlock them, for that first 64MiB. Then we adjust start and try to allocate the remaining 64MiB of the range. If this fails, we go to out_unlock and properly clear the remaining range (64MiB-128MiB), but the first range is still locked. We still need to call extent_write_locked_range() for that first chunk, because we successfully allocated that area and have pages waiting to be written out for it.
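To make the shape of the loop concrete, here is a minimal userspace sketch of the scenario above. This is not the real cow_file_range(); reserve_extent(), create_ordered_extent() and clear_range() are made-up stand-ins that only track which sub-range each path touches, with an allocator that hands out at most 64MiB and fails on the second call.

#include <stdio.h>
#include <errno.h>

#define MiB (1024ULL * 1024ULL)

/* Stand-in allocator: satisfies at most 64MiB per call, fails on the
 * second call to simulate ENOSPC/ENOMEM partway through the range. */
static int reserve_extent(unsigned long long start, unsigned long long len,
			  unsigned long long *alloc_len, int *calls)
{
	(void)start;
	if (++(*calls) > 1)
		return -ENOSPC;
	*alloc_len = len > 64 * MiB ? 64 * MiB : len;
	return 0;
}

static void create_ordered_extent(unsigned long long start,
				  unsigned long long len)
{
	printf("ordered extent created, pages set up but left locked: [%lluM, %lluM)\n",
	       start / MiB, (start + len) / MiB);
}

static void clear_range(unsigned long long start, unsigned long long end)
{
	printf("out_unlock: clearing and erroring out [%lluM, %lluM)\n",
	       start / MiB, end / MiB);
}

int main(void)
{
	unsigned long long start = 0, end = 128 * MiB;
	int calls = 0;

	while (start < end) {
		unsigned long long alloc_len;

		if (reserve_extent(start, end - start, &alloc_len, &calls)) {
			/* Only [start, end) gets cleaned up here; the first
			 * 64MiB already has an ordered extent and locked
			 * pages that nothing will ever write out. */
			clear_range(start, end);
			return 1;
		}
		create_ordered_extent(start, alloc_len);
		start += alloc_len;	/* the "adjust start" step from the text */
	}
	return 0;
}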
The solution is to somehow let the caller know that we successfully handled a sub-range of the range asked for, so we can do the right thing. The tricky part is that in the normal buffered IO case we unconditionally mark the first page with PageError() if we get an error from cow_file_range(). When the failure happens in a later sub-range, that is actually wrong: we potentially end up with the same problem, where we never initiate writes on the initial page, and now we have an ordered extent that will never complete because IO was never issued for it.
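As a rough illustration of that idea, reusing the helpers from the sketch above (the "done_offset" out-parameter and the helper names are made up for illustration, not the actual kernel interface), the loop could report back how far it actually got, and the caller could issue the IO for the successful part and only error out the remainder:

static void write_locked_range(unsigned long long start, unsigned long long end)
{
	printf("write_locked_range stand-in: issuing IO for [%lluM, %lluM)\n",
	       start / MiB, end / MiB);
}

static int cow_range_model(unsigned long long start, unsigned long long end,
			   unsigned long long *done_offset)
{
	int calls = 0;

	*done_offset = start;
	while (start < end) {
		unsigned long long alloc_len;
		int ret = reserve_extent(start, end - start, &alloc_len, &calls);

		if (ret)
			return ret;	/* *done_offset marks how far we got */
		create_ordered_extent(start, alloc_len);
		start += alloc_len;
		*done_offset = start;
	}
	return 0;
}

static void caller_model(unsigned long long start, unsigned long long end)
{
	unsigned long long done;
	int ret = cow_range_model(start, end, &done);

	/* Issue the IO for whatever sub-range was actually set up... */
	if (done > start)
		write_locked_range(start, done);
	/* ...and clear/error out only the part that was never handled. */
	if (ret)
		clear_range(done, end);
}

With the same failing allocator, caller_model(0, 128 * MiB) ends up issuing IO for the first 64MiB and erroring out only the 64MiB-128MiB remainder, instead of leaving the first chunk locked forever.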
This is kind of a weird thing to untangle; the best bet would be to enable error injection for cow_file_range(), and then start reproducing the hangs and fixing the problems that fall out.
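For the error injection piece, one possible approach is the kernel's generic function error injection framework (CONFIG_FUNCTION_ERROR_INJECTION plus CONFIG_FAIL_FUNCTION). The annotation and the debugfs steps below are a sketch of how that could be wired up, not an existing btrfs patch; verify the details against Documentation/fault-injection/ before relying on them.

#include <linux/error-injection.h>

/* ... the existing definition of cow_file_range() in fs/btrfs/inode.c ... */

/*
 * Let the fail_function fault-injection framework override the return
 * value of cow_file_range() with an errno from userspace.
 */
ALLOW_ERROR_INJECTION(cow_file_range, ERRNO);

/*
 * Then, roughly, from userspace:
 *
 *   cd /sys/kernel/debug/fail_function
 *   echo cow_file_range > inject
 *   printf %#x -12 > cow_file_range/retval   # inject -ENOMEM
 *   echo 100 > probability
 *   echo -1 > times
 *
 * and run a compressed buffered-write workload to start reproducing
 * the hangs.
 */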