btrfs / btrfs-todo

An issues only repo to organize our TODO items
21 stars 2 forks source link

Cleanup BTRFS_INODE_NOCOMPRESS usage #26

Open josefbacik opened 3 years ago

josefbacik commented 3 years ago

We have a way to disable compression per files in btrfs, but there's a weird interaction with -o compress-force. You can set no compression with chattr per file and per directory, but we specifically ignore this setting if you are mounted with -o compress-force. This makes sense in general, because we not only use this flag for chattr, but also by the automatic heuristic code to mark an inode as not being a good candidate for compression.

This muddies the water a bit, because for the case of chattr we likely really want to disable compression in all cases, even in -o compress-force, but we don't want to do it for things marked with BTRFS_INODE_NOCOMPRESS because it could have been done with heuristics and thus not be completely correct.

We need to pick either

  1. Make BTRFS_INODE_NOCOMPRESS the ultimate authority on wether or not to compress a file. This would mean we need to add another flag for the inode if the heuristic determines that the file is not a good candidate for compression. One way to do this would be to simply use a runtime flag instead of setting BTRFS_INODE_NOCOMPRESS on the file, that way we only skip the compression while we have the inode in memory. This would be my preferred method of solving the problem.
  2. Make another flag to indicate the inode shouldn't be compressed, and use BTRFS_INODE_NOCOMPRESS in its current state. The other flag would probably need to be triggered with a btrfs property, and would require a lot more work.

My preference is for 1, which would mean you need to

kdave commented 3 years ago

I have some prototypes to improve the heuristics, with tracking of the compression quality. The goal is to never need to use compress-force for mount and just compress + heuristics that will set the nocompress flag after more evaluation that happens now (and is know to be weak).

I'm kind of amused about the label 'good first issue', it takes a lot of thinking and evaluation to do it right, and there are always more ways to do it.

josefbacik commented 3 years ago

Yeah 'good first issue' may have been overstating it, but mechanically it's very simple, just need to get consensus on what the behavior is we expect, which is the harder part. But if we break it down to behaviors it should be straightforward, we want

This opens up a few questions that we need answered

Once we get those questions answered then the code is relatively simple. My opinion is

  1. An in memory only flag for "this file sucks at compressing" and stop using NOCOMPRESS for this.
  2. NOCOMPRESS means don't even attempt compression, regardless of the mount options.
  3. chattr only sets COMPRESS or clears COMPRESS, it never sets NOCOMPRESS.
  4. We add a btrfs property that explictly sets NOCOMPRESS on the file.
Zygo commented 3 years ago
  1. chattr only sets COMPRESS or clears COMPRESS, it never sets NOCOMPRESS.

chattr (e2fsprogs) only got support for FS_NOCOMP_FL six months ago (better late than never, it was added to ext2 in 1998). You want to remove that?

I'd prefer leaving the chattr bits as they are, not because they're an amazing model of user experience, but because they've had whatever meaning they have for decades now, changing them will break someone, and they aren't onerous to implement as they are. chattr supported the 'm' bit only recently, but people could have rolled their own chattr and used the bit on btrfs since 2011.

  1. An in memory only flag for "this file sucks at compressing" and stop using NOCOMPRESS for this.

I am in favor of that, though. The user-writable chattr bits are the bailiwick of the admin, and code shouldn't be flipping them unless explicitly asked to.

We have an xattr btrfs.compression which takes a string argument, and is internally converted to bits. Can it be set to none to mean "ignore mount option, don't compress this"? Deleting the attribute would set it to default, "do what the mount option says."

There's a lot of flexibility in the xattr, we can also specify compression level, whether to bypass the heuristics or force compression, or (insert future feature here). There are only two usable chattr bits, so they're not very expressive, and we've painted ourselves into a corner on what both of them mean.

zhanglikernel commented 2 years ago

Hello, the label of this issue is good first issue, so I want to apply a patch to it. The following are my thoughts.

chattr only sets COMPRESS or clears COMPRESS, it never sets NOCOMPRESS.

  1. The fact is that chattr -c only clears the BTRFS_INODE_COMPRESS flag, it will not set BTRFS_INODE_NOCOMPRESS, chattr +m will set BTRFS_INODE_NOCOMPRESS as the inode flag, and the +m and +c flags cannot be set in the same inode at the same time.

  2. The current situation is that if the file system is mounted with the compression-force option, it will compress the file anyway (unless the compressed size is larger than the original, but every time we write to the file, it will check whether it is It can be compressed. If there is any possibility that the file can be compressed, it will compress the data), but if you use the option compress instead of compress-force to mount the file system, chattr +m is valid and it will disable the compression of the file. I think this is reasonable, because usually the administrator will do the mounting work, other users use the file system, if the administrator explicitly uses the compression force option to mount the file system, all users should abide by the rules, but if the administrator mounts Only the compression option, the user can choose not to compress.

  3. At present, if we use chattr +c set file to set compress, it will also add a compress xattr, but if we set chattr +m on the file, it will clear the compress attribute, it does not have xattr to indicate not to compress,

  4. I think it is correct to distinguish between the BTRFS_INODE_NOCOMPRESS and "this file sucks at compressing", so we need to introduce another sign to indicate "this file sucks at compressing"

  5. Now btrfs_inode→prop_compress only records which compression algorithm is used, not the compression level. I think we should also record the compression level, for example: prop_compress record algorithm in higher bit, and compression level lower bit.

  6. The current heuristic uses the Shannon entropy of the original file to determine whether the file should be compressed. It any possible the further  heuristics to tracking of the compression file Shannon entropy, I just wonder how to tracking of the compression quality.

The following is patch summary:

We add a btrfs property that explictly sets NOCOMPRESS on the file.

  1. Add btrfs xattr btrfs.compression value none corresponding to BTRFS_INODE_NOCOMPRESS, set by chattr +m or btrfs property set compress none, indicating that the file should not be compressed.

An in memory only flag for "this file sucks at compressing" and stop using NOCOMPRESS for this.

  1. Distinguish between BTRFS_INODE_NOCOMPRESS and "this file sucks at compressings" by adding a flag in the memory to indicate "the file is suck on compress"
  2. Expand prop_compress to include the compression level.
josefbacik commented 2 years ago

This sounds reasonable to me. I imagine we'll discuss it more on list, but make sure to lay the mechanics out in your cover letter for the patches so reviewers know what the plan is. I look forward to seeing the patches.

zhanglikernel commented 2 years ago

Hi, I have pushed a patch about this issue, [Related Patches] https://www.spinics.net/lists/linux-btrfs/msg120300.html

It contains 5 parts and a patch summary description ([0/5]), but I forgot to set each part of the patch to in-reply-to="[0/5]" when sending the email, I should Clear this error? Resend. Or just keep it as it is.