dm-vdo / kvdo

A kernel module which provides a pool of deduplicated and/or compressed block storage.
GNU General Public License v2.0

Parallel write compression is inefficient #41

Closed hedongzhang closed 2 years ago

hedongzhang commented 3 years ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Redhat-7.8 |
| Kernel Version | 3.10.0-1127.19.1.el7.x86_64 |
| Architecture | x86_64 |
| vdo Version | 6.1.3.4 |
| kmod-kvdo Version | 6.1.3.7-5 |

Describe the problem you're observing

I used fio to test VDO's compression and found that the compression ratio was very low when multiple jobs ran in parallel, whether sequential or random, while a single job was normal. What is the reason for this?

| fio workload | saving% |
| --- | --- |
| sequential (numjobs=1, iodepth=8) | 66% |
| sequential (numjobs=4, iodepth=8) | less than 30% |
| random (numjobs=1, iodepth=8) | 66% |
| random (numjobs=4, iodepth=8) | less than 43% |

Describe how to reproduce the problem

- sequential write fio

```
[global]
numjobs=1          # 1 or 4
iodepth=8
bs=1M
rw=write
scramble_buffers=1
buffer_compress_percentage=70
buffer_compress_chunk=4K
group_reporting

[job]
filename=/dev/mapper/vdo1
```


- random write fio

cat fio.rw

```
[global]
ioengine=libaio
direct=1
size=100%
numjobs=1          # 1 or 4
iodepth=8
bs=4K
rw=randwrite
scramble_buffers=1
buffer_compress_percentage=70
buffer_compress_chunk=4K
group_reporting

[job]
filename=/dev/mapper/vdo2
```
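A sketch of how the saving% numbers above can be read back after a run (the exact commands are my assumption, not part of the original report): run the job file, then check the space savings that VDO reports.

```
# Run the random-write job file, then read the "Space saving%" column
# that vdostats prints for each VDO volume.
fio fio.rw
vdostats --human-readable
```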

drckeefe commented 3 years ago

@hedongzhang On a quick review, the issue could be due to the jobs writing over each other. You might find better results if you offset each of the jobs by adding this to your fio config. It assumes that when you run 4 jobs you have 100G of space to test on; change the offset_increment to fit your test.

offset=0 offset_increment=25G
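For illustration (my sketch, not part of the comment above), with 4 jobs and the hypothetical 100G test device mentioned, the options would slot into the job file like this:

```
[job]
filename=/dev/mapper/vdo1
offset=0
offset_increment=25G   # each cloned job is shifted by another 25G, so 4 jobs write non-overlapping regions
```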

hedongzhang commented 3 years ago

@drckeefe Thanks for your reply. Setting an offset does indeed keep the jobs from writing over each other and brings the compression ratio back to normal. But I do not understand why multiple jobs writing to the same locations at the same time affects compression.

corwin commented 3 years ago

@hedongzhang, the reason that overwrites decrease the compression efficiency is due to the way that VDO saves space with compression.

Because VDO can only read and write fixed-size blocks to its underlying storage, it saves space with compression by combining up to 14 compressed 4K blocks into a single 4K block on disk (the actual number, of course, depends on how well the data compresses). VDO is not able to reclaim a physical block until all of the compressed blocks in it have been overwritten. So if the fio jobs are partially overwriting each other's data, the efficiency decreases: overwriting some but not all of the compressed blocks in a single physical block requires allocating a new block (or blocks) for the new data, but doesn't free up the old block.
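One way to observe this packing (a sketch on my part; the counter names come from `vdostats --verbose` output and may vary between releases) is to compare how many compressed fragments were written with how many physical blocks they were packed into, alongside the overall saving percentage:

```
# Dump compression-related counters for the test volume. The ratio of
# compressed fragments written to compressed blocks written shows how many
# fragments share each physical block on average.
vdostats --verbose /dev/mapper/vdo1 | grep -i -E 'compress|saving'
```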

hedongzhang commented 3 years ago

@corwin, thank you, I get it now. However, there is an obvious problem: as the amount of data written to the VDO volume increases, there will inevitably be a large number of such fragmented physical blocks (where some but not all of the compressed blocks have been overwritten), and the compression efficiency will gradually decrease, or even drop to zero, unless the data is completely rewritten. Are there currently any tools or plans to address such fragmented physical blocks?

corwin commented 3 years ago

@hedongzhang, you are correct that this is a potential problem in the long term. However, in practice we have not seen it be much of an issue. I think the main reason is that in real systems there tends to be a significant amount of temporal locality in the pattern of writes and overwrites; indeed, the efficiency of VDO's deduplication index relies on this tendency. In the case of compression, much (perhaps most) of the data written at any given time tends to be overwritten or deleted at roughly the same time as well, so the number of inefficient fragments tends to be a relatively small fraction of the total data stored. It is worth noting that a parallel set of fio jobs randomly writing and overwriting the same logical address space does not seem to be a common real-world workload.

We don't currently have plans to address this, largely because no one has complained to us that it is an actual problem in real systems. Certainly, if it turns out to be a significant problem, we will address it.

hedongzhang commented 3 years ago

@corwin OK, I'll follow up on this problem in our specific scenario and see how much impact it actually has.