dm-vdo / vdo

Userspace tools for managing VDO volumes.
GNU General Public License v2.0

Deleting files causes high write latency and freezes XFS #57

Open madmax01 opened 2 years ago

madmax01 commented 2 years ago

Hello Team,

Is there any specific optimization possible for deleting files?

Issue: high write latency while deleting files.

There are no issues when adding new data.

Physical disk = 4 TB, logical = 8 TB (VDO)

vdo create --device=/dev/sdb --sparseIndex=enabled --name=vdo0 --deduplication=enabled --compression=enabled --writePolicy=sync --vdoLogicalSize=8T --vdoLogicalThreads=8 --vdoPhysicalThreads=8 --vdoBioThreads=8 --vdoCpuThreads=8 --vdoAckThreads=8 --vdoHashZoneThreads=8 --blockMapCacheSize=256M --force

With XFS on top and an NFS server running: when I delete data, the whole volume locks up and the delete takes several seconds. Volume latency climbs to 10,000 ms.

When I don't use VDO it's fast.

I tried changing blockMapCacheSize; it makes no difference.

rhawalsh commented 2 years ago

Hi @madmax01,

Can you provide us with a little bit more information on what you're working with?

  1. How is the filesystem mounted? (-o discard)? Is it possible to mount without discard and use fstrim periodically?
  2. What are the physical characteristics of the machine this is running on? (CPU cores, Memory, disks, etc.)
  3. What version of VDO are you using?
  4. What kind of data are you deleting? Large files, Small files?

We do have some things that you can tune to improve performance including discards, but knowing the underlying configuration will help us provide suggestions.
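If mounting without discard turns out to be workable, one low-effort way to run fstrim periodically is the systemd timer shipped with util-linux. This is only a sketch; it assumes fstrim.timer exists on the distribution, as it does on RHEL 8 derivatives.

```shell
# Enable the weekly fstrim timer if the unit is available.
# fstrim.timer runs `fstrim --all`, trimming every mounted
# filesystem that supports discard.
UNIT=fstrim.timer
if systemctl list-unit-files "$UNIT" 2>/dev/null | grep -q "$UNIT"; then
    systemctl enable --now "$UNIT"
fi
```

The default schedule (weekly) can be overridden with a drop-in if a quieter window is needed.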

madmax01 commented 2 years ago

Hi @rhawalsh,

Thanks for the answer. Sure, here are the details.

Main problem: when the upper layer sends a delete, XFS seems to freeze, regardless of whether it is mounted with discard or not.

The current stack looks like this:

3.5T HW volume > VDO (4 TB logical) > XFS > Gluster

Device            Size  Used    Available  Use%  Space saving%
/dev/mapper/vdo0  3.8T  197.1G  3.6T       5%    64%

OS: AlmaLinux release 8.6 (Sky Tiger) (but I had the same issue on older versions and on other downstream distributions)

kernel = 4.18.0-372.9.1.el8.x86_64

1: I tested both discard and nodiscard via fstab:
   - discard: XFS froze for a few seconds (high volume latency)
   - nodiscard + fstrim: fstrim ran for a very long time, and after a while XFS froze again and Gluster on top was unresponsive for >120 s

2: It is a virtual setup: 8 vCores / 20 GB RAM, 3.5 TB passed through from an LSI 3108 (RAID 10, write-back cache); I also tested write-through, no difference.

3: I have used different versions over the last 1-2 years, but never really noticed the issue because only tiny files were being deleted with discard. Latest version tested: VDO 6.2.6.14 (AlmaLinux 8.6); also tested on Oracle Linux 8.6.

4: At the moment it is a mixed load of MB-sized and GB-sized files. Mostly virtual disks, which are usually larger than 10 GB; the remaining small files are a few MB each.


This is how I configured it:

VDO: vdo create --device=/dev/sdb --sparseIndex=enabled --name=vdo0 --deduplication=enabled --compression=enabled --writePolicy=sync --vdoLogicalSize=4T --vdoLogicalThreads=8 --vdoPhysicalThreads=8 --vdoBioThreads=8 --vdoCpuThreads=8 --vdoAckThreads=8 --vdoHashZoneThreads=8 --blockMapCacheSize=256M --force

XFS mount with discard (fstab entry):
/dev/mapper/vdo0 /vdo/vdo0 xfs defaults,discard,noatime,nodiratime,allocsize=131072k,logbufs=8,logbsize=256k,_netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service 0 0

To create the filesystem I used: mkfs.xfs -K /dev/mapper/vdo0

xfs_info looks like this:

meta-data=/dev/mapper/vdo0       isize=512    agcount=4, agsize=268435455 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=4096   blocks=1073741820, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

A simple test was run on XFS directly on top of VDO. This showed that the trouble starts as soon as VDO sits underneath XFS.

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1

Then I ran "rm -rf ran*" and in parallel tested with "ioping -c 1000 ."; it showed freezes of several seconds.


From the XFS point of view, as far as I understand, discards happen asynchronously, so XFS shouldn't freeze for several seconds (or, with fstrim, several minutes).

I also tried using only compression or only deduplication; it made no difference.

But when I use XFS without VDO, it's blazing fast.

I'm really surprised that RHEL runs VDO like this, given they recommend a logical size of up to 10x the physical disk.

Thanks for checking and reading.

Looking forward to understanding how this can be resolved. Appreciated.

madmax01 commented 2 years ago

I can also add an LV between XFS and VDO; it makes zero difference. XFS freezes for several seconds (discard) and fstrim takes several minutes. HW RAID > VDO > PV > VG > LV thin > XFS (I also tested a larger discard size to match the 4 MB thin-LV chunk size).

So far discard has less impact than fstrim, but even the discard freezes are too long.

What would be the recommended tunings?

madmax01 commented 2 years ago

Any information on this one? I understand it's not a paid service, but after a few days it would be good to see some kind of response. If not, I'm not sure what the purpose of creating issues is.

rhawalsh commented 2 years ago

Hi @madmax01, I apologize for the extended delay. I agree that we need to do a better job at keeping up with threads on GitHub issues. We have some changes to our workflow coming up soon that should hopefully help improve our awareness and transparency on what is going on in the VDO project. Please stay tuned.

To answer your question on tuning, you might want to consider reducing the overall number of concurrent discards that VDO will process alongside other IO operations. By default, VDO will allow up to 1500 of the 2000 available in-flight IOs to be discards. This can starve the system of other IO operations, so it may be preferable to reduce that number. Of course, the trade-off is that discards will now take longer to complete, since you're cutting down the amount the VDO volume can do at any one time.

First, you can inspect the current setting for VDO by looking at /sys/kvdo/<volume_name>/discards_limit. You can change it by echoing a new value into it, such as 500:

echo 500 > /sys/kvdo/vdo0/discards_limit
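As a concrete sketch (the volume name vdo0 is assumed; the sysfs path only exists while the VDO volume is started, and the setting does not persist across volume restarts):

```shell
# Inspect and lower the concurrent-discard budget for the vdo0 volume.
LIMIT_FILE=/sys/kvdo/vdo0/discards_limit
NEW_LIMIT=500    # down from the default of 1500, out of 2000 in-flight IOs

if [ -w "$LIMIT_FILE" ]; then
    echo "current discards_limit: $(cat "$LIMIT_FILE")"
    echo "$NEW_LIMIT" > "$LIMIT_FILE"
fi
```

Because the value resets when the volume restarts, a persistent change would need to be reapplied at boot (e.g. from a systemd unit ordered after vdo.service).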

Choosing between fstrim and mounting with -o discard is a matter of preference and workload. If you're going to be doing a lot of larger discards, it's probably best to keep them under tight control, so fstrim would be preferred. But if you're just doing normal IO operations, generally creating and deleting smaller files, it may make sense to mount with -o discard to reduce the amount of maintenance needed on the volume.

The benefit of using fstrim over mounting with -o discard is that you can schedule when the discards take the bandwidth to happen at a time when other IO is not really so busy. There are also creative ways that you can run the fstrim across an XFS volume to trim certain areas, so you could carve up the volume to be discarded in portions rather than the whole thing all at once. If you need some information on how to achieve that, let me know and I'll dig up that procedure. It's been a bit since I've seen it in use.
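For the record, the "portions" approach can be sketched with fstrim's --offset/--length options (the mountpoint, chunk size, and pause below are placeholder values to adapt to the actual volume):

```shell
# Trim a mounted filesystem in 256 GiB chunks with a pause between
# chunks, so discards never monopolize the device for long.
MOUNTPOINT=/vdo/vdo0
CHUNK=$((256 * 1024 * 1024 * 1024))               # 256 GiB per fstrim call
TOTAL=$(df --output=size -B1 "$MOUNTPOINT" 2>/dev/null | tail -1)

offset=0
while [ -n "$TOTAL" ] && [ "$offset" -lt "$TOTAL" ]; do
    fstrim --offset "$offset" --length "$CHUNK" "$MOUNTPOINT"
    offset=$((offset + CHUNK))
    sleep 10                                      # let queued discards drain
done
```

Each fstrim call then only queues discards for one slice of the address space, which bounds how long any single burst can stall foreground IO.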

rhawalsh commented 2 years ago

Hi @madmax01,

As I mentioned, there is a way to also change the size of discards that VDO accepts, which also may help improve your experience. Be aware that by increasing this value, you're going to increase the amount of IO going to the storage, so you may still end up starving the system from being able to complete everything quickly. You will need to try some values and see what works best for your use case.

First, to see what size discard you're allowing currently, you can look in /etc/vdoconf.yml and see the maxDiscardSize for a given volume in there. The default value is 4K.

To adjust this value, you would use the vdo modify command such as vdo modify --name vdo0 --maxDiscardSize 4M followed by a vdo stop --name vdo0 and vdo start --name vdo0.

You can also verify the current setting by inspecting /sys/block/$(basename $(readlink /dev/mapper/vdo0))/queue/discard_max_bytes. At the default value this reads 4096; if you set maxDiscardSize to 4M, it reports 4194304.
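Putting those steps together as a sketch (volume name vdo0 assumed; the guard simply skips everything on machines without the vdo tooling):

```shell
# Raise the accepted discard size from the 4K default to 4M, restart
# the volume so the change takes effect, then confirm via sysfs.
if command -v vdo >/dev/null 2>&1; then
    vdo modify --name vdo0 --maxDiscardSize 4M
    vdo stop --name vdo0
    vdo start --name vdo0
    cat "/sys/block/$(basename "$(readlink /dev/mapper/vdo0)")/queue/discard_max_bytes"
fi
# 4M = 4 * 1024 * 1024 = 4194304 bytes
```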

madmax01 commented 2 years ago

Hi,

thx for coming back.

Yes, I tested that already:

echo 500 > /sys/kvdo/vdo0/discards_limit (the default seems to be 1500, and if I understood correctly, 2000 is the upper limit anyway?)

(I also tried increasing the threads to 8.)

But this made the situation worse: XFS froze even longer. The lower I set discards_limit, the longer XFS froze.

In general I can't confirm that fstrim is the better option: fstrim gave me outages of several minutes, whereas mounting with discard in fstab gave a few seconds (seconds are already bad, but minutes are a nightmare). At the moment I have disabled both, since I can't run either without impact. ;(

maxDiscardSize: 4K is the default; as I understood it, that is the recommended value.

I just don't really understand why XFS on top freezes and hits heavy timeouts.

When using XFS without VDO there is zero trouble.

--maxDiscardSize: you mean it would be good to increase this? If so, what value is typical? So far I thought it needed to match the XFS block size.

thx

Max

rhawalsh commented 2 years ago

Hi @madmax01

> Thanks for coming back.
>
> Yes, I tested that already:
>
> echo 500 > /sys/kvdo/vdo0/discards_limit (the default seems to be 1500, and 2000 is the upper limit anyway?)
>
> (I also tried increasing the threads to 8.)
>
> But this made the situation worse: XFS froze even longer. The lower I set discards_limit, the longer XFS froze.

If XFS is blocking specifically for discards, then reducing the total number of concurrent discards will only make the problem worse since you're waiting on that work to complete.

> In general I can't confirm that fstrim is the better option: fstrim gave me outages of several minutes, whereas mounting with discard gave a few seconds. At the moment I have disabled both, since I can't run either without impact.

I would suggest trying to change the number of total discards to 1000 or 1500 and then adjusting the max size to something larger. And then you can set up fstrim to only discard portions of the volume. That probably won't prevent all freezes, at least not right away. You're going to need to find the best settings for your environment.

> maxDiscardSize: 4K is the default; as I understood it, that is the recommended value.

While the default is recommended, it's tunable for a reason. :) What kind of storage makes up the RAID device? That will influence how large you can set the discards.

You could start by setting it to 1M and then monitoring disk usage. If it's not using the full bandwidth, then you could consider making it larger, but if you're noticing latency starting to get out of control, then you will need to reduce the size until you reach an acceptable level of performance. It's gonna be a bit of trial and error, since every workload is different.

> I just don't really understand why XFS on top freezes and hits heavy timeouts.
>
> When using XFS without VDO there is zero trouble.
>
> --maxDiscardSize: you mean it would be good to increase this? If so, what value is typical? So far I thought it needed to match the XFS block size.

madmax01 commented 2 years ago

I tested this with a RAID 10 volume for the data:

RAID 10 volume > VDO > XFS

In the lab I tested with zero workload data on it; I just ran the fstrim. ;)

/sys/kvdo/vdo0/discards_limit: which value does this represent? Is it an IOPS value, or bytes?

rhawalsh commented 2 years ago

> I tested this with a RAID 10 volume for the data:
>
> RAID 10 volume > VDO > XFS
>
> In the lab I tested with zero workload data on it; I just ran the fstrim. ;)
>
> /sys/kvdo/vdo0/discards_limit: which value does this represent? Is it an IOPS value, or bytes?

This value represents a count of concurrent IOs: how many of the 2000 available in-flight IOs may be discards at any one time.

rhawalsh commented 2 years ago

Also, it would be useful to watch disk utilization via iostat (something like iostat -dmx 1) while the filesystem is "locked up".
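For reference, a minimal way to capture that while reproducing the freeze (assumes the sysstat package is installed; on sysstat 11.7+, as shipped with RHEL 8, the extended output includes separate discard columns such as d/s and d_await):

```shell
# Sample extended per-device stats once per second, five times.
# During a freeze, watch %util, w_await, and the discard columns.
INTERVAL=1
COUNT=5
if command -v iostat >/dev/null 2>&1; then
    iostat -dmx "$INTERVAL" "$COUNT"
fi
```

A sustained %util near 100 with high d_await while writes stall would point at discards saturating the device.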