First, let me make sure I understand correctly this issue: you do not experience worse performance (like throughput / latency), but the process seems to be waiting for I/O more than other deployments. Is that correct?
Is WA% always high in this deployment, or is it just during writes to disk? (I see that you're saving RDB every 1 minute).
When you say "Other workloads that use the persistent disk don't show this behaviour" - what are the differences between this deployment and the others? Do they use different disks?
And finally, a few unrelated questions:
* Why do you use `--hz=5`?
* Similarly, why disable Dragonfly's snapshot format (via `--df_snapshot_format=false`)?
* May I ask how you use Dragonfly? With what load, for which purpose, etc.?

Thanks!
Duplicate of #2181. @fe-ax it's a kernel change in how CPU time is attributed in the io_uring API. Unfortunately, there is not much we can do about it, but it does not affect anything. It's completely harmless; it's just that an idle CPU waiting for a networking packet is now attributed as IOWAIT. The io_uring kernel folks decided at some point that it's better to attribute a CPU blocked on any I/O (even networking) as IOWAIT.
I am surprised that it appeared in kernel 5.10, but 5.10 is an LTS kernel version, so maybe they backported the change there. AFAIK, it first appeared in 6.x kernel versions.
I googled the kernel discussions about this again and learned that they decided to revert the decision because it has been confusing to many users. See here: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/queue-6.4/io_uring-gate-iowait-schedule-on-having-pending-requests.patch?id=2b8c242ac869eae3d96b712fdb9940e9cd1e0d69
The MariaDB/MySQL folks are also complaining about this here: https://bugzilla.kernel.org/show_bug.cgi?id=217699
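For anyone who wants to double-check that this is only an accounting quirk and not real disk pressure, one suggestion (using standard Linux tools, nothing Dragonfly-specific) is to watch the CPU wait column while the server is idle and confirm that the disks themselves are quiet:

    vmstat 1       # "wa" stays high while "bi"/"bo" (blocks in/out) stay near zero
    iostat -x 1    # per-device view; %util should be ~0 while Dragonfly is idle

If wa is high but there is essentially no block I/O, the time is just idle io_uring waits being labeled as IOWAIT.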
working as intended
I ran across this issue in the past few days on my home lab k8s cluster, where I started getting nagging NodeCPUHighUsage alerts from Prometheus.
After hours of triage (because all the other available Linux tools didn't show any high CPU usage), I was able to determine that the alert was reporting CPU time spent in iowait, and I narrowed it down to Dragonfly.
In my case, I'm running a super trivial workload on my home lab, so I temporarily forced Dragonfly to use epoll with --force_epoll. Let me be clear that this works in my case where, as I stated, the workload is trivial, and at least I'm no longer getting the Prometheus alerts.
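For reference, a minimal sketch of that workaround, simply appending the flag to the command line from the issue description below (the other flags are the reporter's and are not required for the workaround):

    dragonfly --logtostderr --maxmemory=4gb --save_schedule=*:* --hz=5 --dbfilename dump.rdb --df_snapshot_format=false --force_epoll

Note that this switches the I/O backend from io_uring to epoll, so it is probably only advisable when the workload is light, as mentioned above.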
Also observing ~100% IOWAIT on Linux 6.5.0.
@crishoj The parent issue on liburing mentions it will be fixed in kernel 6.10. No idea whether the patch will be backported. https://github.com/axboe/liburing/issues/943
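If you want to check whether a given node already runs a kernel with the fix, comparing the running kernel version against 6.10 is a rough first check (rough because, as noted above, such changes are sometimes backported to LTS trees):

    uname -r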
Describe the bug
A high WA% (waiting for I/O) time while nothing is happening on the DB. CPU usage is nearly 0%.
To Reproduce
Steps to reproduce the behavior:
dragonfly --logtostderr --maxmemory=4gb --save_schedule=*:* --hz=5 --dbfilename dump.rdb --df_snapshot_format=false
Expected behavior
Lower WA% when no workload is present.
Screenshots
Environment (please complete the following information):
Linux ip-10-117-39-51.eu-central-1.compute.internal 5.10.198-187.748.amzn2.x86_64 #1 SMP Tue Oct 24 19:49:54 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes
Reproducible Code Snippet
N/A
Additional context