aneagoe opened 1 month ago
Hi @aneagoe! Could you please send the output of casadm -P -i 1 -d?
I've switched all nodes to a single ioclass (unclassified) and the serviced request ratio went to 95%+. Disabling sequential cutoff pushed it to 100% (just an experiment; it doesn't make sense to leave it on). This is consistent across all nodes in the dev cluster, and I've done the same on one node in the prod cluster with the same results. So there seems to be something off either with my ioclass settings or with priority-based eviction. Although I couldn't run the precise command you sent, I dug up a few commands from when I was troubleshooting:
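For reference, the single-class switch and the sequential cutoff experiment were done with casadm roughly as follows (a sketch only; the cache id and the config file path are examples, not the exact values used on every node):

# load an ioclass config that contains only the unclassified class
casadm --io-class --load-config --cache-id 1 --file /etc/opencas/single-class.csv
# disable sequential cutoff for all cores of cache 1 (experimental only)
casadm --set-param --name seq-cutoff --cache-id 1 --policy never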
[root@acosd3 ~]# casadm -P -i 1 -d 0
IO class ID 0
IO class name unclassified
Eviction priority 100
Max size 0%
╔══════════════════╤═══════╤═════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪═══════╪═════╪═════════════╣
║ Occupancy │ 0 │ 0.0 │ 4KiB Blocks ║
║ Clean │ 0 │ 0.0 │ 4KiB Blocks ║
║ Dirty │ 0 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧═══════╧═════╧═════════════╝
╔══════════════════════╤═══════╤═════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═════╪══════════╣
║ Read hits │ 0 │ 0.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼─────┼──────────╢
║ Write hits │ 0 │ 0.0 │ Requests ║
║ Write partial misses │ 0 │ 0.0 │ Requests ║
║ Write full misses │ 0 │ 0.0 │ Requests ║
║ Write total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼─────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 0 │ 0.0 │ Requests ║
║ Serviced requests │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼─────┼──────────╢
║ Total requests │ 0 │ 0.0 │ Requests ║
╚══════════════════════╧═══════╧═════╧══════════╝
╔══════════════════════════════════╤═══════╤═════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼─────┼─────────────╢
║ Reads from cache │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to cache │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from cache │ 0 │ 0.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼─────┼─────────────╢
║ Reads from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═════╧═════════════╝
[root@acosd3 ~]# casadm -P -i 1 -d 1
IO class ID 1
IO class name direct&done
Eviction priority 50
Max size 100%
╔══════════════════╤═══════╤═══════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪═══════╪═══════╪═════════════╣
║ Occupancy │ 748 │ 0.0 │ 4KiB Blocks ║
║ Clean │ 748 │ 100.0 │ 4KiB Blocks ║
║ Dirty │ 0 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧═══════╧═══════╧═════════════╝
╔══════════════════════╤═══════╤═══════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═══════╪══════════╣
║ Read hits │ 5 │ 100.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 5 │ 100.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Write hits │ 0 │ 0.0 │ Requests ║
║ Write partial misses │ 0 │ 0.0 │ Requests ║
║ Write full misses │ 0 │ 0.0 │ Requests ║
║ Write total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 0 │ 0.0 │ Requests ║
║ Serviced requests │ 5 │ 100.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Total requests │ 5 │ 100.0 │ Requests ║
╚══════════════════════╧═══════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════╤═══════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═══════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from cache │ 5 │ 100.0 │ 4KiB Blocks ║
║ Writes to cache │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from cache │ 5 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from exported object(s) │ 5 │ 100.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 5 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═══════╧═════════════╝
[root@acosd3 ~]# casadm -P -i 1 -d 10
IO class ID 10
IO class name request_size:le:4096&done
Eviction priority 1
Max size 100%
╔══════════════════╤════════╤═══════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪════════╪═══════╪═════════════╣
║ Occupancy │ 407540 │ 3.6 │ 4KiB Blocks ║
║ Clean │ 407536 │ 100.0 │ 4KiB Blocks ║
║ Dirty │ 4 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧════════╧═══════╧═════════════╝
╔══════════════════════╤═══════╤═══════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═══════╪══════════╣
║ Read hits │ 0 │ 0.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Write hits │ 42 │ 30.9 │ Requests ║
║ Write partial misses │ 0 │ 0.0 │ Requests ║
║ Write full misses │ 8 │ 5.9 │ Requests ║
║ Write total │ 50 │ 36.8 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 86 │ 63.2 │ Requests ║
║ Serviced requests │ 50 │ 36.8 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Total requests │ 136 │ 100.0 │ Requests ║
╚══════════════════════╧═══════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════╤═══════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═══════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 179 │ 100.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 179 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from cache │ 188 │ 79.0 │ 4KiB Blocks ║
║ Writes to cache │ 50 │ 21.0 │ 4KiB Blocks ║
║ Total to/from cache │ 238 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 136 │ 100.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 136 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═══════╧═════════════╝
[root@acosd3 ~]# casadm -P -i 1 -d 11
IO class ID 11
IO class name request_size:le:8192&done
Eviction priority 2
Max size 100%
╔══════════════════╤════════╤═══════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪════════╪═══════╪═════════════╣
║ Occupancy │ 190084 │ 1.7 │ 4KiB Blocks ║
║ Clean │ 190072 │ 100.0 │ 4KiB Blocks ║
║ Dirty │ 12 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧════════╧═══════╧═════════════╝
╔══════════════════════╤═══════╤═══════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═══════╪══════════╣
║ Read hits │ 0 │ 0.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Write hits │ 36 │ 33.6 │ Requests ║
║ Write partial misses │ 0 │ 0.0 │ Requests ║
║ Write full misses │ 10 │ 9.3 │ Requests ║
║ Write total │ 46 │ 43.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 61 │ 57.0 │ Requests ║
║ Serviced requests │ 46 │ 43.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Total requests │ 107 │ 100.0 │ Requests ║
╚══════════════════════╧═══════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════╤═══════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═══════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 257 │ 100.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 257 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from cache │ 216 │ 70.1 │ 4KiB Blocks ║
║ Writes to cache │ 92 │ 29.9 │ 4KiB Blocks ║
║ Total to/from cache │ 308 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 214 │ 100.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 214 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═══════╧═════════════╝
[root@acosd3 ~]# casadm -P -i 1 -d 12
IO class ID 12
IO class name request_size:le:16384&done
Eviction priority 3
Max size 100%
╔══════════════════╤════════╤═══════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪════════╪═══════╪═════════════╣
║ Occupancy │ 365160 │ 3.2 │ 4KiB Blocks ║
║ Clean │ 365160 │ 100.0 │ 4KiB Blocks ║
║ Dirty │ 0 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧════════╧═══════╧═════════════╝
╔══════════════════════╤═══════╤═══════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═══════╪══════════╣
║ Read hits │ 0 │ 0.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Write hits │ 7 │ 7.0 │ Requests ║
║ Write partial misses │ 2 │ 2.0 │ Requests ║
║ Write full misses │ 3 │ 3.0 │ Requests ║
║ Write total │ 12 │ 12.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 88 │ 88.0 │ Requests ║
║ Serviced requests │ 12 │ 12.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Total requests │ 100 │ 100.0 │ Requests ║
╚══════════════════════╧═══════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════╤═══════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═══════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 391 │ 100.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 391 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from cache │ 96 │ 68.6 │ 4KiB Blocks ║
║ Writes to cache │ 44 │ 31.4 │ 4KiB Blocks ║
║ Total to/from cache │ 140 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 366 │ 100.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 366 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═══════╧═════════════╝
[root@acosd3 ~]# casadm -P -i 1 -d 13
IO class ID 13
IO class name request_size:le:32768&done
Eviction priority 4
Max size 100%
╔══════════════════╤════════╤═══════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪════════╪═══════╪═════════════╣
║ Occupancy │ 729812 │ 6.4 │ 4KiB Blocks ║
║ Clean │ 729812 │ 100.0 │ 4KiB Blocks ║
║ Dirty │ 0 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧════════╧═══════╧═════════════╝
╔══════════════════════╤═══════╤═══════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═══════╪══════════╣
║ Read hits │ 0 │ 0.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Write hits │ 15 │ 13.2 │ Requests ║
║ Write partial misses │ 1 │ 0.9 │ Requests ║
║ Write full misses │ 4 │ 3.5 │ Requests ║
║ Write total │ 20 │ 17.5 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 94 │ 82.5 │ Requests ║
║ Serviced requests │ 20 │ 17.5 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Total requests │ 114 │ 100.0 │ Requests ║
╚══════════════════════╧═══════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════╤═══════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═══════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 770 │ 100.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 770 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from cache │ 176 │ 58.3 │ 4KiB Blocks ║
║ Writes to cache │ 126 │ 41.7 │ 4KiB Blocks ║
║ Total to/from cache │ 302 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 740 │ 100.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 740 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═══════╧═════════════╝
Hi @aneagoe, thanks for the quick response. Could you please rerun your workload on a new cache instance with your original ioclass config and cleaning policy set to NOP (casadm -X -n cleaning -p nop) and share full stats of all configured ioclasses?
@mmichal10 I can adjust one of the nodes as recommended. But could you please clarify the scope of this experiment? In my understanding, if we switch to nop we're basically not cleaning the cache at all, so once it gets full we can expect it to effectively switch to write-through, right? I will stop the OSDs, stop and remove the existing cas devices, re-create them with the settings you mentioned, and then report back.
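The re-creation sequence I have in mind looks roughly like this (a sketch; the device paths and the ioclass config path are placeholders from my setup, not commands verified in this thread):

# stop the OSDs using this cache, then stop the cache instance
casadm --stop-cache --cache-id 1
# re-create the cache in write-back mode on the same NVMe LV (path is an example)
casadm --start-cache --cache-device /dev/vg_ceph_01_misc_1/ceph_01_cas1 --cache-id 1 --cache-mode wb --force
# re-attach the HDD cores
casadm --add-core --cache-id 1 --core-device /dev/sdb
# load the original ioclass config and switch cleaning to NOP
casadm --io-class --load-config --cache-id 1 --file /etc/opencas/ioclass-config.csv
casadm --set-param --name cleaning --cache-id 1 --policy nop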
@aneagoe once the cache is 100% dirty, write requests will be redirected to pass-through and flushing of a small portion of dirty blocks will be triggered. We want to make sure that the cache gets 100% dirty before the writes are pass-through'ed, and we want to see the occupancy and how many writes got pass-through'ed for each ioclass.
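One simple way to watch how close the cache gets to 100% dirty while the workload runs is a periodic check like this (a minimal sketch; cache id 1 is assumed):

# print the usage rows (Occupancy / Clean / Dirty) of cache 1 every 10 seconds
watch -n 10 "casadm -P -i 1 | grep -E 'Occupancy|Clean|Dirty'"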
I've attached snapshots taken at 5-minute intervals. I triggered some pool resizes to get more traffic going, and it seems the cache never got to 100% dirty (it peaked around 99.8%) before flushing kicked in. It's perhaps not very easy to go through; if you need a different format/layout, I'm happy to re-run this experiment and accommodate, just let me know. cas_stats.out.gz
Hi @aneagoe, just a quick question for you: I'm trying to reproduce what you're seeing on our side, and I do see some pass-through traffic, but not as much as in your stats. Can you share a bit more info on your testing setup? Mine is 3 nodes, each with 12 OSDs (HDD fronted by a single-SSD CAS cache); on top of that I create 60 RBD images, fill the caches with large writes, and run a read-intensive or mixed read/write workload (IO done by fio with librbd). Maybe if we align a bit more it would help us see exactly the same issues you're seeing.
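For reference, the fio jobs I'm using are along these lines (a sketch under assumptions: the pool name, image name, sizes and runtime are illustrative, not the exact parameters):

# fill the caches with large sequential writes
fio --name=fill --ioengine=rbd --clientname=admin --pool=rbd --rbdname=test-img-01 \
    --rw=write --bs=1M --iodepth=16 --direct=1 --size=20G

# then run a mixed random read/write workload against the same image
fio --name=rwmix --ioengine=rbd --clientname=admin --pool=rbd --rbdname=test-img-01 \
    --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 --direct=1 \
    --runtime=600 --time_based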
Hey @jfckm. I have a dev cluster where I tend to test any changes before applying them to the prod cluster. The dev cluster is built on top of Proxmox with its own Ceph cluster. There are 3 OSD VMs, each with 6 x 1TB volumes and 2 x NVMe drives partly used by opencas (shared with the OS; we assign a 43G LV to opencas). From the same NVMe drives we also carve two LVs used as direct NVMe OSDs. On the NVMe-backed storage we place things like the rgw index bucket and cephfs_metadata pools. Some lsblk output below shows the overall setup:
[root@acosd1 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 1000G 0 disk
└─cas2-2 252:1792 0 1000G 0 disk
└─ceph_data_0102-ceph_0102_block 253:23 0 1000G 0 lvm
└─VU36Ok-riz1-Tgs2-KxxC-UpIG-8Vxf-Fs1WrQ 253:29 0 1000G 0 crypt
sdb 8:16 0 1000G 0 disk
└─cas1-1 252:1536 0 1000G 0 disk
└─ceph_data_0101-ceph_0101_block 253:22 0 1000G 0 lvm
└─xiFIJE-Fkxf-8j2a-qhG3-NirX-bhdK-HGx5FQ 253:28 0 1000G 0 crypt
sdc 8:32 0 1000G 0 disk
└─cas1-5 252:2560 0 1000G 0 disk
└─ceph_data_0105-ceph_0105_block 253:26 0 1000G 0 lvm
└─tcWdld-26OF-zNli-Iehs-Zk37-fUXZ-2g1YR3 253:32 0 1000G 0 crypt
sdd 8:48 0 1000G 0 disk
└─cas1-3 252:2048 0 1000G 0 disk
└─ceph_data_0103-ceph_0103_block 253:24 0 1000G 0 lvm
└─EPHkdC-ucla-hoMx-qMP6-0Uno-r82O-VOc00n 253:30 0 1000G 0 crypt
sde 8:64 0 1000G 0 disk
└─cas2-4 252:2304 0 1000G 0 disk
└─ceph_data_0104-ceph_0104_block 253:25 0 1000G 0 lvm
└─xZWLVv-XCdf-j3kt-uCVE-sZB1-2q5B-B8NYRQ 253:31 0 1000G 0 crypt
sdf 8:80 0 1000G 0 disk
└─cas2-6 252:2816 0 1000G 0 disk
└─ceph_data_0106-ceph_0106_block 253:27 0 1000G 0 lvm
└─3B1DAI-EOX6-4gNe-DsPM-dmiF-SiTL-sizaLC 253:33 0 1000G 0 crypt
nvme0n1 259:0 0 265G 0 disk
├─nvme0n1p1 259:1 0 39.1G 0 part
│ └─md1 9:1 0 39G 0 raid1
│ ├─vg_root-root 253:0 0 6G 0 lvm /
│ ├─vg_root-home 253:1 0 4G 0 lvm /home
│ ├─vg_root-var 253:2 0 10G 0 lvm /var
│ └─vg_root-tmp 253:3 0 5G 0 lvm /tmp
├─nvme0n1p2 259:2 0 1G 0 part
│ └─md0 9:0 0 1022M 0 raid1 /boot
├─nvme0n1p3 259:3 0 200M 0 part
│ └─md127 9:127 0 199.9M 0 raid1 /boot/efi
└─nvme0n1p4 259:4 0 224.7G 0 part
├─vg_ceph_01_misc_1-ceph_0101_block.db 253:13 0 31.1G 0 lvm
├─vg_ceph_01_misc_1-ceph_0101_block.wal 253:14 0 8G 0 lvm
├─vg_ceph_01_misc_1-ceph_0103_block.db 253:15 0 31.1G 0 lvm
├─vg_ceph_01_misc_1-ceph_0103_block.wal 253:16 0 8G 0 lvm
├─vg_ceph_01_misc_1-ceph_0105_block.db 253:17 0 31.1G 0 lvm
├─vg_ceph_01_misc_1-ceph_0105_block.wal 253:18 0 8G 0 lvm
├─vg_ceph_01_misc_1-ceph_0130_block 253:19 0 32G 0 lvm
├─vg_ceph_01_misc_1-ceph_0132_block 253:20 0 32G 0 lvm
└─vg_ceph_01_misc_1-ceph_01_cas1 253:21 0 43.3G 0 lvm
nvme1n1 259:5 0 265G 0 disk
├─nvme1n1p1 259:6 0 39.1G 0 part
│ └─md1 9:1 0 39G 0 raid1
│ ├─vg_root-root 253:0 0 6G 0 lvm /
│ ├─vg_root-home 253:1 0 4G 0 lvm /home
│ ├─vg_root-var 253:2 0 10G 0 lvm /var
│ └─vg_root-tmp 253:3 0 5G 0 lvm /tmp
├─nvme1n1p2 259:7 0 1G 0 part
│ └─md0 9:0 0 1022M 0 raid1 /boot
├─nvme1n1p3 259:8 0 200M 0 part
│ └─md127 9:127 0 199.9M 0 raid1 /boot/efi
└─nvme1n1p4 259:9 0 224.7G 0 part
├─vg_ceph_01_misc_2-ceph_0102_block.db 253:4 0 31.1G 0 lvm
├─vg_ceph_01_misc_2-ceph_0102_block.wal 253:5 0 8G 0 lvm
├─vg_ceph_01_misc_2-ceph_0104_block.db 253:6 0 31.1G 0 lvm
├─vg_ceph_01_misc_2-ceph_0104_block.wal 253:7 0 8G 0 lvm
├─vg_ceph_01_misc_2-ceph_0106_block.db 253:8 0 31.1G 0 lvm
├─vg_ceph_01_misc_2-ceph_0106_block.wal 253:9 0 8G 0 lvm
├─vg_ceph_01_misc_2-ceph_0131_block 253:10 0 32G 0 lvm
├─vg_ceph_01_misc_2-ceph_0133_block 253:11 0 32G 0 lvm
└─vg_ceph_01_misc_2-ceph_01_cas2 253:12 0 43.3G 0 lvm
[root@acosd1 ~]# casadm -L
type id disk status write policy device
cache 1 /dev/dm-21 Running wb -
├core 1 /dev/sdb Active - /dev/cas1-1
├core 3 /dev/sdd Active - /dev/cas1-3
└core 5 /dev/sdc Active - /dev/cas1-5
cache 2 /dev/dm-12 Running wb -
├core 2 /dev/sda Active - /dev/cas2-2
├core 4 /dev/sde Active - /dev/cas2-4
└core 6 /dev/sdf Active - /dev/cas2-6
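For reference, a cache/core layout like the one listed above would be assembled roughly as follows (a sketch only; these are not the exact commands we used, and the device paths are taken from the listing above):

# cache 1 on the first NVMe LV, write-back mode, with three HDD cores
casadm --start-cache --cache-device /dev/vg_ceph_01_misc_1/ceph_01_cas1 --cache-id 1 --cache-mode wb
casadm --add-core --cache-id 1 --core-id 1 --core-device /dev/sdb
casadm --add-core --cache-id 1 --core-id 3 --core-device /dev/sdd
casadm --add-core --cache-id 1 --core-id 5 --core-device /dev/sdc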
On top of this we configure RBD (consumed by a dev k8s cluster), CephFS (some dev workloads) and object storage (also used by some dev workloads). The cluster is barely used, as the workloads are minimal.
The behaviour was observed naturally (and also shows up on our prod cluster); during the last test I had to trigger some pool reorganization (bumped the number of PGs up/down for some pools).
OSD df below:
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
101 hdd 0.97699 1.00000 1.0 TiB 109 GiB 78 GiB 6 KiB 851 MiB 922 GiB 10.56 0.94 62 up
102 hdd 0.97699 1.00000 1.0 TiB 116 GiB 85 GiB 10 KiB 1.4 GiB 915 GiB 11.23 1.00 66 up
103 hdd 0.97699 1.00000 1.0 TiB 111 GiB 80 GiB 5 KiB 2.2 GiB 920 GiB 10.75 0.96 60 up
104 hdd 0.97699 1.00000 1.0 TiB 120 GiB 89 GiB 8 KiB 1.9 GiB 911 GiB 11.64 1.04 71 up
105 hdd 0.97699 1.00000 1.0 TiB 120 GiB 89 GiB 8 KiB 1.9 GiB 911 GiB 11.66 1.04 63 up
106 hdd 0.97699 1.00000 1.0 TiB 124 GiB 93 GiB 15 KiB 1.3 GiB 907 GiB 12.05 1.07 65 up
201 hdd 0.97699 1.00000 1.0 TiB 109 GiB 77 GiB 6 KiB 886 MiB 923 GiB 10.53 0.94 61 up
202 hdd 0.97699 1.00000 1.0 TiB 125 GiB 93 GiB 15 KiB 893 MiB 907 GiB 12.08 1.07 65 up
203 hdd 0.97699 1.00000 1.0 TiB 114 GiB 83 GiB 8 KiB 1.7 GiB 917 GiB 11.07 0.99 66 up
204 hdd 0.97699 1.00000 1.0 TiB 110 GiB 79 GiB 7 KiB 1023 MiB 921 GiB 10.70 0.95 57 up
205 hdd 0.97699 1.00000 1.0 TiB 132 GiB 101 GiB 7 KiB 1.7 GiB 899 GiB 12.81 1.14 76 up
206 hdd 0.97699 1.00000 1.0 TiB 115 GiB 83 GiB 15 KiB 1.7 GiB 917 GiB 11.12 0.99 60 up
301 hdd 0.97699 1.00000 1.0 TiB 116 GiB 85 GiB 9 KiB 1.4 GiB 915 GiB 11.27 1.00 63 up
302 hdd 0.97699 1.00000 1.0 TiB 116 GiB 85 GiB 12 KiB 1.2 GiB 915 GiB 11.22 1.00 62 up
303 hdd 0.97699 1.00000 1.0 TiB 109 GiB 78 GiB 7 KiB 1.2 GiB 922 GiB 10.60 0.94 58 up
304 hdd 0.97699 1.00000 1.0 TiB 116 GiB 85 GiB 7 KiB 1.0 GiB 915 GiB 11.25 1.00 64 up
305 hdd 0.97699 1.00000 1.0 TiB 120 GiB 89 GiB 8 KiB 876 MiB 911 GiB 11.67 1.04 70 up
306 hdd 0.97699 1.00000 1.0 TiB 122 GiB 90 GiB 17 KiB 875 MiB 909 GiB 11.79 1.05 65 up
130 nvme 0.03099 1.00000 32 GiB 2.9 GiB 887 MiB 39 MiB 2.0 GiB 29 GiB 8.96 0.80 33 up
131 nvme 0.03099 1.00000 32 GiB 1.2 GiB 874 MiB 31 MiB 322 MiB 31 GiB 3.75 0.33 14 up
132 nvme 0.03099 1.00000 32 GiB 2.0 GiB 875 MiB 43 MiB 1.1 GiB 30 GiB 6.35 0.56 20 up
133 nvme 0.03099 1.00000 32 GiB 3.1 GiB 875 MiB 18 MiB 2.2 GiB 29 GiB 9.70 0.86 13 up
230 nvme 0.03099 1.00000 32 GiB 2.8 GiB 879 MiB 39 MiB 2.0 GiB 29 GiB 8.90 0.79 25 up
231 nvme 0.03099 1.00000 32 GiB 1.2 GiB 874 MiB 35 MiB 322 MiB 31 GiB 3.76 0.33 19 up
232 nvme 0.03099 1.00000 32 GiB 2.3 GiB 875 MiB 14 MiB 1.4 GiB 30 GiB 7.13 0.63 14 up
233 nvme 0.03099 1.00000 32 GiB 2.3 GiB 882 MiB 43 MiB 1.4 GiB 30 GiB 7.24 0.64 22 up
330 nvme 0.03099 1.00000 32 GiB 1.2 GiB 885 MiB 45 MiB 327 MiB 31 GiB 3.84 0.34 24 up
331 nvme 0.03099 1.00000 32 GiB 2.8 GiB 875 MiB 27 MiB 2.0 GiB 29 GiB 8.89 0.79 16 up
332 nvme 0.03099 1.00000 32 GiB 1.7 GiB 875 MiB 41 MiB 872 MiB 30 GiB 5.45 0.49 18 up
333 nvme 0.03099 1.00000 32 GiB 2.0 GiB 876 MiB 19 MiB 1.1 GiB 30 GiB 6.26 0.56 22 up
TOTAL 19 TiB 2.1 TiB 1.5 TiB 395 MiB 39 GiB 16 TiB 11.24
MIN/MAX VAR: 0.33/1.14 STDDEV: 3.20
I can definitely run a more aligned test by spinning up workloads in k8s pods that consume RBDs. I can take a similar approach of filling the cache and then running a more mixed read/write workload.
What I'm really trying to understand here is why I'm seeing this large amount of PT writes when the cache is literally clean and should just evict.
Thanks! This will be helpful for sure. For now the data you've provided should suffice, no need for additional experiments - it's just that I wanted to reproduce your environment as closely as I can. We're taking a two-pronged approach here: @robertbaldyga and @mmichal10 are trying to come up with something via static code analysis, and in the meantime I'm trying to reproduce it on a live system. We will keep you posted.
Thanks a lot for your help. I'm now working on aligning all PROD nodes to the latest opencas (we observe the same behaviour on the PROD cluster across various master releases). On the PROD cluster I had to use the master branch and rebuild nodes one by one, as we've switched to EL9 (CentOS Stream 9). We start to run into performance issues only when the cache gets full, and unfortunately there's no way to force eviction (I basically need to stop the OSDs, stop and remove the cache, then force-init the cache).
Hi @aneagoe,
You're experiencing the issue on CentOS 9 with OpenCAS v22.6.3 and v24.09, is that correct? Did the problem also appear on CentOS 8?
We tried to reproduce the problem, but unfortunately the only PT requests we observed were due to the cache being 100% dirty. I've prepared a script that collects cache stats in CSV format. Could you please run it for an hour or two on a node that is affected by the excessive pass-through rate and then share the output?
#!/bin/bash
# Collect whole-cache and per-ioclass stats in CSV form every 10 seconds.
LOG_DIR_NAME="opencas_ioclass_stats/$(date '+%H:%M:%S')"
SCRIPT_NAME=$(basename "$0")

if [ "$#" -ne 1 ]; then
    echo "Usage: '$SCRIPT_NAME <cache_id>'"
    exit 1
fi

CACHE_ID=$1

# make sure the requested cache instance exists
casadm -P -i "$CACHE_ID" &> /dev/null
if [ $? -ne 0 ]; then
    echo "Cache $CACHE_ID doesn't exist!"
    exit 1
fi

mkdir -p "$LOG_DIR_NAME"

while true
do
    hr=$(date '+%H:%M:%S')
    echo "$hr"
    echo "$hr" >> "$LOG_DIR_NAME/time"
    # whole-cache stats: keep only the CSV values row
    casadm -P -i "$CACHE_ID" -o csv | tail -n 1 >> "$LOG_DIR_NAME/cache_stats.csv"
    for ioclass_id in $(seq 0 32)
    do
        # per-ioclass stats for the same cache instance
        stats="$(casadm -P -i "$CACHE_ID" -d "$ioclass_id" -o csv 2> /dev/null | tail -n 1)"
        if [ ! "$stats" ]; then
            continue
        fi
        echo "$stats" >> "$LOG_DIR_NAME/ioclass_${ioclass_id}_stats.csv"
    done
    sleep 10
done
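The script takes a single cache id per invocation, e.g. (assuming it is saved as collect_ioclass_stats.sh; the filename is illustrative):

./collect_ioclass_stats.sh 1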
I will run this on a production cluster node against v24.09 (release). It's hard to say for sure whether the problem was present on CentOS 8, but I'm inclined to say yes. I will try to dig up some data from Prometheus and see if I can prove that.
It looks like the issue doesn't manifest itself quite like before. I've collected stats from one of the nodes, where the serviced requests rate fluctuates between 60% and 95%+. Some dashboard stats are shown in the attached screenshots; the collected stats can be found here: opencas_stats.tar.gz
It's quite hard to draw conclusions now. I have upgraded the prod servers that were most severely affected by this to the latest release version (v24.09). Also, if I go back in time for this particular host/cache instance, it looks like before the 17th of October it used to have consistently low rates of serviced requests. A few more screenshots (attached) also show the OS and OpenCAS version timelines. On the 17th of October different ioclass settings were applied (basically removing everything and leaving a single class), which led to 100% serviced requests (though this also resulted in the cache never being cleaned properly). This was kept only temporarily. In short:
22.06.2.0723-release, CentOS8
24.09.0864-master, CentOS9
24.09-release, CentOS9
During the cluster upgrades, when the issue manifested itself the most, we went through massive rebalances in the cluster as each server was upgraded from CentOS8 to CentOS9. Perhaps you can still find some clues in the stats I've attached, and maybe some hints as to why we have these regular drops to 60%. It's also interesting that this metric switches to 100% the moment we change the ioclass config and leave just unclassified, like the following:
+=============+===============+===================+============+
| IO class id | IO class name | Eviction priority | Allocation |
+=============+===============+===================+============+
| 0 | unclassified | 1 | 1.00 |
+=============+===============+===================+============+
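For reference, that single-class state corresponds to an ioclass config CSV along these lines (a sketch of the format loaded with casadm --io-class --load-config; the exact file we load is not shown here):

IO class id,IO class name,Eviction priority,Allocation
0,unclassified,1,1.00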
Question
On a dev ceph cluster, I'm using opencas in front of the HDDs. We went through some upgrades (CentOS 8 to CentOS 9, upgrade to the latest opencas 24.09.0.0900.release) and I noticed that the rate of serviced requests is extremely low (i.e. below 10%). This seems to happen when the cache gets full, despite the fact that the cache is clean. With the current settings in place, I cannot explain why the requests are not serviced and instead go directly to the core. The issue is very similar to https://github.com/Open-CAS/open-cas-linux/issues/1397, which for some reason became stale.
Note that in the output below (from one of the servers), seq-cutoff has been configured to never (it is set to full on the other nodes) in an attempt to see if it would change anything. Unfortunately it had no effect.
Motivation
I need to improve performance here and then replicate the solution to our prod cluster. Performance is currently degraded, as most operations skip the cache entirely.
Your Environment