aneagoe opened 1 month ago
Hi @aneagoe! Could you please send the output of casadm -P -i 1 -d?
I've switched all nodes to a single ioclass (unclassified) and the serviced request ratio went to 95%+. Disabling sequential cutoff pushed it to 100% (just an experiment; it doesn't make sense to leave it on). This is consistent across all nodes in the dev cluster, and I've done the same on one node in the prod cluster with the same results. So there seems to be something off either with my ioclass settings or with priority-based eviction. Although I couldn't run the precise command you sent, I dug up a few commands from when I was troubleshooting:
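For reference, the single-class switch and the sequential cutoff experiment were done with casadm roughly as follows (a sketch only; the cache id and the config file path are examples, not the exact values used on every node):

# load an ioclass config that contains only the unclassified class
casadm --io-class --load-config --cache-id 1 --file /etc/opencas/single-class.csv
# disable sequential cutoff for all cores of cache 1 (experimental only)
casadm --set-param --name seq-cutoff --cache-id 1 --policy never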
[root@acosd3 ~]# casadm -P -i 1 -d 0
IO class ID 0
IO class name unclassified
Eviction priority 100
Max size 0%
╔══════════════════╤═══════╤═════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪═══════╪═════╪═════════════╣
║ Occupancy │ 0 │ 0.0 │ 4KiB Blocks ║
║ Clean │ 0 │ 0.0 │ 4KiB Blocks ║
║ Dirty │ 0 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧═══════╧═════╧═════════════╝
╔══════════════════════╤═══════╤═════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═════╪══════════╣
║ Read hits │ 0 │ 0.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼─────┼──────────╢
║ Write hits │ 0 │ 0.0 │ Requests ║
║ Write partial misses │ 0 │ 0.0 │ Requests ║
║ Write full misses │ 0 │ 0.0 │ Requests ║
║ Write total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼─────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 0 │ 0.0 │ Requests ║
║ Serviced requests │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼─────┼──────────╢
║ Total requests │ 0 │ 0.0 │ Requests ║
╚══════════════════════╧═══════╧═════╧══════════╝
╔══════════════════════════════════╤═══════╤═════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼─────┼─────────────╢
║ Reads from cache │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to cache │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from cache │ 0 │ 0.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼─────┼─────────────╢
║ Reads from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═════╧═════════════╝
[root@acosd3 ~]# casadm -P -i 1 -d 1
IO class ID 1
IO class name direct&done
Eviction priority 50
Max size 100%
╔══════════════════╤═══════╤═══════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪═══════╪═══════╪═════════════╣
║ Occupancy │ 748 │ 0.0 │ 4KiB Blocks ║
║ Clean │ 748 │ 100.0 │ 4KiB Blocks ║
║ Dirty │ 0 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧═══════╧═══════╧═════════════╝
╔══════════════════════╤═══════╤═══════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═══════╪══════════╣
║ Read hits │ 5 │ 100.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 5 │ 100.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Write hits │ 0 │ 0.0 │ Requests ║
║ Write partial misses │ 0 │ 0.0 │ Requests ║
║ Write full misses │ 0 │ 0.0 │ Requests ║
║ Write total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 0 │ 0.0 │ Requests ║
║ Serviced requests │ 5 │ 100.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Total requests │ 5 │ 100.0 │ Requests ║
╚══════════════════════╧═══════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════╤═══════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═══════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from cache │ 5 │ 100.0 │ 4KiB Blocks ║
║ Writes to cache │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from cache │ 5 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from exported object(s) │ 5 │ 100.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 5 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═══════╧═════════════╝
[root@acosd3 ~]# casadm -P -i 1 -d 10
IO class ID 10
IO class name request_size:le:4096&done
Eviction priority 1
Max size 100%
╔══════════════════╤════════╤═══════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪════════╪═══════╪═════════════╣
║ Occupancy │ 407540 │ 3.6 │ 4KiB Blocks ║
║ Clean │ 407536 │ 100.0 │ 4KiB Blocks ║
║ Dirty │ 4 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧════════╧═══════╧═════════════╝
╔══════════════════════╤═══════╤═══════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═══════╪══════════╣
║ Read hits │ 0 │ 0.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Write hits │ 42 │ 30.9 │ Requests ║
║ Write partial misses │ 0 │ 0.0 │ Requests ║
║ Write full misses │ 8 │ 5.9 │ Requests ║
║ Write total │ 50 │ 36.8 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 86 │ 63.2 │ Requests ║
║ Serviced requests │ 50 │ 36.8 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Total requests │ 136 │ 100.0 │ Requests ║
╚══════════════════════╧═══════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════╤═══════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═══════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 179 │ 100.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 179 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from cache │ 188 │ 79.0 │ 4KiB Blocks ║
║ Writes to cache │ 50 │ 21.0 │ 4KiB Blocks ║
║ Total to/from cache │ 238 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 136 │ 100.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 136 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═══════╧═════════════╝
[root@acosd3 ~]# casadm -P -i 1 -d 11
IO class ID 11
IO class name request_size:le:8192&done
Eviction priority 2
Max size 100%
╔══════════════════╤════════╤═══════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪════════╪═══════╪═════════════╣
║ Occupancy │ 190084 │ 1.7 │ 4KiB Blocks ║
║ Clean │ 190072 │ 100.0 │ 4KiB Blocks ║
║ Dirty │ 12 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧════════╧═══════╧═════════════╝
╔══════════════════════╤═══════╤═══════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═══════╪══════════╣
║ Read hits │ 0 │ 0.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Write hits │ 36 │ 33.6 │ Requests ║
║ Write partial misses │ 0 │ 0.0 │ Requests ║
║ Write full misses │ 10 │ 9.3 │ Requests ║
║ Write total │ 46 │ 43.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 61 │ 57.0 │ Requests ║
║ Serviced requests │ 46 │ 43.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Total requests │ 107 │ 100.0 │ Requests ║
╚══════════════════════╧═══════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════╤═══════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═══════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 257 │ 100.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 257 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from cache │ 216 │ 70.1 │ 4KiB Blocks ║
║ Writes to cache │ 92 │ 29.9 │ 4KiB Blocks ║
║ Total to/from cache │ 308 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 214 │ 100.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 214 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═══════╧═════════════╝
[root@acosd3 ~]# casadm -P -i 1 -d 12
IO class ID 12
IO class name request_size:le:16384&done
Eviction priority 3
Max size 100%
╔══════════════════╤════════╤═══════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪════════╪═══════╪═════════════╣
║ Occupancy │ 365160 │ 3.2 │ 4KiB Blocks ║
║ Clean │ 365160 │ 100.0 │ 4KiB Blocks ║
║ Dirty │ 0 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧════════╧═══════╧═════════════╝
╔══════════════════════╤═══════╤═══════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═══════╪══════════╣
║ Read hits │ 0 │ 0.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Write hits │ 7 │ 7.0 │ Requests ║
║ Write partial misses │ 2 │ 2.0 │ Requests ║
║ Write full misses │ 3 │ 3.0 │ Requests ║
║ Write total │ 12 │ 12.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 88 │ 88.0 │ Requests ║
║ Serviced requests │ 12 │ 12.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Total requests │ 100 │ 100.0 │ Requests ║
╚══════════════════════╧═══════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════╤═══════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═══════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 391 │ 100.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 391 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from cache │ 96 │ 68.6 │ 4KiB Blocks ║
║ Writes to cache │ 44 │ 31.4 │ 4KiB Blocks ║
║ Total to/from cache │ 140 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 366 │ 100.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 366 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═══════╧═════════════╝
[root@acosd3 ~]# casadm -P -i 1 -d 13
IO class ID 13
IO class name request_size:le:32768&done
Eviction priority 4
Max size 100%
╔══════════════════╤════════╤═══════╤═════════════╗
║ Usage statistics │ Count │ % │ Units ║
╠══════════════════╪════════╪═══════╪═════════════╣
║ Occupancy │ 729812 │ 6.4 │ 4KiB Blocks ║
║ Clean │ 729812 │ 100.0 │ 4KiB Blocks ║
║ Dirty │ 0 │ 0.0 │ 4KiB Blocks ║
╚══════════════════╧════════╧═══════╧═════════════╝
╔══════════════════════╤═══════╤═══════╤══════════╗
║ Request statistics │ Count │ % │ Units ║
╠══════════════════════╪═══════╪═══════╪══════════╣
║ Read hits │ 0 │ 0.0 │ Requests ║
║ Read partial misses │ 0 │ 0.0 │ Requests ║
║ Read full misses │ 0 │ 0.0 │ Requests ║
║ Read total │ 0 │ 0.0 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Write hits │ 15 │ 13.2 │ Requests ║
║ Write partial misses │ 1 │ 0.9 │ Requests ║
║ Write full misses │ 4 │ 3.5 │ Requests ║
║ Write total │ 20 │ 17.5 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Pass-Through reads │ 0 │ 0.0 │ Requests ║
║ Pass-Through writes │ 94 │ 82.5 │ Requests ║
║ Serviced requests │ 20 │ 17.5 │ Requests ║
╟──────────────────────┼───────┼───────┼──────────╢
║ Total requests │ 114 │ 100.0 │ Requests ║
╚══════════════════════╧═══════╧═══════╧══════════╝
╔══════════════════════════════════╤═══════╤═══════╤═════════════╗
║ Block statistics │ Count │ % │ Units ║
╠══════════════════════════════════╪═══════╪═══════╪═════════════╣
║ Reads from core(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to core(s) │ 770 │ 100.0 │ 4KiB Blocks ║
║ Total to/from core(s) │ 770 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from cache │ 176 │ 58.3 │ 4KiB Blocks ║
║ Writes to cache │ 126 │ 41.7 │ 4KiB Blocks ║
║ Total to/from cache │ 302 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼───────┼───────┼─────────────╢
║ Reads from exported object(s) │ 0 │ 0.0 │ 4KiB Blocks ║
║ Writes to exported object(s) │ 740 │ 100.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │ 740 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧═══════╧═══════╧═════════════╝
Hi @aneagoe, thanks for the quick response. Could you please rerun your workload on a new cache instance with your original ioclass config and cleaning policy set to NOP (casadm -X -n cleaning -p nop) and share full stats of all configured ioclasses?
@mmichal10 I can adjust one of the nodes as recommended. But could you please clarify the scope of this experiment? In my understanding, if we switch to nop we're basically not cleaning the cache at all, so once it gets full we can expect it to effectively switch to write-through, right? I will stop the OSDs, stop and remove the existing cas devices, re-create them with the settings you mentioned, and then report back.
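The re-creation sequence I have in mind looks roughly like this (a sketch; the device paths and the ioclass config path are placeholders from my setup, not commands verified in this thread):

# stop the OSDs using this cache, then stop the cache instance
casadm --stop-cache --cache-id 1
# re-create the cache in write-back mode on the same NVMe LV (path is an example)
casadm --start-cache --cache-device /dev/vg_ceph_01_misc_1/ceph_01_cas1 --cache-id 1 --cache-mode wb --force
# re-attach the HDD cores
casadm --add-core --cache-id 1 --core-device /dev/sdb
# load the original ioclass config and switch cleaning to NOP
casadm --io-class --load-config --cache-id 1 --file /etc/opencas/ioclass-config.csv
casadm --set-param --name cleaning --cache-id 1 --policy nop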
@aneagoe once the cache is 100% dirty, write requests will be redirected to pass-through and flushing of a small portion of dirty blocks will be triggered. We want to make sure that the cache gets 100% dirty before the writes are pass-through'ed, and we want to see the occupancy and how many writes got pass-through'ed for each ioclass.
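One simple way to watch how close the cache gets to 100% dirty while the workload runs is a periodic check like this (a minimal sketch; cache id 1 is assumed):

# print the usage rows (Occupancy / Clean / Dirty) of cache 1 every 10 seconds
watch -n 10 "casadm -P -i 1 | grep -E 'Occupancy|Clean|Dirty'"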
I've attached snapshots taken at 5-minute intervals. I triggered some pool resizes to get more traffic going, and it seems the cache never got to 100% dirty (it peaked around 99.8%) before flushing kicked in. It's perhaps not very easy to go through; if you need a different format/layout, I'm happy to re-run this experiment and accommodate, just let me know. cas_stats.out.gz
Hi @aneagoe, just a quick question for you: I'm trying to reproduce what you're seeing on our side, and I do see some pass-through traffic, but not as much as in your stats. Can you share a bit more info on your testing setup? Mine is 3 nodes, each with 12 OSDs (HDD fronted by a single-SSD CAS cache); on top of that I create 60 RBD images, fill the caches with large writes, and run a read-intensive or mixed read/write workload (IO done by fio with librbd). Maybe if we align a bit more it would help us see exactly the same issues you're seeing.
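For reference, the fio jobs I'm using are along these lines (a sketch under assumptions: the pool name, image name, sizes and runtime are illustrative, not the exact parameters):

# fill the caches with large sequential writes
fio --name=fill --ioengine=rbd --clientname=admin --pool=rbd --rbdname=test-img-01 \
    --rw=write --bs=1M --iodepth=16 --direct=1 --size=20G

# then run a mixed random read/write workload against the same image
fio --name=rwmix --ioengine=rbd --clientname=admin --pool=rbd --rbdname=test-img-01 \
    --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 --direct=1 \
    --runtime=600 --time_based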
Hey @jfckm. I have a dev cluster where I tend to test any changes before applying them to the prod cluster. The dev cluster is built on top of Proxmox with its own Ceph cluster. There are 3 OSD VMs, each with 6 x 1TB volumes and 2 x NVMe drives partly used by opencas (shared with the OS; we assign a 43G LV to opencas). From the same NVMe drives we also carve two LVs used as direct NVMe OSDs. On the NVMe-backed storage we place things like the rgw index bucket and cephfs_metadata pools. Some lsblk output below shows the overall setup:
[root@acosd1 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 1000G 0 disk
└─cas2-2 252:1792 0 1000G 0 disk
└─ceph_data_0102-ceph_0102_block 253:23 0 1000G 0 lvm
└─VU36Ok-riz1-Tgs2-KxxC-UpIG-8Vxf-Fs1WrQ 253:29 0 1000G 0 crypt
sdb 8:16 0 1000G 0 disk
└─cas1-1 252:1536 0 1000G 0 disk
└─ceph_data_0101-ceph_0101_block 253:22 0 1000G 0 lvm
└─xiFIJE-Fkxf-8j2a-qhG3-NirX-bhdK-HGx5FQ 253:28 0 1000G 0 crypt
sdc 8:32 0 1000G 0 disk
└─cas1-5 252:2560 0 1000G 0 disk
└─ceph_data_0105-ceph_0105_block 253:26 0 1000G 0 lvm
└─tcWdld-26OF-zNli-Iehs-Zk37-fUXZ-2g1YR3 253:32 0 1000G 0 crypt
sdd 8:48 0 1000G 0 disk
└─cas1-3 252:2048 0 1000G 0 disk
└─ceph_data_0103-ceph_0103_block 253:24 0 1000G 0 lvm
└─EPHkdC-ucla-hoMx-qMP6-0Uno-r82O-VOc00n 253:30 0 1000G 0 crypt
sde 8:64 0 1000G 0 disk
└─cas2-4 252:2304 0 1000G 0 disk
└─ceph_data_0104-ceph_0104_block 253:25 0 1000G 0 lvm
└─xZWLVv-XCdf-j3kt-uCVE-sZB1-2q5B-B8NYRQ 253:31 0 1000G 0 crypt
sdf 8:80 0 1000G 0 disk
└─cas2-6 252:2816 0 1000G 0 disk
└─ceph_data_0106-ceph_0106_block 253:27 0 1000G 0 lvm
└─3B1DAI-EOX6-4gNe-DsPM-dmiF-SiTL-sizaLC 253:33 0 1000G 0 crypt
nvme0n1 259:0 0 265G 0 disk
├─nvme0n1p1 259:1 0 39.1G 0 part
│ └─md1 9:1 0 39G 0 raid1
│ ├─vg_root-root 253:0 0 6G 0 lvm /
│ ├─vg_root-home 253:1 0 4G 0 lvm /home
│ ├─vg_root-var 253:2 0 10G 0 lvm /var
│ └─vg_root-tmp 253:3 0 5G 0 lvm /tmp
├─nvme0n1p2 259:2 0 1G 0 part
│ └─md0 9:0 0 1022M 0 raid1 /boot
├─nvme0n1p3 259:3 0 200M 0 part
│ └─md127 9:127 0 199.9M 0 raid1 /boot/efi
└─nvme0n1p4 259:4 0 224.7G 0 part
├─vg_ceph_01_misc_1-ceph_0101_block.db 253:13 0 31.1G 0 lvm
├─vg_ceph_01_misc_1-ceph_0101_block.wal 253:14 0 8G 0 lvm
├─vg_ceph_01_misc_1-ceph_0103_block.db 253:15 0 31.1G 0 lvm
├─vg_ceph_01_misc_1-ceph_0103_block.wal 253:16 0 8G 0 lvm
├─vg_ceph_01_misc_1-ceph_0105_block.db 253:17 0 31.1G 0 lvm
├─vg_ceph_01_misc_1-ceph_0105_block.wal 253:18 0 8G 0 lvm
├─vg_ceph_01_misc_1-ceph_0130_block 253:19 0 32G 0 lvm
├─vg_ceph_01_misc_1-ceph_0132_block 253:20 0 32G 0 lvm
└─vg_ceph_01_misc_1-ceph_01_cas1 253:21 0 43.3G 0 lvm
nvme1n1 259:5 0 265G 0 disk
├─nvme1n1p1 259:6 0 39.1G 0 part
│ └─md1 9:1 0 39G 0 raid1
│ ├─vg_root-root 253:0 0 6G 0 lvm /
│ ├─vg_root-home 253:1 0 4G 0 lvm /home
│ ├─vg_root-var 253:2 0 10G 0 lvm /var
│ └─vg_root-tmp 253:3 0 5G 0 lvm /tmp
├─nvme1n1p2 259:7 0 1G 0 part
│ └─md0 9:0 0 1022M 0 raid1 /boot
├─nvme1n1p3 259:8 0 200M 0 part
│ └─md127 9:127 0 199.9M 0 raid1 /boot/efi
└─nvme1n1p4 259:9 0 224.7G 0 part
├─vg_ceph_01_misc_2-ceph_0102_block.db 253:4 0 31.1G 0 lvm
├─vg_ceph_01_misc_2-ceph_0102_block.wal 253:5 0 8G 0 lvm
├─vg_ceph_01_misc_2-ceph_0104_block.db 253:6 0 31.1G 0 lvm
├─vg_ceph_01_misc_2-ceph_0104_block.wal 253:7 0 8G 0 lvm
├─vg_ceph_01_misc_2-ceph_0106_block.db 253:8 0 31.1G 0 lvm
├─vg_ceph_01_misc_2-ceph_0106_block.wal 253:9 0 8G 0 lvm
├─vg_ceph_01_misc_2-ceph_0131_block 253:10 0 32G 0 lvm
├─vg_ceph_01_misc_2-ceph_0133_block 253:11 0 32G 0 lvm
└─vg_ceph_01_misc_2-ceph_01_cas2 253:12 0 43.3G 0 lvm
[root@acosd1 ~]# casadm -L
type id disk status write policy device
cache 1 /dev/dm-21 Running wb -
├core 1 /dev/sdb Active - /dev/cas1-1
├core 3 /dev/sdd Active - /dev/cas1-3
└core 5 /dev/sdc Active - /dev/cas1-5
cache 2 /dev/dm-12 Running wb -
├core 2 /dev/sda Active - /dev/cas2-2
├core 4 /dev/sde Active - /dev/cas2-4
└core 6 /dev/sdf Active - /dev/cas2-6
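For reference, a cache/core layout like the one listed above would be assembled roughly as follows (a sketch only; these are not the exact commands we used, and the device paths are taken from the listing above):

# cache 1 on the first NVMe LV, write-back mode, with three HDD cores
casadm --start-cache --cache-device /dev/vg_ceph_01_misc_1/ceph_01_cas1 --cache-id 1 --cache-mode wb
casadm --add-core --cache-id 1 --core-id 1 --core-device /dev/sdb
casadm --add-core --cache-id 1 --core-id 3 --core-device /dev/sdd
casadm --add-core --cache-id 1 --core-id 5 --core-device /dev/sdc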
On top of this we configure RBD (consumed by a dev k8s cluster), CephFS (some dev workloads) and object storage (also used by some dev workloads). The cluster is barely used, as the workloads are minimal.
The behaviour was observed naturally (and also shows up on our prod cluster); during the last test I had to trigger some pool reorganization (bumped the number of PGs up/down for some pools).
OSD df below:
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
101 hdd 0.97699 1.00000 1.0 TiB 109 GiB 78 GiB 6 KiB 851 MiB 922 GiB 10.56 0.94 62 up
102 hdd 0.97699 1.00000 1.0 TiB 116 GiB 85 GiB 10 KiB 1.4 GiB 915 GiB 11.23 1.00 66 up
103 hdd 0.97699 1.00000 1.0 TiB 111 GiB 80 GiB 5 KiB 2.2 GiB 920 GiB 10.75 0.96 60 up
104 hdd 0.97699 1.00000 1.0 TiB 120 GiB 89 GiB 8 KiB 1.9 GiB 911 GiB 11.64 1.04 71 up
105 hdd 0.97699 1.00000 1.0 TiB 120 GiB 89 GiB 8 KiB 1.9 GiB 911 GiB 11.66 1.04 63 up
106 hdd 0.97699 1.00000 1.0 TiB 124 GiB 93 GiB 15 KiB 1.3 GiB 907 GiB 12.05 1.07 65 up
201 hdd 0.97699 1.00000 1.0 TiB 109 GiB 77 GiB 6 KiB 886 MiB 923 GiB 10.53 0.94 61 up
202 hdd 0.97699 1.00000 1.0 TiB 125 GiB 93 GiB 15 KiB 893 MiB 907 GiB 12.08 1.07 65 up
203 hdd 0.97699 1.00000 1.0 TiB 114 GiB 83 GiB 8 KiB 1.7 GiB 917 GiB 11.07 0.99 66 up
204 hdd 0.97699 1.00000 1.0 TiB 110 GiB 79 GiB 7 KiB 1023 MiB 921 GiB 10.70 0.95 57 up
205 hdd 0.97699 1.00000 1.0 TiB 132 GiB 101 GiB 7 KiB 1.7 GiB 899 GiB 12.81 1.14 76 up
206 hdd 0.97699 1.00000 1.0 TiB 115 GiB 83 GiB 15 KiB 1.7 GiB 917 GiB 11.12 0.99 60 up
301 hdd 0.97699 1.00000 1.0 TiB 116 GiB 85 GiB 9 KiB 1.4 GiB 915 GiB 11.27 1.00 63 up
302 hdd 0.97699 1.00000 1.0 TiB 116 GiB 85 GiB 12 KiB 1.2 GiB 915 GiB 11.22 1.00 62 up
303 hdd 0.97699 1.00000 1.0 TiB 109 GiB 78 GiB 7 KiB 1.2 GiB 922 GiB 10.60 0.94 58 up
304 hdd 0.97699 1.00000 1.0 TiB 116 GiB 85 GiB 7 KiB 1.0 GiB 915 GiB 11.25 1.00 64 up
305 hdd 0.97699 1.00000 1.0 TiB 120 GiB 89 GiB 8 KiB 876 MiB 911 GiB 11.67 1.04 70 up
306 hdd 0.97699 1.00000 1.0 TiB 122 GiB 90 GiB 17 KiB 875 MiB 909 GiB 11.79 1.05 65 up
130 nvme 0.03099 1.00000 32 GiB 2.9 GiB 887 MiB 39 MiB 2.0 GiB 29 GiB 8.96 0.80 33 up
131 nvme 0.03099 1.00000 32 GiB 1.2 GiB 874 MiB 31 MiB 322 MiB 31 GiB 3.75 0.33 14 up
132 nvme 0.03099 1.00000 32 GiB 2.0 GiB 875 MiB 43 MiB 1.1 GiB 30 GiB 6.35 0.56 20 up
133 nvme 0.03099 1.00000 32 GiB 3.1 GiB 875 MiB 18 MiB 2.2 GiB 29 GiB 9.70 0.86 13 up
230 nvme 0.03099 1.00000 32 GiB 2.8 GiB 879 MiB 39 MiB 2.0 GiB 29 GiB 8.90 0.79 25 up
231 nvme 0.03099 1.00000 32 GiB 1.2 GiB 874 MiB 35 MiB 322 MiB 31 GiB 3.76 0.33 19 up
232 nvme 0.03099 1.00000 32 GiB 2.3 GiB 875 MiB 14 MiB 1.4 GiB 30 GiB 7.13 0.63 14 up
233 nvme 0.03099 1.00000 32 GiB 2.3 GiB 882 MiB 43 MiB 1.4 GiB 30 GiB 7.24 0.64 22 up
330 nvme 0.03099 1.00000 32 GiB 1.2 GiB 885 MiB 45 MiB 327 MiB 31 GiB 3.84 0.34 24 up
331 nvme 0.03099 1.00000 32 GiB 2.8 GiB 875 MiB 27 MiB 2.0 GiB 29 GiB 8.89 0.79 16 up
332 nvme 0.03099 1.00000 32 GiB 1.7 GiB 875 MiB 41 MiB 872 MiB 30 GiB 5.45 0.49 18 up
333 nvme 0.03099 1.00000 32 GiB 2.0 GiB 876 MiB 19 MiB 1.1 GiB 30 GiB 6.26 0.56 22 up
TOTAL 19 TiB 2.1 TiB 1.5 TiB 395 MiB 39 GiB 16 TiB 11.24
MIN/MAX VAR: 0.33/1.14 STDDEV: 3.20
I can definitely run a more aligned test by spinning up workloads in k8s pods that consume RBDs. I can take a similar approach of filling the cache and then running a more mixed read/write workload.
What I'm really trying to understand here is why I'm seeing this large amount of PT writes when the cache is literally clean and should just evict.
Thanks! This will be helpful for sure. For now the data you've provided should suffice, no need for additional experiments - it's just that I wanted to reproduce your environment as closely as I can. We're taking a two-pronged approach here: @robertbaldyga and @mmichal10 are trying to come up with something via static code analysis, and in the meantime I'm trying to reproduce it on a live system. We will keep you posted.
Thanks a lot for your help. I'm now working on aligning all PROD nodes to the latest opencas (we observe the same behaviour on the PROD cluster across various master releases). On the PROD cluster I had to use the master branch and rebuild nodes one by one, as we've switched to EL9 (CentOS Stream 9). We start to run into performance issues only when the cache gets full, and unfortunately there's no way to force eviction (I basically need to stop the OSDs, stop and remove the cache, then force-init the cache).
Hi @aneagoe,
You're experiencing the issue on CentOS 9 with OpenCAS v22.6.3 and v24.09, is that correct? Did the problem also appear on CentOS 8?
We tried to reproduce the problem, but unfortunately the only PT requests we observed were due to the cache being 100% dirty. I've prepared a script that collects cache stats in CSV format. Could you please run it for an hour or two on a node that is affected by the excessive pass-through rate and then share the output?
#!/bin/bash
# Collect whole-cache and per-ioclass stats in CSV form every 10 seconds.
LOG_DIR_NAME="opencas_ioclass_stats/$(date '+%H:%M:%S')"
SCRIPT_NAME=$(basename "$0")

if [ "$#" -ne 1 ]; then
    echo "Usage: '$SCRIPT_NAME <cache_id>'"
    exit 1
fi

CACHE_ID=$1

# make sure the requested cache instance exists
casadm -P -i "$CACHE_ID" &> /dev/null
if [ $? -ne 0 ]; then
    echo "Cache $CACHE_ID doesn't exist!"
    exit 1
fi

mkdir -p "$LOG_DIR_NAME"

while true
do
    hr=$(date '+%H:%M:%S')
    echo "$hr"
    echo "$hr" >> "$LOG_DIR_NAME/time"
    # whole-cache stats: keep only the CSV values row
    casadm -P -i "$CACHE_ID" -o csv | tail -n 1 >> "$LOG_DIR_NAME/cache_stats.csv"
    for ioclass_id in $(seq 0 32)
    do
        # per-ioclass stats for the same cache instance
        stats="$(casadm -P -i "$CACHE_ID" -d "$ioclass_id" -o csv 2> /dev/null | tail -n 1)"
        if [ ! "$stats" ]; then
            continue
        fi
        echo "$stats" >> "$LOG_DIR_NAME/ioclass_${ioclass_id}_stats.csv"
    done
    sleep 10
done
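The script takes a single cache id per invocation, e.g. (assuming it is saved as collect_ioclass_stats.sh; the filename is illustrative):

./collect_ioclass_stats.sh 1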
I will run this on a production cluster node against v24.09 (release). It's hard to say for sure whether the problem was present on CentOS 8, but I'm inclined to say yes. I will try to dig up some data from Prometheus and see if I can prove that.
It looks like the issue doesn't manifest itself quite like before. I've collected stats from one of the nodes, where the serviced requests rate fluctuates between 60% and 95%+. Some dashboard stats are shown in the attached screenshots; the collected stats can be found here: opencas_stats.tar.gz
It's quite hard to draw conclusions now. I have upgraded the prod servers that were most severely affected by this to the latest release version (v24.09). Also, if I go back in time for this particular host/cache instance, it looks like before the 17th of October it used to have consistently low rates of serviced requests. A few more screenshots (attached) also show the OS and OpenCAS version timelines. On the 17th of October different ioclass settings were applied (basically removing everything and leaving a single class), which led to 100% serviced requests (though this also resulted in the cache never being cleaned properly). This was kept only temporarily. In short:
22.06.2.0723-release, CentOS8
24.09.0864-master, CentOS9
24.09-release, CentOS9
During the cluster upgrades, when the issue manifested itself the most, we went through massive rebalances in the cluster as each server was upgraded from CentOS8 to CentOS9. Perhaps you can still find some clues in the stats I've attached, and maybe some hints as to why we have these regular drops to 60%. It's also interesting that this metric switches to 100% the moment we change the ioclass config and leave just unclassified, like the following:
+=============+===============+===================+============+
| IO class id | IO class name | Eviction priority | Allocation |
+=============+===============+===================+============+
| 0 | unclassified | 1 | 1.00 |
+=============+===============+===================+============+
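For reference, that single-class state corresponds to an ioclass config CSV along these lines (a sketch of the format loaded with casadm --io-class --load-config; the exact file we load is not shown here):

IO class id,IO class name,Eviction priority,Allocation
0,unclassified,1,1.00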
Question
On a dev ceph cluster, I'm using opencas in front of the HDDs. We went through some upgrades (CentOS 8 to CentOS 9, upgrade to the latest opencas 24.09.0.0900.release) and I noticed that the rate of serviced requests is extremely low (i.e. below 10%). This seems to happen when the cache gets full, despite the fact that the cache is clean. With the current settings in place, I cannot explain why the requests are not serviced and instead go directly to the core. The issue is very similar to https://github.com/Open-CAS/open-cas-linux/issues/1397, which for some reason became stale.
Note that in the output below (from one of the servers), seq-cutoff has been configured to never (it is set to full on the other nodes) in an attempt to see if it would change anything. Unfortunately it had no effect.
Motivation
I need to improve performance here and then replicate the solution to our prod cluster. Performance is currently degraded, as most operations skip the cache entirely.
Your Environment