Open-CAS / open-cas-linux
https://open-cas.com

casctl stop --flush leaves data in cache? #1582

Closed: jvinolas closed this issue 2 weeks ago

jvinolas commented 2 weeks ago

Description

Following the documentation for starting and stopping the cache, I did what is shown in Steps to Reproduce below. Activating the cache device again after a casctl stop --flush throws an error saying there is old metadata on the device that needs to be loaded.

I'm activating/deactivating Open CAS manually, so nothing is set in /etc/opencas/opencas.conf. What am I doing wrong?
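
For reference, a minimal sketch of what the equivalent /etc/opencas/opencas.conf could look like for this setup. The [caches]/[cores] layout follows the example config shipped with Open CAS Linux; the device paths are the ones from this report (stable /dev/disk/by-id paths are preferable in practice):

[caches]
# Cache ID   Cache device     Cache mode
1            /dev/nvme0n1p1   WB

[cores]
# Cache ID   Core ID   Core device
1            1         /dev/dm-4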

Expected Behavior

No data should be left in the cache after a casctl stop --flush command is issued.

Actual Behavior

Data is found: starting the cache again fails with "Old metadata found on device".

Steps to Reproduce

root@nfs7:/mnt# /usr/sbin/casadm -S -i 1 -d "$CACHE_DEVICE" -c wb --force
Successfully added cache instance 1
root@nfs7:/mnt# casadm -L
type    id   disk             status    write policy   device
cache   1    /dev/nvme0n1p1   Running   wb             -
root@nfs7:/mnt# /usr/sbin/casadm -A -i 1 -d "$CORE_DEVICE"
Successfully added core 1 to cache instance 1
root@nfs7:/mnt# casadm -L
type    id   disk             status    write policy   device
cache   1    /dev/nvme0n1p1   Running   wb             -
└core   1    /dev/dm-4        Active    -              /dev/cas1-1
root@nfs7:/mnt# casctl stop --flush
root@nfs7:/mnt# casadm -L
No caches running
root@nfs7:/mnt# /usr/sbin/casadm -S -i 1 -d "$CACHE_DEVICE" -c wb
Error inserting cache 1
Old metadata found on device.
Please load cache metadata using --load option or use --force to
 discard on-disk metadata and start fresh cache instance.
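
As the error message suggests, the stopped instance can also be brought back together with its metadata. A minimal sketch, reusing the $CACHE_DEVICE variable from above (the cache ID, mode, and core list are read from the on-disk metadata, so -i and -c are not passed):

# Warm-start the stopped instance from its on-disk metadata:
/usr/sbin/casadm -S -d "$CACHE_DEVICE" --load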

Context

Activating/Deactivating cache

Possible Fix

-

Logs

-

Configuration files

-

Your Environment

sdb                         8:16   0    8T  0 disk  
└─sdb1                      8:17   0    8T  0 part  
  └─md100                   9:100  0   32T  0 raid0 
    └─vg_vdo0-lv0_vdata   252:0    0   31T  0 lvm   
      └─vg_vdo0-lv0-vpool 252:1    0  500T  0 lvm   
        └─vg_vdo0-vdo0    252:2    0  500T  0 lvm   
          └─cas1-1        251:0    0  500T  0 disk  
sdc                         8:32   0    8T  0 disk  
└─sdc1                      8:33   0    8T  0 part  
  └─md100                   9:100  0   32T  0 raid0 
    └─vg_vdo0-lv0_vdata   252:0    0   31T  0 lvm   
      └─vg_vdo0-lv0-vpool 252:1    0  500T  0 lvm   
        └─vg_vdo0-vdo0    252:2    0  500T  0 lvm   
          └─cas1-1        251:0    0  500T  0 disk  
sdd                         8:48   0    8T  0 disk  
└─sdd1                      8:49   0    8T  0 part  
  └─md100                   9:100  0   32T  0 raid0 
    └─vg_vdo0-lv0_vdata   252:0    0   31T  0 lvm   
      └─vg_vdo0-lv0-vpool 252:1    0  500T  0 lvm   
        └─vg_vdo0-vdo0    252:2    0  500T  0 lvm   
          └─cas1-1        251:0    0  500T  0 disk  
sde                         8:64   0    8T  0 disk  
└─sde1                      8:65   0    8T  0 part  
  └─md100                   9:100  0   32T  0 raid0 
    └─vg_vdo0-lv0_vdata   252:0    0   31T  0 lvm   
      └─vg_vdo0-lv0-vpool 252:1    0  500T  0 lvm   
        └─vg_vdo0-vdo0    252:2    0  500T  0 lvm   
          └─cas1-1        251:0    0  500T  0 disk 
type    id   disk             status    write policy   device
cache   1    /dev/nvme0n1p1   Running   wb             -
└core   1    /dev/dm-4        Active    -              /dev/cas1-1
robertbaldyga commented 2 weeks ago

Hi @jvinolas! You did everything right. The data has been flushed to the backend device; there is no dirty data left in the cache. However, after stopping the cache, the cache device still contains metadata and clean data, so after loading it, the cache can start already in a warm state. If you want to start the cache entirely from scratch (because, for example, you have written something to the backend device while the cache was stopped, effectively rendering the cached data outdated), you can either add --force to the cache start command, instructing it to ignore the old metadata, or call casadm --zero-metadata to erase the metadata from the cache device before starting a new instance.
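
A sketch of the two fresh-start options described above, reusing the $CACHE_DEVICE variable from the reproduction steps:

# Option 1: ignore the old metadata and start a fresh instance:
/usr/sbin/casadm -S -i 1 -d "$CACHE_DEVICE" -c wb --force

# Option 2: erase the old metadata first, then start normally:
/usr/sbin/casadm --zero-metadata -d "$CACHE_DEVICE"
/usr/sbin/casadm -S -i 1 -d "$CACHE_DEVICE" -c wb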

jvinolas commented 2 weeks ago

So @robertbaldyga, whether the wb cache is unclean (not flushed to the core device because the server hung, for example) or was correctly flushed with casctl stop --flush, I understand that activating the cache again with --load should be safe, and it seems to be the recommended method, right? (Taking into account that no access has been made to the core device in the meantime, of course.)

robertbaldyga commented 2 weeks ago

Yes, that's correct.
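
For completeness, a sketch of the full stop-and-reload cycle that follows from this thread (assuming the core device is not written to while the cache is stopped):

# Flush dirty data and stop all running cache instances:
casctl stop --flush
# Later, warm-start the instance from its on-disk metadata:
casadm -S -d "$CACHE_DEVICE" --load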

jvinolas commented 2 weeks ago

thanks!