safeguard against reuse of active caching device

onlyjob commented 8 years ago

I've managed to wreck dm-writeboost module by accidentally re-using active caching device by constructing another dm-writeboosted device using the same caching device.

dm-writeboost went into endless loop constantly reading from caching device:

Jul 16 03:09:28 deblab kernel: [1901555.620293] dm-4: rw=1, want=2233227040, limit=1749020672
Jul 16 03:09:28 deblab kernel: [1901555.620294] attempt to access beyond end of device
Jul 16 03:09:28 deblab kernel: [1901555.620294] dm-4: rw=1, want=2233227816, limit=1749020672
Jul 16 03:09:28 deblab kernel: [1901555.620295] attempt to access beyond end of device
Jul 16 03:09:28 deblab kernel: [1901555.620296] dm-4: rw=1, want=2233227824, limit=1749020672
Jul 16 03:09:28 deblab kernel: [1901555.620296] attempt to access beyond end of device
Jul 16 03:09:28 deblab kernel: [1901555.620297] dm-4: rw=1, want=2233227832, limit=1749020672
Jul 16 03:09:28 deblab kernel: [1901555.620298] attempt to access beyond end of device
Jul 16 03:09:28 deblab kernel: [1901555.620299] dm-4: rw=1, want=2233227840, limit=1749020672
Jul 16 03:09:28 deblab kernel: [1901555.620299] attempt to access beyond end of device
Jul 16 03:09:28 deblab kernel: [1901555.620300] dm-4: rw=1, want=2233227944, limit=1749020672
Jul 16 03:09:28 deblab kernel: [1901555.620301] attempt to access beyond end of device
Jul 16 03:09:28 deblab kernel: [1901555.620301] dm-4: rw=1, want=2233227952, limit=1749020672

I did not find the way how to interrupt it so I had to resort to emergency reboot.

Please consider introducing safeguard to prevent accidental re-use of caching device.

akiradeveloper commented 8 years ago

@onlyjob Sharing a caching device with two dm-writeboost'd devices is invalid.

I once considered this by embedding some unique identifier of the backing device into the super block of the caching device but concluded there is no perfect solution for this.

The first reason is such perfect identifier doesn't exist (neither device name nor uuid are valid).

And the second reason is we should let the userland tools (e.g. your writeboost) to manage invalid operations. There are lot of invalid operations that is theoretically possible but managing them isn't a role of kernel module but the userland tools.

However, looping forever and you resort to rebooting bit sounds to be fixed. So I want to ask you for more details. I couldn't see what you actually did

Give me lsblk
What does "active" mean?
The first wb'd device and the second one are backed by a device of difference sizes?
You created the second wb'd device before removing the first wb'd device? or after?

akiradeveloper commented 8 years ago

@onlyjob

I wrote a test to reproduce the error message you saw in dmesg

class REPRO_115 extends DMTestSuite {
  test("endless loop when try to use active caching device") {
    Memory(Sector.M(8)) { caching => // 1
      Memory(Sector.M(128)) { backing1 => // 2
        Writeboost.sweepCaches(caching) // 3
        Writeboost.Table(backing1, caching).create { s => // 4
          Shell(s"dd if=/dev/urandom of=${s.bdev.path} bs=1M count=128") // write // 5
        }
      } // 6 (when moving out of block, device is removed)
      Memory(Sector.M(64)) { backing2 => // 7
        Writeboost.Table(backing2, caching).create { s => // 8
          Shell(s"dd if=${s.bdev.path} of=/dev/null bs=1M count=128") // read // 9
        }
      }
    }
  }
}

[ 3478.223470] attempt to access beyond end of device
[ 3478.223471] loop1: rw=1, want=254400, limit=131072
[ 3478.223497] attempt to access beyond end of device
[ 3478.223498] loop1: rw=1, want=254408, limit=131072
[ 3478.223525] attempt to access beyond end of device
[ 3478.223526] loop1: rw=1, want=254416, limit=131072
[ 3478.223552] attempt to access beyond end of device
[ 3478.223553] loop1: rw=1, want=254424, limit=131072
[ 3478.223580] attempt to access beyond end of device
[ 3478.223581] loop1: rw=1, want=254432, limit=131072
[ 3478.223607] attempt to access beyond end of device
[ 3478.223609] loop1: rw=1, want=254440, limit=131072
[ 3478.223635] attempt to access beyond end of device
[ 3478.223636] loop1: rw=1, want=254448, limit=131072
[ 3478.223663] attempt to access beyond end of device
[ 3478.223664] loop1: rw=1, want=254456, limit=131072

the test code says:

create a 8MB caching
create a 128MB backing1
zero out the 1st sector of caching
create wb'd device s
dd out the s
remove s
create 64M backing2 (smaller than backing1)
create wb'd device s without zeroing out
read out s (but this is irrelevant)

the number is corresponding to the comments in the test code so you can understand the code. (not difficult isn't it?)

the dmesg comes from writeback thread because the dirty cache's destination address is far beyond 64MB (it's around 120MB)

onlyjob commented 8 years ago

Give me lsblk

I believe it would be irrelevant. Too many disks, etc.

What does "active" mean?

"Active" means in use.

I created one dm-writeboosted device (HDD1+SSD1) then I created another dm-writeboosted device (HDD2+SSD1) where caching device SSD1 was mistakenly re-used. Second writeboosted device was constructed when first one was already in use.

The first wb'd device and the second one are backed by a device of difference sizes?

Probably same size. IMHO irrelevant as you can't reliably use size to detect whether device is used.

You created the second wb'd device before removing the first wb'd device? or after?

Actively used dm-writeboost device was not stopped so yes, second device was created before stopping first one.

I'm not concerned about embedding unique ID to caching devices. Here what's important is to check whether caching device is already allocated to another device before constructing new dm-writeboosted device. Only run-time check to prevent deadlock and potential data loss. Thanks.

akiradeveloper commented 8 years ago

I wrote a test for the case but there is no error.

  test("making another device while in use") {
    Memory(Sector.M(8)) { caching =>
      Memory(Sector.M(64)) { backing1 =>
        Memory(Sector.M(64)) { backing2 =>
          Writeboost.sweepCaches(caching)
          Writeboost.Table(backing1, caching).create { s =>
            Writeboost.Table(backing2, caching).create { s =>
            }
          }
        }
      }
    }
  }

Here what's important is to check whether caching device is already allocated to another device before constructing new dm-writeboosted device. Only run-time check to prevent deadlock and potential data loss. Thanks.

dm_dev has reference count internally but target code can't access it. I don't like to have my own data structure to manage that.

onlyjob commented 8 years ago

Well, I don't understand how your test work but I wrecked my system once when I improperly constructed writeboosted device by accidentally reusing caching disk of another writeboosted device that was already in use. It was too easy to make such mistake. Safeguard is important to implement for safety. Similar reasons why system prevents attempts to format device with mounted file system -- fool proof... May protect from accidents...

akiradeveloper commented 8 years ago

If we maintain a list of pair (backing, caching) where both values are simply the name passed that's only effective between two reboots (or hot-swapping is another issue this can't avoid the problem described here)

I will look deeper the dm code if we can use the internally managed reference count.

akiradeveloper commented 8 years ago

Let's try adding FMODE_EXCL flag and see what changes. There is no target adding this flag though.

static int consume_essential_argv(struct wb_device *wb, struct dm_arg_set *as)
{
    int err = 0;
    struct dm_target *ti = wb->ti;

    err = dm_get_device(ti, dm_shift_arg(as), dm_table_get_mode(ti->table),
                &wb->backing_dev);
    if (err) {
        DMERR("Failed to get backing_dev");
        return err;
    }

    err = dm_get_device(ti, dm_shift_arg(as), dm_table_get_mode(ti->table),
                &wb->cache_dev);

akiradeveloper commented 8 years ago

By default the mode doesn't have FMODE_EXCL in.

static inline fmode_t get_mode(struct dm_ioctl *param)
{
        fmode_t mode = FMODE_READ | FMODE_WRITE;

        if (param->flags & DM_READONLY_FLAG)
                mode = FMODE_READ;

        return mode;
}

it's either WRITE | READ (normal) or READ (read only)

akiradeveloper commented 8 years ago

The idea of adding FMODE_EXCL alone doesn't work

akiradeveloper / dm-writeboost

safeguard against reuse of active caching device #115